hila3000 closed this issue 4 years ago
I'm curious to see what DEBUG logs would tell us: https://github.com/kedacore/keda/tree/master#setting-log-levels
Do you think there is a need to close the client? I don't see it being closed.
@balchua Did you ask me that? I'm not sure if closing the client is something that can be done via configuration, let me know. @zroubalik I've enabled debug now; waiting for a new occurrence of the errors in the logs to see if it prints more information about that.
Sorry, it's more for @zroubalik 😊
@zroubalik What do you say about @balchua's comment? We are running Keda in production, and these issues happen often; the only "fix" is to restart the keda-operator pod, to the point that we considered setting low memory limits so the pod restarts on its own... Please advise.
As for debug, I found an occurrence in dev of a message that got ingested only after 47 minutes. At 7:00 GMT the message got into the sub (lf-scaledobject-1546020352 is its name). At 7:47 GMT the message got ingested, i.e. only then did keda-operator scale the pod back up from 0 to 1.
Relevant logs (there were more in between, but these are the relevant ones, i.e. about this specific sub or general errors):
{"level":"debug","ts":"2020-09-03T06:54:14.985Z","logger":"scalehandler","msg":"Deployment no change","ScaledObject.Namespace":"dev-log-forwarding","ScaledObject.Name":"lf-scaledobject-1546020352","ScaledObject.ScaleType":"deployment","Deployment.Namespace":"dev-log-forwarding","Deployment.Name":"log-forwarding-1546020352"}
{"level":"debug","ts":"2020-09-03T07:09:20.484Z","logger":"scalehandler","msg":"Deployment no change","ScaledObject.Namespace":"dev-log-forwarding","ScaledObject.Name":"lf-scaledobject-1546020352","ScaledObject.ScaleType":"deployment","Deployment.Namespace":"dev-log-forwarding","Deployment.Name":"log-forwarding-1546020352"}
At this point, 7:09, it should have scaled the pod up, but it says "Deployment no change".
{"level":"error","ts":"2020-09-03T07:12:01.685Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp 199.36.153.8:443: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:12:02.085Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": EOF","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:12:24.284Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
BTW, I'm expecting to see an error like the following about the relevant sub, lf-scaledobject-1546020352:
{"level":"debug","ts":"2020-09-03T07:12:21.684Z","logger":"scalehandler","msg":"Error getting scale decision","ScaledObject.Namespace":"dev-log-forwarding","ScaledObject.Name":"lf-scaledobject-884715244","ScaledObject.ScaleType":"deployment","Error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout"}
but there isn't one... (the example above is for another sub)
{"level":"error","ts":"2020-09-03T07:12:24.284Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:13:12.483Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": write tcp 172.20.8.176:41702->199.36.153.8:443: write: broken pipe","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:14:11.186Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:14:11.186Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:14:12.485Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:14:43.087Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:17:50.687Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:18:01.884Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:18:22.885Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:18:44.481Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": write tcp 172.20.8.176:57870->199.36.153.8:443: write: broken pipe","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:19:19.682Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
I have 10 more occurrences of:
{"level":"error","ts":"2020-09-03T07:21:55.590Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
and then one of:
{"level":"error","ts":"2020-09-03T07:22:15.284Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": write tcp 172.20.8.176:58202->199.36.153.9:443: write: broken pipe","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
We defined Cloud DNS so we can bypass the firewall when communicating with Google APIs. The 199.x address is from that googleapis range; this is the first time I've seen it in the log since we defined the firewall bypass, but it doesn't get reflected when we run a dig/nslookup command. We only see it via the error, so it seems to be working. Or... not, given the broken pipe.
And now an unexpected EOF as well; until now we had only encountered timeouts:
{"level":"error","ts":"2020-09-03T07:22:37.382Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": unexpected EOF","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
Back to timeouts:
{"level":"error","ts":"2020-09-03T07:24:03.181Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:24:03.384Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
{"level":"error","ts":"2020-09-03T07:25:01.087Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:88\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
Eventually it logs the error I expected to see earlier:
{"level":"debug","ts":"2020-09-03T07:25:01.087Z","logger":"scalehandler","msg":"Error getting scale decision","ScaledObject.Namespace":"dev-log-forwarding","ScaledObject.Name":"lf-scaledobject-1546020352","ScaledObject.ScaleType":"deployment","Error":"rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout"}
Then:
{"level":"debug","ts":"2020-09-03T07:25:01.087Z","logger":"scalehandler","msg":"Deployment no change","ScaledObject.Namespace":"dev-log-forwarding","ScaledObject.Name":"lf-scaledobject-1546020352","ScaledObject.ScaleType":"deployment","Deployment.Namespace":"dev-log-forwarding","Deployment.Name":"log-forwarding-1546020352"}
Errors about timeouts or getting Active Status stopped being reported, and then eventually the scaling started working:
{"level":"debug","ts":"2020-09-03T07:46:31.683Z","logger":"scalehandler","msg":"Scaler for scaledObject is active","ScaledObject.Namespace":"dev-log-forwarding","ScaledObject.Name":"lf-scaledobject-1546020352","ScaledObject.ScaleType":"deployment","Scaler":{}}
{"level":"info","ts":"2020-09-03T07:46:32.182Z","logger":"scalehandler","msg":"Successfully updated deployment","ScaledObject.Namespace":"dev-log-forwarding","ScaledObject.Name":"lf-scaledobject-1546020352","ScaledObject.ScaleType":"deployment","Deployment.Namespace":"dev-log-forwarding","Deployment.Name":"log-forwarding-1546020352","Original Replicas Count":0,"New Replicas Count":1}
I hope this helps.
@hila3000 if you have the chance to build the master branch of keda, you can have a look at the stackdriver client code here (https://github.com/kedacore/keda/blob/master/pkg/scalers/stackdriver_client.go).
In the GetMetrics method, perhaps add a defer s.metricsClient.Close(). This is more of a hunch, as I don't know much about GCP Pub/Sub.
Instructions to build Keda are in the README. This may help speed up troubleshooting your issue.
Just my 2 cents. 😊
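The suggested change can be sketched as follows. This is a minimal, self-contained mock of the defer-close pattern, not KEDA's actual stackdriver_client.go; the struct and field names here are stand-ins:

```go
package main

import "fmt"

// metricsClient is a stand-in for the Stackdriver metric client; the real
// client's Close releases its underlying gRPC connection.
type metricsClient struct{ closed bool }

func (c *metricsClient) Close() error {
	c.closed = true
	return nil
}

// StackDriverClient mirrors the idea of keeping the metrics client on a
// struct so it can actually be closed after use.
type StackDriverClient struct {
	metricsClient *metricsClient
}

// GetMetrics sketches the suggestion above: defer the Close so the client
// is released even if the query returns early with an error.
func (s *StackDriverClient) GetMetrics() (int64, error) {
	defer s.metricsClient.Close()
	// ... the time series query would happen here ...
	return 0, nil
}

func main() {
	s := &StackDriverClient{metricsClient: &metricsClient{}}
	s.GetMetrics()
	fmt.Println("client closed:", s.metricsClient.closed)
}
```

Treat this purely as an illustration of the pattern; where exactly the Close belongs in KEDA (per call vs. on the scaler's own lifecycle) is discussed below.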
I can see, that there are timeouts when communicating with the gcp pub sub:
rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout
This is the reason for the failure to scale.
@hila3000 BTW, you can use ``` for multiline code blocks; it's better for bigger chunks of code or logs (you can specify syntax highlighting as well): https://docs.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks
And @balchua is probably right; I am not an expert on GCP Pub/Sub, but the client should probably be closed. This might be causing the problems.
@balchua haha, commented at the same time.
I think Close should definitely be called, as per the client doc: https://github.com/googleapis/google-cloud-go/blob/4de835178d583bdbbe5c8ec10befda4f59373d32/monitoring/apiv3/metric_client.go#L196-L200
I'd add the close call in here: https://github.com/kedacore/keda/blob/v2/pkg/scalers/gcp_pub_sub_scaler.go#L94
This should hopefully eliminate timeouts like:
rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post \"https://oauth2.googleapis.com/token\": dial tcp: i/o timeout
@zroubalik yeah, the Close() method. I totally forgot it. We need to make the stackdriver client part of the scaler; right now it's not stored.
@hila3000 if you have the chance to build keda master, you may want to try closing the client just to test it out.
@balchua I tried: I added the close in the file and ran docker build, but when I set the new image for the keda-operator deployment, it says:
Error: failed to start container "keda-operator": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"keda\": executable file not found in $PATH": unknown
Any idea why?
@hila3000 did you do a make build?
I did now... anything else before I hit the "docker build" command?
@hila3000 if you are building it against the master branch, all you need to do is run make publish to build the images locally and push them to Docker Hub. Or you can run make build and then run docker push manually.
@zroubalik @balchua So let's recap: I cloned this repo, checked out master, made the change in pkg/scalers/stackdriver_client.go (added the defer s.metricsClient.Close() on row 48), and then tried to run "make publish", but it fails:
hfish:keda (master)$ make publish
WARNING: GOROOT is not defined
GO111MODULE=on CGO_ENABLED=0 GOOS=linux GOARCH=amd64 operator-sdk generate k8s
/bin/sh: operator-sdk: command not found
make: *** [generate-api] Error 127
Any thoughts? How do I define GOROOT, and where? Why is the command not found?
Thanks a LOT for your help :)
Also, make build doesn't work:
hfish:keda (master)$ make build
WARNING: GOROOT is not defined
GO111MODULE=on CGO_ENABLED=0 GOOS=linux GOARCH=amd64 operator-sdk generate k8s
/bin/sh: operator-sdk: command not found
make: *** [generate-api] Error 127
It did work when I ran it on the v2 branch.
OK, I see.
You will need to install operator-sdk v0.11.0 https://github.com/operator-framework/operator-sdk/releases/tag/v0.11.0
If you have trouble with the installation etc., and if I can spare some time, I can give it a shot and produce an image with the change for you.
@zroubalik The files in https://github.com/operator-framework/operator-sdk/releases/tag/v0.11.0 have no file extensions, so I'm not sure how they should be installed. For example, I've downloaded "operator-sdk-v0.11.0-x86_64-apple-darwin" because it looks suitable for Mac, but it's not a dmg so I can't really install it.
If you can create an image for me with the relevant fix, it'll be GREAT and MUCH APPRECIATED (if it can be before Saturday, it'll be super helpful too).
Oh, darwin is for macOS.
For linux try this:
# Set the release version variable
$ RELEASE_VERSION=v0.11.0
# Linux
$ curl -LO https://github.com/operator-framework/operator-sdk/releases/download/${RELEASE_VERSION}/operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu
$ chmod +x operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu && sudo mkdir -p /usr/local/bin/ && sudo cp operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu /usr/local/bin/operator-sdk && rm operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu
I copied it from the operator-sdk installation guide.
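For macOS, the same steps should work with the darwin artifact mentioned above; the file has no extension because it is the executable itself, not an installer. An untested sketch mirroring the Linux commands:

```shell
# macOS: download the darwin binary, make it executable, put it on PATH
RELEASE_VERSION=v0.11.0
ARTIFACT=operator-sdk-${RELEASE_VERSION}-x86_64-apple-darwin
curl -LO https://github.com/operator-framework/operator-sdk/releases/download/${RELEASE_VERSION}/${ARTIFACT}
chmod +x ${ARTIFACT}
sudo mkdir -p /usr/local/bin/
sudo mv ${ARTIFACT} /usr/local/bin/operator-sdk
```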
@hila3000 you can try these images:
KEDA Operator: docker.io/zroubalik/keda:gcpclose
KEDA Metrics Adapter: docker.io/zroubalik/keda-metrics-adapter:gcpclose
It is a custom v1 build, which most likely won't be released. I plan to include this change in the upcoming v2. I am not sure if it is actually going to help with your issue; I'd recommend switching to v2 anyway, as there were some internal performance improvements that could help with your issue as well.
KEDA v2 Beta is scheduled for the beginning of the next week.
@balchua Thanks for the explanation. I do need it for Mac though, so is it the same concept and commands you wrote, just with the darwin files? @zroubalik Thanks! I'll deploy these versions in dev for now and see if they solve our issues. I'll deploy the v2 beta on dev as well next week when it gets released, and update on the results. Any estimate of when the v2 GA version will be released? (For production deployment.)
@zroubalik The version seems to be OK. I'll give it more time to verify, but it also seems to stabilize the memory usage, which is great. Question though... what does this error mean:
{"level":"error","ts":"2020-09-05T19:13:01.738Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"Could not find stackdriver metric with filter metric.type=\"pubsub.googleapis.com/subscription/num_undelivered_messages\" AND resource.labels.subscription_id=\"lf-data-1272148120-sub\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tkeda/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:89\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
The sub does exist; its message count is 0, but still, it exists.
@hila3000 great news!
Regarding that error, it seems to be benign.
The error simply suggests that the stackdriver monitoring client did not return any time series data within that period for that particular metric.
@balchua @zroubalik I think it's safe to say that the fix worked. Thanks a lot! We face a different issue now: now that it works, we've set the pollingInterval back to 5 seconds, and it introduced a quota issue.
From the keda-operator pod:
{"level":"error","ts":"2020-09-06T07:57:58.138Z","logger":"gcp_pub_sub_scaler","msg":"error getting Active Status","error":"rpc error: code = ResourceExhausted desc = Quota exceeded for quota metric 'Time series queries' and limit 'Time series queries per minute' of service 'monitoring.googleapis.com' for consumer 'project_number:xxxxxxxxxxx'.","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tkeda/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive\n\tkeda/pkg/scalers/gcp_pub_sub_scaler.go:89\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment\n\tkeda/pkg/handler/scale_loop.go:119\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:45\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:28"}
From the keda-metrics-apiserver pod:
E0906 09:13:37.646708 1 provider.go:94] keda_metrics_adapter/provider "msg"="error getting metric for scaler" "error"="rpc error: code = ResourceExhausted desc = Quota exceeded for quota metric 'Time series queries' and limit 'Time series queries per minute' of service 'monitoring.googleapis.com' for consumer 'project_number:xxxxxxxxxxx'." "ScaledObject.Name"="lf-scaledobject-1016483367" "ScaledObject.Namespace"="prod-us-log-forwarding" "Scaler"={}
I opened a ticket with GCP to see if we can increase the quota, but I still thought to ask: is there any way to handle it from Keda's side?
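One lever on Keda's side is simply to query less often: the ScaledObject's pollingInterval controls how frequently each scaler hits the Stackdriver API, so raising it reduces the "time series queries per minute" consumption. A rough sketch with hypothetical names (field spellings may differ slightly between KEDA versions, so check the docs for yours):

```yaml
apiVersion: keda.k8s.io/v1alpha1   # KEDA v1 API group
kind: ScaledObject
metadata:
  name: lf-scaledobject-example        # hypothetical name
  namespace: prod-us-log-forwarding
spec:
  scaleTargetRef:
    deploymentName: log-forwarding-example   # hypothetical deployment
  pollingInterval: 30   # seconds; 30 instead of 5 means roughly 6x fewer Stackdriver queries
  cooldownPeriod: 300   # the 5-minute cooling period mentioned elsewhere in this thread
  triggers:
    - type: gcp-pubsub
      metadata:
        subscriptionName: lf-data-example-sub   # hypothetical subscription
        subscriptionSize: "50"                  # trigger threshold
```

The trade-off is slower reaction to new messages, so this only helps if a coarser polling interval is acceptable for the workload.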
@balchua Regarding what you wrote about "within that period of time": I checked the metric on the GCP side and it consistently reports 0 messages, so I'm not sure why it says that the stackdriver monitoring client did not return any time series data.
The Keda GCP Pub/Sub scaler performs a metrics query on Stackdriver within a time range (now minus 2 minutes). So if Stackdriver doesn't return any records, the Keda GCP scaler will log that "error". But I believe it is benign, from what I can tell. Does it not behave as you expect?
For now, it seems ok, no apparent issues.
Cool. Memory utilization also seems ok?
Yes :)
Glad it is solved now, closing.
@zroubalik are the fixes you made in KEDA Operator: docker.io/zroubalik/keda:gcpclose and KEDA Metrics Adapter: docker.io/zroubalik/keda-metrics-adapter:gcpclose also present in Keda 2.0.0 and later?
@hila3000 yeah, the fix is present in 2.0.0 and later.
Great, thanks!
We have a ScaledObject based on GCP Pub/Sub. The Pub/Sub subscription has reached the number of messages required to scale the deployment up, but the scaling doesn't happen on time. Also, yesterday it didn't happen at all on the same ScaledObject until we restarted the Keda operator: the subscription reached 27,000 messages (the ScaledObject is defined to trigger on a value of 50).
Expected Behavior
Once the Pub/Sub subscription reaches the number of messages defined in the ScaledObject, the deployment should scale up from 0 to N, subject to the polling interval of course.
Actual Behavior
First issue: doesn't scale on time. The deployment was scaled to 0 at timestamp 1598771940.466385 (07:19:00.466 AM GMT). Pub/Sub messages show an increase over 50 messages (the value defined in the ScaledObject) at 07:21 AM GMT; however, by 07:38 GMT it had reached 972 messages and only then did it get scaled up, i.e. after 17 minutes of a high message count (relative to the ScaledObject configuration).
I would expect it to scale up back to N replicas after 7 minutes the latest - 2 min of polling interval + 5 minutes of the cooling period.
Second issue - didn't scale at all. Yesterday it didn't scale back up at all (after almost 4 hours of a high message count) until we restarted the KEDA operator. The message count on the subscription started increasing at 13:56 GMT.
The deployment was scaled to 0 at timestamp 1598709286.193081 (13:54:46.193 GMT). PubSub messages exceeded 50 (the value defined in the ScaledObject) at 13:56 GMT, yet by 17:55 GMT the count had reached 27,063 messages, and only after we restarted the KEDA operator at 17:53 GMT did it scale the pods up to 1 replica.
I would expect it to scale back up to N replicas by 14:03 GMT at the latest (2 minutes of polling interval + 5 minutes of cooldown period), BUT it didn't scale at all.
LOGS - First issue - doesn't scale on time. From the KEDA logs, scaling to 0 at 07:19 GMT:
{"level":"info","ts":1598771940.466385,"logger":"scalehandler","msg":"Successfully scaled deployment to 0 replicas","ScaledObject.Namespace":"prod-us-log-forwarding","ScaledObject.Name":"lf-scaledobject-535176920","ScaledObject.ScaleType":"deployment","Deployment.Namespace":"prod-us-log-forwarding","Deployment.Name":"log-forwarding-535176920"}
Any other references to this ScaledObject after that:
{"level":"info","ts":1598771965.0714204,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"prod-us-log-forwarding","Request.Name":"lf-scaledobject-535176920"}
{"level":"info","ts":1598771965.071469,"logger":"controller_scaledobject","msg":"Detected ScaleType = Deployment","Request.Namespace":"prod-us-log-forwarding","Request.Name":"lf-scaledobject-535176920"}
We have more of those "Reconciling"/"Detected" entries, and then at 07:38:50.164:
{"level":"info","ts":1598773130.1646843,"logger":"scalehandler","msg":"Successfully updated deployment","ScaledObject.Namespace":"prod-us-log-forwarding","ScaledObject.Name":"lf-scaledobject-535176920","ScaledObject.ScaleType":"deployment","Deployment.Namespace":"prod-us-log-forwarding","Deployment.Name":"log-forwarding-535176920","Original Replicas Count":0,"New Replicas Count":1}
Nothing in between.
Second issue - didn't scale at all. From the KEDA logs: for yesterday, when it didn't scale at all, the only relevant errors I see are the following, starting at 13:55:13 GMT. Worth mentioning that this error also appears right before it was successfully scaled to 0.
error: "rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": dial tcp: i/o timeout" level: "error" logger: "gcp_pub_sub_scaler" msg: "error getting Active Status" stacktrace: "github.com/go-logr/zapr.(*zapLogger).Error /Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128 github.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive keda/pkg/scalers/gcp_pub_sub_scaler.go:88 github.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment keda/pkg/handler/scale_loop.go:119 github.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale keda/pkg/handler/scale_loop.go:45 github.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop keda/pkg/handler/scale_loop.go:28" ts: 1598720729.596498
14:10:04 GMT -error: "rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": unexpected EOF" level: "error" logger: "gcp_pub_sub_scaler" msg: "error getting Active Status" stacktrace: "github.com/go-logr/zapr.(*zapLogger).Error /Users/zroubali/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128 github.com/kedacore/keda/pkg/scalers.(*pubsubScaler).IsActive keda/pkg/scalers/gcp_pub_sub_scaler.go:88 github.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleDeployment keda/pkg/handler/scale_loop.go:119 github.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale keda/pkg/handler/scale_loop.go:45 github.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop keda/pkg/handler/scale_loop.go:28" ts: 1598710204.4026008
These two error types get reported quite often.
Steps to Reproduce the Problem
I don't know how this can be reproduced, but I'll share the configuration I have.
ScaledObject (reconstructed; the indentation got mangled when pasted here):

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  creationTimestamp: "2020-08-23T10:26:34Z"
  finalizers:
  - finalizer.keda.k8s.io
  generation: 2
  labels:
    deploymentName: log-forwarding-535176920
  name: lf-scaledobject-535176920
  namespace: prod-us-log-forwarding
  resourceVersion: "85343895"
  selfLink: /apis/keda.k8s.io/v1alpha1/namespaces/prod-us-log-forwarding/scaledobjects/lf-scaledobject-535176920
  uid: 1e802b1a-e52b-11ea-8145-42010ab50004
spec:
  cooldownPeriod: 300
  maxReplicaCount: 5
  minReplicaCount: 0
  pollingInterval: 120
  scaleTargetRef:
    deploymentName: log-forwarding-535176920
  triggers:
  - type: gcp-pubsub
    metadata:
      credentials: GENERIC_GOOGLE_CREDENTIALS
      subscriptionName: lf-data-535176920-sub
      subscriptionSize: "50"
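For context, here is a minimal model of the scale-from-zero decision this configuration implies. This is my own sketch of the behavior, not KEDA's actual implementation, and `desiredReplicas` is a hypothetical helper: the scaler's activity check runs every `pollingInterval` seconds, any messages on the subscription make it active, and `subscriptionSize` is the per-replica target used by the HPA once at least one replica exists.

```go
package main

import "fmt"

// desiredReplicas is a simplified model (an assumption, not KEDA's exact
// code) of the operator's 0<->1 decision: scale from 0 to 1 when the
// subscription has any messages, and back to minReplicas once it is idle
// (after cooldownPeriod elapses).
func desiredReplicas(current int, messages int64, minReplicas int) int {
	active := messages > 0
	if current == 0 && active {
		return 1 // hand off to the HPA, which scales 1..maxReplicaCount
	}
	if current > 0 && !active {
		return minReplicas // applied only after cooldownPeriod
	}
	return current
}

func main() {
	fmt.Println(desiredReplicas(0, 27063, 0)) // 1 (should have scaled up)
	fmt.Println(desiredReplicas(1, 0, 0))     // 0 (idle, scales to zero)
}
```

Under this model, the 27,063-message backlog described above should have flipped the deployment from 0 to 1 within one polling interval, which is why the multi-hour delay looks like a stuck scaler rather than a configuration problem.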
Specifications