Closed jsw closed 7 years ago
@sduskis Any ideas on what could be causing the cancellation, or what I can do to help troubleshoot this? Unfortunately, I don't have an easy repro. It must be a race condition or timeout issue or similar. The issue must be on the client and not the server, given we just started seeing this issue. Our access pattern is to have several worker threads continually scanning for and reading individual rows from Bigtable. Unfortunately, we will have to rollback to our previous known good release, which depends on bigtable-hbase-1.2:0.9.4
, and pins us to using an alpha release of the google pubsub library so that the netty dependencies align. Let me know what additional info I can help provide.
We're not completely sure about the root cause, but have some suspicions that it might be related to the version of grpc that's in your classpath. Given some googling, we see that CancellationException: Task was cancelled.
seems to come from Futures.immediateCancelledFuture()
. However,t we don't see a path that would call that method
Can you please run mvn dependency:tree
and add the contents to this issue?
@sduskis I'm using sbt, which I know has some interop issues with maven projects. Do you need the full tree (5000+ lines) or is a flat list sufficient?
5000+ lines? Wow. I think that the most important dependency is guava, so let's start with that.
Here's the full list (internal dependencies removed)
ai.api:libai:1.4.8
ch.qos.logback:logback-classic:1.2.3
ch.qos.logback:logback-core:1.2.3
com.chuusai:shapeless_2.11:2.3.0
com.fasterxml.jackson.core:jackson-annotations:2.8.9
com.fasterxml.jackson.core:jackson-core:2.8.9
com.fasterxml.jackson.core:jackson-databind:2.8.9
com.fasterxml.jackson.datatype:jackson-datatype-jdk8:2.8.9
com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.8.9
com.fasterxml.jackson.module:jackson-module-paranamer:2.8.8
com.fasterxml.jackson.module:jackson-module-scala_2.11:2.8.8
com.github.stephenc.findbugs:findbugs-annotations:1.3.9-1
com.google.api:api-common:1.1.0
com.google.api:gax:1.4.1
com.google.api:gax-grpc:0.21.1
com.google.api-client:google-api-client:1.22.0
com.google.api.grpc:grpc-google-cloud-pubsub-v1:0.1.12
com.google.api.grpc:grpc-google-common-protos:0.1.9
com.google.api.grpc:proto-google-cloud-pubsub-v1:0.1.12
com.google.api.grpc:proto-google-common-protos:0.1.12
com.google.api.grpc:proto-google-iam-v1:0.1.12
com.google.apis:google-api-services-monitoring:v3-rev10-1.22.0
com.google.apis:google-api-services-storage:v1-rev100-1.22.0
com.google.auth:google-auth-library-appengine:0.6.1
com.google.auth:google-auth-library-credentials:0.7.0
com.google.auth:google-auth-library-oauth2-http:0.7.0
com.google.auto.value:auto-value:1.2
com.google.cloud:google-cloud-core:1.2.1
com.google.cloud:google-cloud-core-grpc:1.2.1
com.google.cloud:google-cloud-core-http:1.2.1
com.google.cloud:google-cloud-pubsub:0.20.1-beta
com.google.cloud:google-cloud-storage:1.2.1
com.google.cloud.bigtable:bigtable-client-core:1.0.0-pre1
com.google.cloud.bigtable:bigtable-hbase:1.0.0-pre1
com.google.cloud.bigtable:bigtable-hbase-1.x:1.0.0-pre1
com.google.cloud.bigtable:bigtable-protos:1.0.0-pre1
com.google.code.findbugs:jsr305:3.0.2
com.google.code.gson:gson:2.7
com.google.errorprone:error_prone_annotations:2.0.19
com.google.guava:guava:22.0
com.google.http-client:google-http-client:1.22.0
com.google.http-client:google-http-client-appengine:1.21.0
com.google.http-client:google-http-client-jackson:1.21.0
com.google.http-client:google-http-client-jackson2:1.22.0
com.google.instrumentation:instrumentation-api:0.4.2
com.google.j2objc:j2objc-annotations:1.1
com.google.oauth-client:google-oauth-client:1.22.0
com.google.protobuf:protobuf-java:3.3.1
com.google.protobuf:protobuf-java-util:3.3.1
com.markatta:futiles_2.11:2.0.0
com.netaporter:scala-uri_2.11:0.4.16
com.newrelic.agent.java:newrelic-api:3.33.0
com.thoughtworks.paranamer:paranamer:2.8
com.typesafe:config:1.3.1
com.typesafe:ssl-config-core_2.11:0.2.2
com.typesafe.akka:akka-actor_2.11:2.5.3
com.typesafe.akka:akka-http-core_2.11:10.0.9
com.typesafe.akka:akka-parsing_2.11:10.0.9
com.typesafe.akka:akka-slf4j_2.11:2.5.3
com.typesafe.akka:akka-stream_2.11:2.5.3
com.typesafe.play:build-link:2.6.1
com.typesafe.play:cachecontrol_2.11:1.1.2
com.typesafe.play:filters-helpers_2.11:2.6.1
com.typesafe.play:play-json_2.11:2.6.2
com.typesafe.play:play-logback_2.11:2.6.1
com.typesafe.play:play-netty-utils:2.6.1
com.typesafe.play:play-server_2.11:2.6.1
com.typesafe.play:play-streams_2.11:2.6.1
com.typesafe.play:play-ws-standalone-json_2.11:1.0.1
com.typesafe.play:play-ws-standalone-xml_2.11:1.0.1
com.typesafe.play:play-ws-standalone_2.11:1.0.1
com.typesafe.play:play-ws_2.11:2.6.1
com.typesafe.play:play_2.11:2.6.1
com.typesafe.play:shaded-asynchttpclient:1.0.1
com.typesafe.play:shaded-oauth:1.0.1
com.typesafe.play:twirl-api_2.11:1.3.3
com.typesafe.scala-logging:scala-logging_2.11:3.5.0
commons-codec:commons-codec:1.10
commons-logging:commons-logging:1.2
io.dropwizard.metrics:metrics-core:3.1.2
io.grpc:grpc-auth:1.4.0
io.grpc:grpc-context:1.4.0
io.grpc:grpc-core:1.4.0
io.grpc:grpc-netty:1.4.0
io.grpc:grpc-protobuf:1.4.0
io.grpc:grpc-protobuf-lite:1.4.0
io.grpc:grpc-stub:1.4.0
io.jsonwebtoken:jjwt:0.7.0
io.netty:netty-buffer:4.1.11.Final
io.netty:netty-codec:4.1.11.Final
io.netty:netty-codec-http:4.1.11.Final
io.netty:netty-codec-http2:4.1.11.Final
io.netty:netty-codec-socks:4.1.11.Final
io.netty:netty-common:4.1.11.Final
io.netty:netty-handler:4.1.11.Final
io.netty:netty-handler-proxy:4.1.11.Final
io.netty:netty-resolver:4.1.11.Final
io.netty:netty-tcnative-boringssl-static:2.0.3.Final
io.netty:netty-transport:4.1.11.Final
io.spray:spray-json_2.11:1.3.2
javax.cache:cache-api:1.0.0
javax.inject:javax.inject:1
javax.transaction:jta:1.1
joda-time:joda-time:2.9.9
junit:junit:4.12
org.apache.commons:commons-lang3:3.6
org.apache.commons:commons-math3:3.6.1
org.apache.hbase:hbase-shaded-client:1.3.1
org.apache.htrace:htrace-core:3.1.0-incubating
org.apache.httpcomponents:httpclient:4.0.1
org.apache.httpcomponents:httpcore:4.0.1
org.apache.logging.log4j:log4j-api:2.2
org.apache.logging.log4j:log4j-core:2.2
org.codehaus.jackson:jackson-core-asl:1.9.11
org.codehaus.mojo:animal-sniffer-annotations:1.14
org.hamcrest:hamcrest-core:1.3
org.joda:joda-convert:1.7
org.json:json:20160810
org.json4s:json4s-ast_2.11:3.5.2
org.json4s:json4s-core_2.11:3.5.2
org.json4s:json4s-jackson_2.11:3.5.2
org.json4s:json4s-scalap_2.11:3.5.2
org.parboiled:parboiled_2.11:2.1.2
org.reactivestreams:reactive-streams:1.0.0
org.scala-lang:scala-reflect:2.11.11
org.scala-lang.modules:scala-java8-compat_2.11:0.8.0
org.scala-lang.modules:scala-parser-combinators_2.11:1.0.6
org.scala-lang.modules:scala-xml_2.11:1.0.6
org.slf4j:jcl-over-slf4j:1.7.25
org.slf4j:jul-to-slf4j:1.7.25
org.slf4j:log4j-over-slf4j:1.7.21
org.slf4j:slf4j-api:1.7.25
org.threeten:threetenbp:1.3.3
org.typelevel:macro-compat_2.11:1.1.1
Since we don't have concrete information, we can release a -SNAPSHOT with more logging that may get to a root cause.
Please note that in the snapshot, we added a default dependency on an older version of tcnative, so please consider an explicit exclusion on our version of tcnative.
The updated 1.0.0-pre2-SNAPSHOT
is out. Can you please see if it provides additional information about the cause of the
FYI, I think you can add the ability to use snapshot repo with these instructions. I usually use maven to do this configuration, like in this example.
By the way, thanks for providing the list of dependencies. Your use of guava 22.0 does explain the "Task was cancelled" message. Guava 19, which we use by default, had different messages, which lead to some confusion on our end.
@sduskis Here are the relevant portions of the stack traces with 1.0.0-pre2-SNAPSHOT
2017-07-13 23:57:55,520 [ERROR] XXX:67 Table xxx: Other exception thrown: Failed to perform operation. Operation='get', projectId='xxx', tableName='xxx', rowKey='xxx'
java.io.IOException: Failed to perform operation. Operation='get', projectId='xxx', tableName='xxx', rowKey='xxx'
at com.google.cloud.bigtable.hbase.BigtableTable.logAndCreateIOException(BigtableTable.java:528)
at com.google.cloud.bigtable.hbase.BigtableTable.getResults(BigtableTable.java:269)
at com.google.cloud.bigtable.hbase.BigtableTable.get(BigtableTable.java:255)
Caused by: java.util.concurrent.CancellationException: Task was cancelled.
at com.google.common.util.concurrent.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:1123)
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:501)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:380)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:86)
at com.google.cloud.bigtable.hbase.BigtableTable.getResults(BigtableTable.java:264)
... 39 common frames omitted
2017-07-13 23:57:55,519 [ERROR] c.g.c.b.h.BigtableTable:143 Encountered exception when executing get.
java.util.concurrent.CancellationException: Task was cancelled.
at com.google.common.util.concurrent.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:1123)
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:501)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:380)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:86)
at com.google.cloud.bigtable.hbase.BigtableTable.getResults(BigtableTable.java:264)
at com.google.cloud.bigtable.hbase.BigtableTable.get(BigtableTable.java:255)
Just to confirm: there are no logs of "Could not sendMessage" or "Could not halfClose" preceding those traces?
I don't see either of those messages. Do I need to change log threshold to be below ERROR
?
You shouldn't need to change the threshold. Here's the code we were thought might be triggering the cancel.
I'll try something else, which might help identify the problem.
PR #1464 may possibly fix this issue. I deployed a new SNAPSHOT with that change. Can you please retry?
@sduskis After 2-hour load test, I do not see the error anymore 👍
@sduskis Thanks again for the fix. I assume this will make it in the 1.0.0-pre2
release? Is there an ETA?
I'm aiming to get it out by Wednesday
We have to revisit this issue before we release a new client. Commit #1464 can cause some problems.
@sduskis I think you meant Commit #1467 in your previous comment? Looks like PR #1465 is related to Issue #1445 (empty values). And thanks again for working through these issues.
I updated the comment to #1464.
I just released the 1.0.0-pre2 version.
@sduskis Thanks so much for all the fixes. After several load tests and running in production for several days, I have not seen any of the previous issues yet with 1.0.0-pre2.
hey @sduskis i'm facing a similar error. would be really grateful if you could please tell me how to resolve it. the stack trace is as follows:
java.io.IOException: Failed to perform operation. Operation='getScanner', projectId='*, tableName='***'
at com.google.cloud.bigtable.hbase.AbstractBigtableTable.getScanner(AbstractBigtableTable.java:315)
at com..scheduler.hbase.ScheduledJobRepository.getJobsDue(ScheduledJobRepository.java:176)
at com..scheduler.hbase.ScheduledJobRepository.getJobsDue(ScheduledJobRepository.java:154)
at com..scheduler.hbase.ScheduledJobRepository.getJobsDue(ScheduledJobRepository.java:91)
at com..scheduler.hbase.ScheduledJobRepository$FastClassBySpringCGLIB$ba59bd96.invoke(
Since upgrading to
com.google.cloud.bigtable:bigtable-hbase-1.x:1.0.0-pre1
I am occasionally seeing the following error. I've redacted some confidential elements of the stack trace. Any idea what may be causing it?