googleapis / java-bigtable

Apache License 2.0

Bigtable from Dataproc: Dependency conflict even after shading the jars #1789

Closed shril closed 10 months ago

shril commented 1 year ago

I am trying to run a Spark Application to write and read data to Cloud Bigtable from Dataproc.

Initially, I got this exception: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument. I then learned that there are known dependency conflicts, described in the Google documentation Manage Java and Scala dependencies for Apache Spark.

Following the instructions, I changed my build.sbt file to shade the jars -

assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" -> "repackaged.com.google.common.@1").inAll,
  ShadeRule.rename("com.google.protobuf.**" -> "repackaged.com.google.protobuf.@1").inAll,
  ShadeRule.rename("io.grpc.**" -> "repackaged.io.grpc.@1").inAll
)

Then got this error

repackaged.io.grpc.ManagedChannelProvider$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact
  at repackaged.io.grpc.ManagedChannelProvider.provider(ManagedChannelProvider.java:45)
  at repackaged.io.grpc.ManagedChannelBuilder.forAddress(ManagedChannelBuilder.java:39)
  at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:353)
  at com.google.api.gax.grpc.ChannelPool.<init>(ChannelPool.java:107)
  at com.google.api.gax.grpc.ChannelPool.create(ChannelPool.java:85)
  at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:237)
  at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:231)
  at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:201)
  at com.google.cloud.bigtable.data.v2.stub.EnhancedBigtableStub.create(EnhancedBigtableStub.java:175)
  at com.google.cloud.bigtable.data.v2.BigtableDataClient.create(BigtableDataClient.java:165)
  at com.groupon.crm.BigtableClient$.getDataClient(BigtableClient.scala:59)
  ... 44 elided
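The stack trace points at grpc's provider discovery. grpc locates a channel implementation via java.util.ServiceLoader, which reads META-INF/services/&lt;fully.qualified.InterfaceName&gt; files from the classpath. A minimal sketch of that mechanism (using a JDK SPI so it is self-contained; the class and method names here are my own, not grpc's):

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class SpiLookupDemo {
    // Returns the class names of all implementations ServiceLoader can
    // discover for the given SPI interface. grpc's
    // ManagedChannelProvider.provider() relies on the same mechanism.
    static List<String> discoveredProviders(Class<?> spi) {
        List<String> names = new ArrayList<>();
        for (Object p : ServiceLoader.load(spi)) {
            names.add(p.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // FileSystemProvider is a JDK SPI; on a modern JDK (9+) this
        // typically includes at least the zipfs provider.
        System.out.println(discoveredProviders(FileSystemProvider.class));
    }
}
```

This is why shading breaks discovery: after the rename, the interface becomes repackaged.io.grpc.ManagedChannelProvider, so ServiceLoader looks for META-INF/services/repackaged.io.grpc.ManagedChannelProvider, a file that does not exist unless the service entries were renamed along with the classes.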

Following that, I added the grpc-netty dependency to my build.sbt file.

libraryDependencies += "io.grpc" % "grpc-netty" % "1.49.2"

I am still getting the same error.

Environment details

Dataproc details -

"software_config": {
      "image_version": "1.5-debian10",
      "properties": {
        "dataproc:dataproc.logging.stackdriver.job.driver.enable": "true",
        "dataproc:dataproc.logging.stackdriver.enable": "true",
        "dataproc:jobs.file-backed-output.enable": "true",
        "dataproc:dataproc.logging.stackdriver.job.yarn.container.enable": "true",
        "capacity-scheduler:yarn.scheduler.capacity.resource-calculator" : "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
        "hive:hive.server2.materializedviews.cache.at.startup": "false",
        "spark:spark.jars":"XXXX"
      },
      "optional_components": ["ZEPPELIN","ANACONDA","JUPYTER"]
    }

Spark Job details -

val sparkVersion = "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
libraryDependencies +=  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
libraryDependencies +=  "org.apache.spark" %% "spark-hive" % sparkVersion % "provided"
libraryDependencies += "com.google.cloud" % "google-cloud-bigtable" % "2.23.1"
libraryDependencies += "com.google.auth" % "google-auth-library-oauth2-http" % "1.17.0"
libraryDependencies += "io.grpc" % "grpc-netty" % "1.49.2"

I can provide any additional details if required. Thanks!

igorbernstein2 commented 10 months ago

You need to make sure that you are updating the service files when repackaging grpc. In Maven you would use something like: https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer

I'm not sure what the equivalent is for sbt, but the resulting shaded jar needs to contain META-INF/services files that are correctly updated with the repackaged class names.
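For what it's worth, sbt-assembly's merge strategies can at least be pointed at the service files. A sketch, assuming sbt-assembly 1.x key names; note that this only concatenates duplicate entries across jars, it does not rewrite them to the repackaged names, which is the part Maven's ServicesResourceTransformer handles:

```scala
// build.sbt -- sketch, assuming sbt-assembly 1.x.
// Concatenates duplicate META-INF/services entries from different jars.
// This does NOT rewrite entries to repackaged.* names; that still has
// to be handled separately.
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", "services", _*) => MergeStrategy.concat
  case other =>
    val defaultStrategy = (assembly / assemblyMergeStrategy).value
    defaultStrategy(other)
}
```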

shril commented 10 months ago

@igorbernstein2 I did the following and it worked perfectly well for me.

In src/main/resources I added a META-INF/services folder. In that folder I added two files, named io.grpc.LoadBalancerProvider and io.grpc.NameResolverProvider.

The contents of the files are as follows.

io.grpc.LoadBalancerProvider:

io.grpc.internal.PickFirstLoadBalancerProvider
io.grpc.util.SecretRoundRobinLoadBalancerProvider$Provider
io.grpc.util.OutlierDetectionLoadBalancerProvider

io.grpc.NameResolverProvider:

io.grpc.internal.DnsNameResolverProvider
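One way to sanity-check a fix like this at runtime is to list which copies of a given service file are visible on the classpath. A sketch; the repackaged resource names below are assumptions that follow from the shade rules earlier in this thread, and the class/method names are my own:

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ServiceFileCheck {
    // Lists every copy of a service file visible on the classpath for
    // the given (possibly repackaged) interface name.
    static List<URL> serviceFiles(String interfaceName) throws IOException {
        return Collections.list(
            Thread.currentThread().getContextClassLoader()
                  .getResources("META-INF/services/" + interfaceName));
    }

    public static void main(String[] args) throws IOException {
        // After shading, grpc looks the providers up under the repackaged
        // names; both lists should be non-empty inside the shaded jar.
        System.out.println(serviceFiles("repackaged.io.grpc.NameResolverProvider"));
        System.out.println(serviceFiles("repackaged.io.grpc.LoadBalancerProvider"));
    }
}
```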