jaegertracing / spark-dependencies

Spark job for dependency links
http://jaegertracing.io/
Apache License 2.0
122 stars 69 forks source link

Support for Elasticsearch 8.x #125

Closed sylvainOL closed 6 months ago

sylvainOL commented 2 years ago

Describe the bug

Today Spark dependencies only work with elasticsearch 7.x

To Reproduce Steps to reproduce the behavior:

  1. install elasticsearch 8.x
  2. launch spark dependencies

we have this in logs:

Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:348)
        at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:220)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions$lzycompute(AbstractEsRDD.scala:79)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions(AbstractEsRDD.scala:78)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.getPartitions(AbstractEsRDD.scala:48)
        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:273)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:273)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
        at org.apache.spark.Partitioner$.$anonfun$defaultPartitioner$4(Partitioner.scala:78)
        at org.apache.spark.Partitioner$.$anonfun$defaultPartitioner$4$adapted(Partitioner.scala:78)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.TraversableLike.map(TraversableLike.scala:238)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
        at scala.collection.immutable.List.map(List.scala:298)
        at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:78)
        at org.apache.spark.rdd.RDD.$anonfun$groupBy$1(RDD.scala:714)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
        at org.apache.spark.rdd.RDD.groupBy(RDD.scala:714)
        at org.apache.spark.api.java.JavaRDDLike.groupBy(JavaRDDLike.scala:243)
        at org.apache.spark.api.java.JavaRDDLike.groupBy$(JavaRDDLike.scala:239)
        at org.apache.spark.api.java.AbstractJavaRDDLike.groupBy(JavaRDDLike.scala:45)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:273)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:249)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Unsupported/Unknown Elasticsearch version [8.2.3].Highest supported version is [7.x]. You may need to upgrade ES-Hadoop.
        at org.elasticsearch.hadoop.util.EsMajorVersion.parse(EsMajorVersion.java:91)
        at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:756)
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:338)
        ... 31 more

Expected behavior It would be nice to make it work with ES 8.x (Jaeger is "supporting" it)

Version (please complete the following information):

Additional context

maybe by using this in pom.xml:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark-20_${version.scala.binary}</artifactId>
    <version>8.2.3</version>
</dependency>
vmaleze commented 7 months ago

For those of you who want to use this with elastic 8, here is the repo with the artefact built using the PR of @sylvainOL => https://github.com/vmaleze/spark-dependencies-es8/pkgs/container/spark-dependencies-es8 Unfortunately, the tests fails on the latest-jaeger stage and I have issues testing this on arm64. So I cannot fix it to submit a PR and merge this in the official repo. However, the forked repo works like a charm for elastic 8.