jaegertracing / helm-charts

Helm Charts for Jaeger backend
Apache License 2.0

[Bug]: Spark no longer works with new charts that use elasticsearch 8+ #574

Open Stevenpc3 opened 6 months ago

Stevenpc3 commented 6 months ago

What happened?

As a user of the "system architecture" tab in Jaeger, I would like to use Spark to generate the diagrams.

However, the Spark dependencies job no longer works with the new charts that use Elasticsearch 8+.

The Spark job fails with an error stating that ES-Hadoop supports at most Elasticsearch 7.x.

Steps to reproduce

  1. Deploy with the new charts using Elasticsearch 8+
  2. Produce traces that are written to Elasticsearch
  3. Run a Spark job
  4. Check the Spark job logs for errors
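
For step 4, the failure signature can be grepped straight out of the job logs. A minimal sketch, quoting the exact log line from this issue; against a live cluster you would pipe `kubectl logs <spark-job-pod>` into the same grep instead of the here-doc:

```shell
# Extract the ES-Hadoop version error from the Spark job output.
# The here-doc stands in for `kubectl logs <spark-job-pod>`.
grep -o 'Unsupported/Unknown Elasticsearch version \[[0-9.]*\]' <<'EOF'
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Unsupported/Unknown Elasticsearch version [8.13.2].Highest supported version is [7.x]. You may need to upgrade ES-Hadoop.
EOF
# → Unsupported/Unknown Elasticsearch version [8.13.2]
```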

Expected behavior

The Spark job completes as it did with Elasticsearch 7.

Relevant log output

24/05/22 15:41:20 INFO ElasticsearchDependenciesJob: Running Dependencies job for 2024-05-22T00:00Z, reading from jaeger-span-2024-05-22 index, result storing to jaeger-dependencies-2024-05-22
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:340)
        at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:220)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions$lzycompute(AbstractEsRDD.scala:79)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions(AbstractEsRDD.scala:78)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.getPartitions(AbstractEsRDD.scala:48)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
        at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:285)
        at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:75)
        at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
        at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.groupBy(RDD.scala:690)
        at org.apache.spark.api.java.JavaRDDLike$class.groupBy(JavaRDDLike.scala:243)
        at org.apache.spark.api.java.AbstractJavaRDDLike.groupBy(JavaRDDLike.scala:45)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:236)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:212)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Unsupported/Unknown Elasticsearch version [8.13.2].Highest supported version is [7.x]. You may need to upgrade ES-Hadoop.
        at org.elasticsearch.hadoop.util.EsMajorVersion.parse(EsMajorVersion.java:91)
        at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:746)
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:330)
        ... 33 more

Screenshot

No response

Additional context

No response

Jaeger backend version

3.0.7

SDK

No response

Pipeline

No response

Storage backend

Elasticsearch 8+

Operating system

Linux

Deployment model

Kubernetes

Deployment configs

# -- enable or disable Jaeger
enabled: true

storage:
  type: elasticsearch
  elasticsearch:
    # make this a template that decides based on devMode and can configure properly
    host: "jaeger-elasticsearch"
    usePassword: false
    antiAffinity: "soft"

# -- Preferred long term backend storage
elasticsearch:
  master:
    masterOnly: false
    replicaCount: 1
    lifecycleHooks:
      postStart:
        exec:
          command:
            - bash
            - -c
            - |
              #!/bin/bash
              # Add a template to adjust number of shards/replicas
              TEMPLATE_NAME=no_replicas
              # INDEX_PATTERN1="jaeger-span-*"
              # INDEX_PATTERN2="jaeger-service-*"
              INDEX_PATTERN1="jaeger-dependencies-*"
              ES_URL=http://localhost:9200
              while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
              curl -XPUT "$ES_URL/_index_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN1"\"'],"template":{"settings":{"number_of_replicas":"0"}}}'
  data:
    replicaCount: 0
  coordinating:
    replicaCount: 0
  ingest:
    replicaCount: 0
  fullnameOverride: "jaeger-elasticsearch"
  volumeClaimTemplate:
    accessModes: ["ReadWriteOnce"]
    resources:
      requests:
        storage: 3Gi

# -- For support with older Trace formats
agent:
    enabled: false

# -- The backend storage type to use
provisionDataStore:
  cassandra: false
  elasticsearch: true
  kafka: false

# -- The service that collects and serves trace information
collector:
  service:
    otlp:
      grpc:
        port: 4317
        name: otlp-grpc
      http:
        port: 4318
        name: otlp-http
  cmdlineParams:
    es.num-replicas: "0"

# -- The Jaeger UI service
query:
  agentSidecar:
    enabled: false
  # -- This should start with a /
  basePath: /jaeger

# Jaeger Spark job to generate the system architecture
spark:
  enabled: true
  schedule: "00 21 * * *"
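
As an aside, the `postStart` hook's index-template payload above is hard to read through the nested shell quoting. This sketch rebuilds the same body with simpler quoting and validates it; `python3 -m json.tool` is used only as a JSON checker:

```shell
# Rebuild the body the postStart hook PUTs to $ES_URL/_index_template/no_replicas.
INDEX_PATTERN1="jaeger-dependencies-*"
BODY='{"index_patterns":["'"$INDEX_PATTERN1"'"],"template":{"settings":{"number_of_replicas":"0"}}}'
# Confirm it parses as JSON before sending it with curl -XPUT.
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload OK"
# → payload OK
```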
Stevenpc3 commented 6 months ago

Need to update the registry in the chart based on this comment. https://github.com/jaegertracing/helm-charts/issues/532#issuecomment-2124734512

Make the correct registry part of the chart

Stevenpc3 commented 5 months ago

@dpericaxon @yurishkuro Why is spark-dependencies hosted on GitHub (ghcr.io) while the rest are hosted on Docker Hub? https://github.com/orgs/jaegertracing/packages

That is a bit confusing, especially since there is one on Docker Hub that is claimed to be outdated via https://github.com/jaegertracing/spark-dependencies/issues/137#issuecomment-2119746686

yurishkuro commented 5 months ago

I don't know how/why that decision was made. I agree it would've been better to use the same Docker and Quay hosting we use for other images.

yurishkuro commented 5 months ago

I updated the readme for spark-dependencies. I think this Helm chart should be pointing to a different location too: https://github.com/jaegertracing/helm-charts/blob/f4213e24f1ff4b41f04b5087ab1c912fd0275751/charts/jaeger/values.yaml#L781

Stevenpc3 commented 5 months ago

Yeah, that link to the values is what I meant in https://github.com/jaegertracing/helm-charts/issues/574#issuecomment-2125946735

I can make a PR. I think just setting the registry and repo to default to ghcr.io will be fine. I did this locally since we use `global.imageRegistry`. Then it will work out of the box for others.

sergeykad commented 3 weeks ago

I fixed it by adding the registry and repository to my configuration as follows. Could you merge this into the official Helm charts?

spark:
  enabled: true
  image:
    registry: ghcr.io
    repository: jaegertracing/spark-dependencies/spark-dependencies
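
With those values, the chart should resolve the Spark image to the GHCR path sketched below (assuming the chart joins `registry/repository:tag` in the usual way; the `latest` tag is an assumption, your chart version may default to a different tag):

```shell
# Assemble the image reference the spark CronJob would pull.
REGISTRY=ghcr.io
REPOSITORY=jaegertracing/spark-dependencies/spark-dependencies
TAG=latest  # assumption: the chart's default tag may differ
echo "${REGISTRY}/${REPOSITORY}:${TAG}"
# → ghcr.io/jaegertracing/spark-dependencies/spark-dependencies:latest
```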