openzipkin / zipkin-support

repository for support questions raised as issues

Elasticsearch storage and distributed Zipkin components. Collectors and UI #29

Closed: carlosjgp closed this issue 4 years ago

carlosjgp commented 5 years ago

Describe the Bug

Hi! I'm trying to find some documentation about this error in the Zipkin UI:

ERROR: cannot load service names: Request processing failed; nested exception is java.lang.IllegalStateException: response for aggregation failed: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [localEndpoint.serviceName] in order to load fielddata in memory by uninverting the inverted index......

I use Kubernetes to deploy Zipkin:
- 2 collectors: openzipkin/zipkin:2.12.0 with QUERY_ENABLED: false
- 1 UI: openzipkin/zipkin:2.12.0 with QUERY_ENABLED: true
- 1 dependencies CronJob: openzipkin/zipkin-dependencies:2.0.4
- 1 Elasticsearch: zipkin-elasticsearch6:2.12.0

It works for a while and then starts showing this error.
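Elasticsearch raises "Fielddata is disabled on text fields" when it is asked to aggregate on a field mapped as `text` instead of `keyword`, and the bracketed name in the message identifies that field. The sketch below pulls the field name out of such an error string. The function name `field_missing_keyword_mapping` is hypothetical, and the regex assumes the message wording shown above.

```python
import re
from typing import Optional

# Matches the field name in Elasticsearch's fielddata error, e.g.
# "... Set fielddata=true on [localEndpoint.serviceName] in order to load ..."
_FIELDDATA_RE = re.compile(r"Set fielddata=true on \[([^\]]+)\]")


def field_missing_keyword_mapping(error_message: str) -> Optional[str]:
    """Return the field Elasticsearch refused to aggregate on, or None if absent."""
    match = _FIELDDATA_RE.search(error_message)
    return match.group(1) if match else None


if __name__ == "__main__":
    msg = ('{"error":{"root_cause":[{"type":"illegal_argument_exception",'
           '"reason":"Fielddata is disabled on text fields by default. '
           'Set fielddata=true on [localEndpoint.serviceName] in order to load '
           'fielddata in memory by uninverting the inverted index')
    print(field_missing_keyword_mapping(msg))  # prints: localEndpoint.serviceName
```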

Steps to Reproduce

Expected Behaviour

The UI can query the Elasticsearch index.

Conversation on Gitter:

Brian Devins-Suresh @devinsba Feb 11 17:00. @carlosjgp please try zipkin:2.12.1; there was at least one bug fix to ES in that release.

Carlos Juan Gómez Peñalver @carlosjgp Feb 11 17:02. @devinsba Thanks, I'll come back with the results tomorrow after running the dependencies job too.

Carlos Juan Gómez Peñalver @carlosjgp Feb 11 17:22. @devinsba It's not working. I updated the collectors, ui and ES node to use 2.12.1 and deleted the ES storage to ensure that old data was not interfering. Still the same error. Does this ES template need to be updated?

Brian Devins-Suresh @devinsba Feb 11 17:42. You said it works for a while and then stops, that to me means your data might be causing the issue. Could you give me some sample service names? I’m wondering if you have a character in your names that ES is interpreting oddly

Carlos Juan Gómez Peñalver @carlosjgp Feb 11 18:22. @devinsba I've "fixed" it by setting fielddata=true on localEndpoint.serviceName, name and traceId (Gist). @devinsba After trying a couple of times, I'm not sure whether it depends on how much data has been ingested or indexed... After applying that query, everything works. Should this fielddata property be managed by Zipkin or by Elasticsearch? Is it because I'm running a single-node ES cluster?
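The Gist itself is not reproduced here, so the following is only a hypothetical reconstruction of the kind of put-mapping body that enables fielddata on existing `text` fields. Dotted paths such as `localEndpoint.serviceName` become nested `properties` objects, which is how Elasticsearch addresses fields inside object fields. The helper name `fielddata_mapping_body` is an assumption. Note that fielddata is loaded into heap memory, which is why a `keyword` mapping (as in the Zipkin index template) is normally preferred over this workaround.

```python
import json
from typing import Any, Dict, List


def fielddata_mapping_body(dotted_fields: List[str]) -> Dict[str, Any]:
    """Build a put-mapping body that sets fielddata=true on existing text fields."""
    body: Dict[str, Any] = {"properties": {}}
    for dotted in dotted_fields:
        node = body["properties"]
        *parents, leaf = dotted.split(".")
        # Descend (creating as needed) through object fields' "properties".
        for parent in parents:
            node = node.setdefault(parent, {"properties": {}})["properties"]
        # fielddata applies only to text fields, so the type is restated.
        node[leaf] = {"type": "text", "fielddata": True}
    return body


if __name__ == "__main__":
    body = fielddata_mapping_body(["localEndpoint.serviceName", "name", "traceId"])
    print(json.dumps(body, indent=2))
```

On Elasticsearch 6 such a body would be sent to the index's `_mapping` endpoint for the `span` mapping type (the type name used in the mapping dump later in this thread).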

Adrian Cole @adriancole 00:11. @carlosjgp so this is strange... the index template certainly shouldn't behave sporadically. I wonder if there was originally a bad run of the index template, since the needed fields are added by default. Probably what happened is that QUERY_ENABLED=false created the search-disabled index template. We should probably mention that if you are only disabling search on collectors, you should also disable automatic schema creation; otherwise you will prevent the ability to search. I can see how this could be confusing... there's some tension between needing the least config and conflating settings. For example, most people who disable search disable it for the whole site. Maybe you can raise an issue; while we haven't heard this reported before, it is important. Meanwhile, if you are disabling search on the collector, please start your query server first. You will need to delete the existing index template first.

codefromthecrypt commented 5 years ago

I think the problem we have is that QUERY_ENABLED shouldn't imply a change to SEARCH_ENABLED. SEARCH_ENABLED should be a site-wide setting, whereas QUERY_ENABLED should be something each node can change. https://github.com/openzipkin/zipkin/tree/master/zipkin-server#environment-variables

codefromthecrypt commented 5 years ago

I've looked, and QUERY_ENABLED has no bearing on SEARCH_ENABLED (which controls the index template). The two are controlled independently.
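Following the rule above (SEARCH_ENABLED is site-wide, QUERY_ENABLED may differ per node), a deployment can be sanity-checked by confirming that every node agrees on SEARCH_ENABLED while ignoring QUERY_ENABLED. This is a minimal sketch: the topology dictionary and node names are hypothetical, and it assumes an unset SEARCH_ENABLED means `true` (check the zipkin-server environment variable docs linked above).

```python
from typing import Dict

# Hypothetical representation: node name -> that node's environment variables.
Topology = Dict[str, Dict[str, str]]

# Assumption: zipkin-server treats an unset SEARCH_ENABLED as true.
DEFAULT_SEARCH_ENABLED = "true"


def search_enabled_by_node(nodes: Topology) -> Dict[str, str]:
    """Effective SEARCH_ENABLED value per node; QUERY_ENABLED is deliberately ignored."""
    return {
        name: env.get("SEARCH_ENABLED", DEFAULT_SEARCH_ENABLED).strip().lower()
        for name, env in nodes.items()
    }


def is_search_setting_consistent(nodes: Topology) -> bool:
    """True when every node agrees on SEARCH_ENABLED, whatever QUERY_ENABLED says."""
    return len(set(search_enabled_by_node(nodes).values())) <= 1


if __name__ == "__main__":
    # The reporter's layout: QUERY_ENABLED differs per node, SEARCH_ENABLED is unset.
    topology = {
        "collector-1": {"QUERY_ENABLED": "false"},
        "collector-2": {"QUERY_ENABLED": "false"},
        "ui": {"QUERY_ENABLED": "true"},
    }
    print(is_search_setting_consistent(topology))  # True
```

Under this reading, the reporter's setup is consistent, which matches the conclusion that QUERY_ENABLED alone could not have produced the wrong template.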

Perhaps a race condition is occurring in your environment. I will try what you've done locally anyway.

codefromthecrypt commented 5 years ago

Sorry I asked you to open this issue. When it was being discussed I was on my phone, and it was hard to remember what controlled what. QUERY_ENABLED has no bearing on SEARCH_ENABLED, and only SEARCH_ENABLED could cause the template to be created incorrectly. What may have happened is that you lost a race somewhere, but it is impossible to guess.

When running with QUERY_ENABLED=true, I verified that the index template is still created. Note: our storage images are for testing only; they are not intended for production.

curl -s localhost:9200/zipkin:span-2019-02-18/_mapping|jq .
{
  "zipkin:span-2019-02-18": {
    "mappings": {
      "_default_": {
        "dynamic_templates": [
          {
            "strings": {
              "match": "*",
              "match_mapping_type": "string",
              "mapping": {
                "ignore_above": 256,
                "norms": false,
                "type": "keyword"
              }
            }
          }
        ]
      },
      "span": {
        "_source": {
          "excludes": [
            "_q"
          ]
        },
        "dynamic_templates": [
          {
            "strings": {
              "match": "*",
              "match_mapping_type": "string",
              "mapping": {
                "ignore_above": 256,
                "norms": false,
                "type": "keyword"
              }
            }
          }
        ],
        "properties": {
          "_q": {
            "type": "keyword"
          },
          "annotations": {
            "type": "object",
            "enabled": false
          },
          "duration": {
            "type": "long"
          },
          "id": {
            "type": "keyword",
            "ignore_above": 256
          },
          "kind": {
            "type": "keyword",
            "ignore_above": 256
          },
          "localEndpoint": {
            "dynamic": "false",
            "properties": {
              "serviceName": {
                "type": "keyword"
              }
            }
          },
          "name": {
            "type": "keyword"
          },
          "parentId": {
            "type": "keyword",
            "ignore_above": 256
          },
          "remoteEndpoint": {
            "dynamic": "false",
            "properties": {
              "serviceName": {
                "type": "keyword"
              }
            }
          },
          "shared": {
            "type": "boolean"
          },
          "tags": {
            "type": "object",
            "enabled": false
          },
          "timestamp": {
            "type": "long"
          },
          "timestamp_millis": {
            "type": "date",
            "format": "epoch_millis"
          },
          "traceId": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
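A mapping like the one above can be checked programmatically: in a healthy index, the fields the reporter had to patch (`localEndpoint.serviceName`, `name`, `traceId`) are `keyword`. When an index is created without the template, Elasticsearch 6's default dynamic mapping turns strings into `text` with a `keyword` subfield, which is what triggers the fielddata error. The sketch assumes the Elasticsearch 6 response shape shown above (index, then `mappings`, then the `span` type); `non_keyword_fields` is a hypothetical helper.

```python
from typing import Any, Dict, List

# The fields the reporter patched with fielddata=true in this thread.
FIELDS_TO_CHECK = ["localEndpoint.serviceName", "name", "traceId"]


def non_keyword_fields(mapping_response: Dict[str, Any], doc_type: str = "span") -> List[str]:
    """List "index:field" entries whose mapping type is not keyword.

    mapping_response is the parsed body of GET <index>/_mapping on Elasticsearch 6,
    shaped like {index: {"mappings": {doc_type: {"properties": {...}}}}}.
    """
    problems: List[str] = []
    for index, index_body in mapping_response.items():
        props = index_body.get("mappings", {}).get(doc_type, {}).get("properties", {})
        for dotted in FIELDS_TO_CHECK:
            node: Dict[str, Any] = {"properties": props}
            # Object fields (e.g. localEndpoint) nest their children under "properties".
            for part in dotted.split("."):
                node = node.get("properties", {}).get(part, {})
            if node.get("type") != "keyword":
                problems.append(f"{index}:{dotted}")
    return problems


if __name__ == "__main__":
    healthy = {
        "zipkin:span-2019-02-18": {"mappings": {"span": {"properties": {
            "localEndpoint": {"dynamic": "false",
                              "properties": {"serviceName": {"type": "keyword"}}},
            "name": {"type": "keyword"},
            "traceId": {"type": "keyword"},
        }}}}
    }
    print(non_keyword_fields(healthy))  # []
```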

If you want to troubleshoot further and you get the error on a real (non-test) Elasticsearch image, you can run a similar request against it. For now, I'm closing this issue, as we don't use issues for troubleshooting unless a code change is needed.
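To repeat the `_mapping` check on a real cluster, the daily span index name has to be worked out first. This sketch rebuilds the URL used by the curl command above for any day, assuming the default `zipkin` prefix and the `zipkin:span-YYYY-MM-DD` pattern seen in that index name; the helper names are hypothetical, and `fetch_mapping` needs a reachable Elasticsearch.

```python
import json
import urllib.request
from datetime import date
from typing import Any, Dict


def span_index_name(day: date, prefix: str = "zipkin") -> str:
    """Daily span index name in the form seen in this thread, e.g. zipkin:span-2019-02-18."""
    return f"{prefix}:span-{day.isoformat()}"


def mapping_url(base_url: str, day: date, prefix: str = "zipkin") -> str:
    """URL equivalent to the curl command above: <base>/<index>/_mapping."""
    return f"{base_url.rstrip('/')}/{span_index_name(day, prefix)}/_mapping"


def fetch_mapping(base_url: str, day: date, prefix: str = "zipkin") -> Dict[str, Any]:
    """GET the mapping for one day's span index and parse the JSON response."""
    with urllib.request.urlopen(mapping_url(base_url, day, prefix)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(mapping_url("http://localhost:9200", date(2019, 2, 18)))
    # http://localhost:9200/zipkin:span-2019-02-18/_mapping
```

The parsed result of `fetch_mapping` can then be compared with the healthy mapping shown above, in particular the `keyword` types on `localEndpoint.serviceName`, `name` and `traceId`.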