Closed: weeco closed this issue 4 years ago.
Please, check the troubleshooting guide: https://www.jaegertracing.io/docs/1.14/troubleshooting/
There is nothing in that guide about debugging the Query UI failing to find spans and traces in Elasticsearch. Most of the guide focuses on the write path, which works for me.
Sorry, looks like I read your question a bit too fast. To be honest, I have not seen this before. The only thing I could offer right now is to double-check your connection settings, comparing the one from Query with the one from the Collector.
You could also post the logs from your collector and query, perhaps we can spot something that you haven't...
I believe the Elasticsearch configuration is fine, and the query service can connect to Elasticsearch. With a wrong config, the query service logs this on startup:
{"level":"info","ts":1572364318.4869714,"caller":"flags/service.go:115","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1572364318.4872856,"caller":"flags/admin.go:108","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1572364318.487388,"caller":"flags/admin.go:114","msg":"Starting admin HTTP server","http-port":16687}
{"level":"info","ts":1572364318.487408,"caller":"flags/admin.go:100","msg":"Admin server started","http-port":16687,"health-status":"unavailable"}
{"level":"fatal","ts":1572364324.5567038,"caller":"query/main.go:88","msg":"Failed to init storage factory","error":"failed to create primary Elasticsearch client: Head https://logging-es-http.elastic-system.svc:9201: context deadline exceeded","errorVerbose":"Head https://logging-es-http.elastic-system.svc:9201: context deadline exceeded\nfailed to create primary Elasticsearch client\ngithub.com/jaegertracing/jaeger/plugin/storage/es.(*Factory).Initialize\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/factory.go:83\ngithub.com/jaegertracing/jaeger/plugin/storage.(*Factory).Initialize\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/factory.go:108\nmain.main.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/query/main.go:87\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:762\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/query/main.go:130\nruntime.main\n\t/home/travis/.gimme/versions/go1.12.1.linux.amd64/src/runtime/proc.go:200\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.12.1.linux.amd64/src/runtime/asm_amd64.s:1337","stacktrace":"main.main.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/query/main.go:88\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:762\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf1
3/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/query/main.go:130\nruntime.main\n\t/home/travis/.gimme/versions/go1.12.1.linux.amd64/src/runtime/proc.go:200"}
With the correct Elasticsearch settings the log looks fine:
kubectl logs -f jaeger-query-74db5fd5c5-l2cr6
2019/10/29 16:03:37 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
{"level":"info","ts":1572365017.8689969,"caller":"flags/service.go:115","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1572365017.8692174,"caller":"flags/admin.go:108","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1572365017.8693018,"caller":"flags/admin.go:114","msg":"Starting admin HTTP server","http-port":16687}
{"level":"info","ts":1572365017.869334,"caller":"flags/admin.go:100","msg":"Admin server started","http-port":16687,"health-status":"unavailable"}
{"level":"info","ts":1572365017.9112318,"caller":"config/config.go:172","msg":"Elasticsearch detected","version":7}
{"level":"info","ts":1572365017.9125092,"caller":"healthcheck/handler.go:130","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1572365017.91256,"caller":"app/server.go:135","msg":"Starting CMUX server","port":16686}
{"level":"info","ts":1572365017.9125957,"caller":"app/server.go:112","msg":"Starting HTTP server","port":16686}
{"level":"info","ts":1572365017.9126284,"caller":"app/server.go:125","msg":"Starting GRPC server","port":16686}
I tend to believe that the query UI is for some reason not able to see/query the data in Elasticsearch (maybe because I am using ES v7?).
Collector logs:
kubectl logs -f jaeger-collector-556598c676-8qs6c
2019/10/29 12:42:47 maxprocs: Updating GOMAXPROCS=2: determined from CPU quota
{"level":"info","ts":1572352967.9401166,"caller":"flags/service.go:115","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1572352967.94039,"caller":"flags/admin.go:108","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1572352967.9405358,"caller":"flags/admin.go:114","msg":"Starting admin HTTP server","http-port":14269}
{"level":"info","ts":1572352967.9405482,"caller":"flags/admin.go:100","msg":"Admin server started","http-port":14269,"health-status":"unavailable"}
{"level":"info","ts":1572352968.021824,"caller":"config/config.go:172","msg":"Elasticsearch detected","version":7}
{"level":"info","ts":1572352968.6172075,"caller":"static/strategy_store.go:79","msg":"No sampling strategies provided, using defaults"}
{"level":"info","ts":1572352968.6176662,"caller":"collector/main.go:128","msg":"Starting jaeger-collector TChannel server","port":14267}
{"level":"info","ts":1572352968.6177654,"caller":"grpcserver/grpc_server.go:102","msg":"Starting jaeger-collector gRPC server","grpc-port":"14250"}
{"level":"info","ts":1572352968.617974,"caller":"collector/main.go:147","msg":"Starting jaeger-collector HTTP server","http-port":14268}
{"level":"info","ts":1572352968.618001,"caller":"healthcheck/handler.go:130","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1572352968.6600127,"caller":"collector/main.go:242","msg":"Listening for Zipkin HTTP traffic","zipkin.http-port":9411}
Kubernetes deployment of query component:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: jaeger
    component: query
  name: jaeger-query
  namespace: jaeger
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: jaeger
      component: query
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "16687"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app: jaeger
        component: query
        namespace: default
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - jaeger
                - key: component
                  operator: In
                  values:
                  - query
              topologyKey: failure-domain.beta.kubernetes.io/zone
            weight: 100
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - jaeger
                - key: component
                  operator: In
                  values:
                  - query
              topologyKey: kubernetes.io/hostname
            weight: 20
      automountServiceAccountToken: false
      containers:
      - env:
        - name: QUERY_BASE_PATH
          value: /
        - name: SPAN_STORAGE_TYPE
          value: elasticsearch
        - name: ES_SERVER_URLS
          value: https://logging-es-http.elastic-system.svc:9200
        - name: LOG_LEVEL
          value: debug
        - name: ES_TLS_CA
          value: /etc/jaeger/elasticsearch-certs/ca.crt
        image: jaegertracing/jaeger-query:1.14.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: health
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: query
        ports:
        - containerPort: 16686
          name: ui
          protocol: TCP
        - containerPort: 16687
          name: health
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: health
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 500Mi
        volumeMounts:
        - mountPath: /etc/jaeger/elasticsearch-certs
          name: elasticsearch-certs
      dnsPolicy: ClusterFirst
      nodeSelector:
        cloud.google.com/gke-nodepool: default-v2
      securityContext: {}
      volumes:
      - name: elasticsearch-certs
        secret:
          defaultMode: 420
          optional: false
          secretName: logging-es-http-certs-public
@kevinearls, @pavolloffay, does it ring a bell?
@jpkrohling Not that I've seen, sorry. I haven't ever used ES v7 though, so maybe that has something to do with it.
If the Jaeger services started up cleanly, that proves the connection to Elasticsearch is fine.
The issue seems to be in reporting spans.
@weeco how do you report spans to Jaeger?
@pavolloffay Jaeger traces are reported by Cortex v0.3.0 (https://github.com/cortexproject/cortex) and I run everything in Kubernetes. I have a daemonset for the jaeger agents, and a deployment for the jaeger collector and query.
Just in case you've missed it: I ensured that spans and traces actually land in Elasticsearch (see Screenshot).
I could share the deployment manifests of all involved components (jaeger agent daemonset, collector and query deployments) if you think this could help.
Side note: I had Jaeger working (using pretty much the same manifests) with Elasticsearch 6.8 and Jaeger v1.12.
It indeed seems like a problem on the query side.
I have tried ES 7.4 with all-in-one 1.14 and I was able to see traces from jaeger-query (it traces itself).
docker run --rm -it -e SPAN_STORAGE_TYPE=elasticsearch -e ES_SERVER_URLS=http://elasticsearch:9200 --link elasticsearch -p 16686:16686 jaegertracing/all-in-one:1.14.0
docker run -it --rm -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --name=elasticsearch docker.elastic.co/elasticsearch/elasticsearch-oss:7.4.1
You can also try to fetch a trace from the query REST API, e.g. http://<HOST>:16686/api/traces/566ee6fdd7645562
I picked a trace id from Kibana and tried a REST call against the query service; it returned a 404:
URL: http://localhost:16686/api/traces/369af4970a2ad8fd
{"data":null,"total":0,"limit":0,"offset":0,"errors":[{"code":404,"msg":"trace not found"}]}
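If you want to script this check, the response body can be interpreted mechanically. Here is a minimal sketch; the `trace_found` helper is hypothetical (not part of Jaeger), and it only distinguishes the "trace not found" error shape shown above from a successful lookup:

```python
import json

def trace_found(body: str) -> bool:
    """Interpret a jaeger-query /api/traces/<id> response body.

    Returns True when trace data came back, False on the 404
    'trace not found' error shape. Hypothetical helper for scripting.
    """
    payload = json.loads(body)
    if payload.get("data"):          # non-empty data array => trace exists
        return True
    errors = payload.get("errors") or []
    if any(e.get("code") == 404 for e in errors):
        return False
    raise ValueError(f"unexpected response: {body!r}")

# The 404 body returned for the trace id picked from Kibana:
resp = '{"data":null,"total":0,"limit":0,"offset":0,"errors":[{"code":404,"msg":"trace not found"}]}'
print(trace_found(resp))  # False
```

A False result here while the same trace id is visible in Kibana is a strong hint that query is searching different indices than the collector wrote to.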
I am not sure why it cannot find the data in Elasticsearch. Is it maybe something with the indices in Elasticsearch?
Could you please paste here indices?
jaeger-span-write settings:
{
  "settings": {
    "index": {
      "mapping": {
        "nested_fields": {
          "limit": "50"
        }
      },
      "number_of_shards": "5",
      "provided_name": "jaeger-span-write",
      "creation_date": "1572280321087",
      "requests": {
        "cache": {
          "enable": "true"
        }
      },
      "number_of_replicas": "1",
      "uuid": "Mrs1XgSqQHa4rMiIIY0-AA",
      "version": {
        "created": "7040099"
      }
    }
  },
  "defaults": {
    "index": {
      "flush_after_merge": "512mb",
      "max_inner_result_window": "100",
      "unassigned": {
        "node_left": {
          "delayed_timeout": "1m"
        }
      },
      "max_terms_count": "65536",
      "lifecycle": {
        "name": "",
        "rollover_alias": "",
        "indexing_complete": "false"
      },
      "routing_partition_size": "1",
      "force_memory_term_dictionary": "false",
      "max_docvalue_fields_search": "100",
      "merge": {
        "scheduler": {
          "max_thread_count": "1",
          "auto_throttle": "true",
          "max_merge_count": "6"
        },
        "policy": {
          "reclaim_deletes_weight": "2.0",
          "floor_segment": "2mb",
          "max_merge_at_once_explicit": "30",
          "max_merge_at_once": "10",
          "max_merged_segment": "5gb",
          "expunge_deletes_allowed": "10.0",
          "segments_per_tier": "10.0",
          "deletes_pct_allowed": "33.0"
        }
      },
      "max_refresh_listeners": "1000",
      "max_regex_length": "1000",
      "load_fixed_bitset_filters_eagerly": "true",
      "number_of_routing_shards": "1",
      "write": {
        "wait_for_active_shards": "1"
      },
      "verified_before_close": "false",
      "mapping": {
        "coerce": "false",
        "nested_objects": {
          "limit": "10000"
        },
        "depth": {
          "limit": "20"
        },
        "ignore_malformed": "false",
        "field_name_length": {
          "limit": "9223372036854775807"
        },
        "total_fields": {
          "limit": "1000"
        }
      },
      "source_only": "false",
      "soft_deletes": {
        "enabled": "false",
        "retention": {
          "operations": "0"
        },
        "retention_lease": {
          "period": "12h"
        }
      },
      "max_script_fields": "32",
      "query": {
        "default_field": [
          "*"
        ],
        "parse": {
          "allow_unmapped_fields": "true"
        }
      },
      "format": "0",
      "frozen": "false",
      "sort": {
        "missing": [],
        "mode": [],
        "field": [],
        "order": []
      },
      "priority": "1",
      "codec": "default",
      "max_rescore_window": "10000",
      "max_adjacency_matrix_filters": "100",
      "analyze": {
        "max_token_count": "10000"
      },
      "gc_deletes": "60s",
      "optimize_auto_generated_id": "true",
      "max_ngram_diff": "1",
      "translog": {
        "generation_threshold_size": "64mb",
        "flush_threshold_size": "512mb",
        "sync_interval": "5s",
        "retention": {
          "size": "512MB",
          "age": "12h"
        },
        "durability": "REQUEST"
      },
      "auto_expand_replicas": "false",
      "mapper": {
        "dynamic": "true"
      },
      "data_path": "",
      "highlight": {
        "max_analyzed_offset": "1000000"
      },
      "routing": {
        "rebalance": {
          "enable": "all"
        },
        "allocation": {
          "enable": "all",
          "total_shards_per_node": "-1"
        }
      },
      "search": {
        "slowlog": {
          "level": "TRACE",
          "threshold": {
            "fetch": {
              "warn": "-1",
              "trace": "-1",
              "debug": "-1",
              "info": "-1"
            },
            "query": {
              "warn": "-1",
              "trace": "-1",
              "debug": "-1",
              "info": "-1"
            }
          }
        },
        "idle": {
          "after": "30s"
        },
        "throttled": "false"
      },
      "fielddata": {
        "cache": "node"
      },
      "default_pipeline": "_none",
      "max_slices_per_scroll": "1024",
      "shard": {
        "check_on_startup": "false"
      },
      "xpack": {
        "watcher": {
          "template": {
            "version": ""
          }
        },
        "version": "",
        "ccr": {
          "following_index": "false"
        }
      },
      "percolator": {
        "map_unmapped_fields_as_text": "false"
      },
      "allocation": {
        "max_retries": "5"
      },
      "refresh_interval": "1s",
      "indexing": {
        "slowlog": {
          "reformat": "true",
          "threshold": {
            "index": {
              "warn": "-1",
              "trace": "-1",
              "debug": "-1",
              "info": "-1"
            }
          },
          "source": "1000",
          "level": "TRACE"
        }
      },
      "compound_format": "0.1",
      "blocks": {
        "metadata": "false",
        "read": "false",
        "read_only_allow_delete": "false",
        "read_only": "false",
        "write": "false"
      },
      "max_result_window": "10000",
      "store": {
        "stats_refresh_interval": "10s",
        "type": "",
        "fs": {
          "fs_lock": "native"
        },
        "preload": []
      },
      "queries": {
        "cache": {
          "enabled": "true"
        }
      },
      "warmer": {
        "enabled": "true"
      },
      "max_shingle_diff": "3",
      "query_string": {
        "lenient": "false"
      }
    }
  }
}
jaeger-service-write settings:
{
  "settings": {
    "index": {
      "mapping": {
        "nested_fields": {
          "limit": "50"
        }
      },
      "number_of_shards": "5",
      "provided_name": "jaeger-service-write",
      "creation_date": "1572280321400",
      "requests": {
        "cache": {
          "enable": "true"
        }
      },
      "number_of_replicas": "1",
      "uuid": "_JVNNgJBT6i5vgnTpb0V9g",
      "version": {
        "created": "7040099"
      }
    }
  },
  "defaults": {
    "index": {
      "flush_after_merge": "512mb",
      "max_inner_result_window": "100",
      "unassigned": {
        "node_left": {
          "delayed_timeout": "1m"
        }
      },
      "max_terms_count": "65536",
      "lifecycle": {
        "name": "",
        "rollover_alias": "",
        "indexing_complete": "false"
      },
      "routing_partition_size": "1",
      "force_memory_term_dictionary": "false",
      "max_docvalue_fields_search": "100",
      "merge": {
        "scheduler": {
          "max_thread_count": "1",
          "auto_throttle": "true",
          "max_merge_count": "6"
        },
        "policy": {
          "reclaim_deletes_weight": "2.0",
          "floor_segment": "2mb",
          "max_merge_at_once_explicit": "30",
          "max_merge_at_once": "10",
          "max_merged_segment": "5gb",
          "expunge_deletes_allowed": "10.0",
          "segments_per_tier": "10.0",
          "deletes_pct_allowed": "33.0"
        }
      },
      "max_refresh_listeners": "1000",
      "max_regex_length": "1000",
      "load_fixed_bitset_filters_eagerly": "true",
      "number_of_routing_shards": "1",
      "write": {
        "wait_for_active_shards": "1"
      },
      "verified_before_close": "false",
      "mapping": {
        "coerce": "false",
        "nested_objects": {
          "limit": "10000"
        },
        "depth": {
          "limit": "20"
        },
        "ignore_malformed": "false",
        "field_name_length": {
          "limit": "9223372036854775807"
        },
        "total_fields": {
          "limit": "1000"
        }
      },
      "source_only": "false",
      "soft_deletes": {
        "enabled": "false",
        "retention": {
          "operations": "0"
        },
        "retention_lease": {
          "period": "12h"
        }
      },
      "max_script_fields": "32",
      "query": {
        "default_field": [
          "*"
        ],
        "parse": {
          "allow_unmapped_fields": "true"
        }
      },
      "format": "0",
      "frozen": "false",
      "sort": {
        "missing": [],
        "mode": [],
        "field": [],
        "order": []
      },
      "priority": "1",
      "codec": "default",
      "max_rescore_window": "10000",
      "max_adjacency_matrix_filters": "100",
      "analyze": {
        "max_token_count": "10000"
      },
      "gc_deletes": "60s",
      "optimize_auto_generated_id": "true",
      "max_ngram_diff": "1",
      "translog": {
        "generation_threshold_size": "64mb",
        "flush_threshold_size": "512mb",
        "sync_interval": "5s",
        "retention": {
          "size": "512MB",
          "age": "12h"
        },
        "durability": "REQUEST"
      },
      "auto_expand_replicas": "false",
      "mapper": {
        "dynamic": "true"
      },
      "data_path": "",
      "highlight": {
        "max_analyzed_offset": "1000000"
      },
      "routing": {
        "rebalance": {
          "enable": "all"
        },
        "allocation": {
          "enable": "all",
          "total_shards_per_node": "-1"
        }
      },
      "search": {
        "slowlog": {
          "level": "TRACE",
          "threshold": {
            "fetch": {
              "warn": "-1",
              "trace": "-1",
              "debug": "-1",
              "info": "-1"
            },
            "query": {
              "warn": "-1",
              "trace": "-1",
              "debug": "-1",
              "info": "-1"
            }
          }
        },
        "idle": {
          "after": "30s"
        },
        "throttled": "false"
      },
      "fielddata": {
        "cache": "node"
      },
      "default_pipeline": "_none",
      "max_slices_per_scroll": "1024",
      "shard": {
        "check_on_startup": "false"
      },
      "xpack": {
        "watcher": {
          "template": {
            "version": ""
          }
        },
        "version": "",
        "ccr": {
          "following_index": "false"
        }
      },
      "percolator": {
        "map_unmapped_fields_as_text": "false"
      },
      "allocation": {
        "max_retries": "5"
      },
      "refresh_interval": "1s",
      "indexing": {
        "slowlog": {
          "reformat": "true",
          "threshold": {
            "index": {
              "warn": "-1",
              "trace": "-1",
              "debug": "-1",
              "info": "-1"
            }
          },
          "source": "1000",
          "level": "TRACE"
        }
      },
      "compound_format": "0.1",
      "blocks": {
        "metadata": "false",
        "read": "false",
        "read_only_allow_delete": "false",
        "read_only": "false",
        "write": "false"
      },
      "max_result_window": "10000",
      "store": {
        "stats_refresh_interval": "10s",
        "type": "",
        "fs": {
          "fs_lock": "native"
        },
        "preload": []
      },
      "queries": {
        "cache": {
          "enabled": "true"
        }
      },
      "warmer": {
        "enabled": "true"
      },
      "max_shingle_diff": "3",
      "query_string": {
        "lenient": "false"
      }
    }
  }
}
Your collector is probably configured to use rollover (--es.use-aliases=true), whereas query is looking for daily indices. You have to use either daily indices or rollover in both components.
Note that running with rollover indices requires a cron job and an initialization step to use it properly.
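The mismatch shows up directly in the index names: with the default daily layout, both collector and query derive names from the UTC date, while with --es.use-aliases=true they address *-write/*-read aliases. That would also explain why the settings above show provided_name: jaeger-span-write rather than a dated index (Elasticsearch auto-created a concrete index named after the write alias because the rollover init step never ran). A rough sketch of the two naming schemes (the exact formats here follow Jaeger's ES storage conventions but should be verified against your indices):

```python
from datetime import date

# Daily layout (Jaeger default): the index name carries the UTC date.
def daily_index(prefix: str, day: date) -> str:
    return f"{prefix}-{day:%Y-%m-%d}"

# Rollover layout (--es.use-aliases=true): components go through aliases
# that the rollover init job must create up front.
def write_alias(prefix: str) -> str:
    return f"{prefix}-write"

def read_alias(prefix: str) -> str:
    return f"{prefix}-read"

print(daily_index("jaeger-span", date(2019, 10, 29)))  # jaeger-span-2019-10-29
print(write_alias("jaeger-span"))                      # jaeger-span-write
```

Comparing this against the output of Elasticsearch's `_cat/indices/jaeger-*` endpoint shows immediately which scheme your cluster actually contains.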
@pavolloffay You were absolutely right; I am sorry for wasting your time. That was not very clear to me, though; hopefully not too many users will run into the same issue.
You can refer to this blog post on configuring it properly: https://medium.com/jaegertracing/using-elasticsearch-rollover-to-manage-indices-8b3d0c77915d
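For the record, rollover mode needs a one-off init step (which creates the initial index and the read/write aliases) plus a periodic rollover job, and both collector and query must then run with --es.use-aliases=true. Based on the blog post above, a Kubernetes sketch could look roughly like this; treat the jaeger-es-rollover arguments, the CONDITIONS value, and the schedule as assumptions to verify against the post and your cluster (TLS/cert options are omitted):

```
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: jaeger-es-rollover
  namespace: jaeger
spec:
  schedule: "*/30 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: rollover
            image: jaegertracing/jaeger-es-rollover:latest
            # "init <url>" must be run once beforehand (e.g. as a plain Job)
            # to create the initial indices and the read/write aliases.
            args: ["rollover", "https://logging-es-http.elastic-system.svc:9200"]
            env:
            - name: CONDITIONS
              value: '{"max_age": "24h"}'
```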
Any open questions to address?
I have deployed Jaeger 1.14 (agents, collector and Query UI as separate services) and use Elasticsearch 7.4 as the backend. I ensured that Jaeger traces and spans land in Elasticsearch, and I can also query them in Kibana.
Unfortunately, I cannot see any services/spans in the Jaeger Query UI, nor do I see any error/warn/debug log messages in the query service that would help me figure out why. Can you point me in the right direction to figure out why I don't see any spans/traces in the Query UI even though I have plenty of data in Elasticsearch?