thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0
13.03k stars 2.09k forks source link

traceing use elastic apm #4386

Open MrYueQ opened 3 years ago

MrYueQ commented 3 years ago

Thanos, Prometheus and Golang version used:

thanos, version 0.21.0 (branch: HEAD, revision: d19aea21b178a6c08838c6a05b834dd44ef14273)
  build user:       root@30b80273f41d
  build date:       20210603-13:49:01
  go version:       go1.16.4
  platform:         linux/amd64

prometheus, version 2.25.0 (branch: HEAD, revision: a6be548dbc17780d562a39c0e4bd0bd4c00ad6e2)
  build user:       root@615f028225c9
  build date:       20210217-14:17:24
  go version:       go1.15.8
  platform:         linux/amd64

apm-server version
apm-server version 7.8.1 (amd64), libbeat 7.8.1 [f1a08a22e814d19ee01d5d714430a851552d4937 built 2020-07-21 14:00:32 +0000 UTC]

Object Storage Provider:

COS

What happened:

thanos turns on the traceid function. Report indicators to elastic

elastic-apm-config

 trace.elastic.apm.yml
type: ELASTIC_APM
config:
  service_name: "thanos"
  service_version: "v1.0.0"
  service_environment: "maxwell"
  sample_rate: 0

systemctl

Environment="ELASTIC_APM_SERVER_URL=http://apm-server:8200"
Environment="ELASTIC_APM_LOG_LEVEL=debug"
Environment="ELASTIC_APM_LOG_FILE=stdout"

thanos query --store.response-timeout=2m \
                  --query.auto-downsampling  \
                  --query.partial-response  \
                  --query.max-concurrent=200 \
                  --query.max-concurrent-select=40 \
                  --query.metadata.default-time-range=30s \
                  --log.format=json \
                  --log.level=warn \
                  --store.unhealthy-timeout=3m  \
                  --store.sd-interval=3m  \
                  --store.sd-files=/data/app/thanos/conf.d/store.yml \
                  --request.logging-config-file=/data/app/thanos/conf.d/logger.yml \
                  --tracing.config-file=/data/app/thanos/conf.d/trace.elastic.apm.yml

image image

Full logs to relevant components:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x142274f]
goroutine 713 [running]:
go.elastic.co/apm/module/apmot.(*otSpan).setSpanContext(0xc00125a630)
/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:182 +0x2ef
go.elastic.co/apm/module/apmot.(*otSpan).FinishWithOptions(0xc00125a630, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:99 +0x22e
go.elastic.co/apm/module/apmot.(*otSpan).Finish(0xc00125a630)
/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:75 +0x4b
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tracing.(*opentracingClientReporter).PostCall(0xc000372e00, 0x1fc3220, 0xc000076120, 0xe9591)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/tracing/client.go:34 +0x90
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors.(*monitoredClientStream).RecvMsg(0xc000373140, 0x1c46d40, 0xc000cd41e0, 0x10, 0x1a936c0)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/client.go:78 +0x155
github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredClientStream).RecvMsg(0xc00000db90, 0x1c46d40, 0xc000cd41e0, 0x2, 0x2)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/client_metrics.go:160 +0x4b
github.com/thanos-io/thanos/pkg/store/storepb.(*storeSeriesClient).Recv(0xc000a42810, 0xc0014caf70, 0x0, 0x1)
/app/pkg/store/storepb/rpc.pb.go:756 +0x62
github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1.2(0xc00125a7e0, 0xc0009d60c0, 0xc0009d6060)
/app/pkg/store/proxy.go:394 +0x4b
created by github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1
/app/pkg/store/proxy.go:392 +0x1af
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x142274f]
goroutine 837 [running]:
go.elastic.co/apm/module/apmot.(*otSpan).setSpanContext(0xc000e90fc0)
/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:182 +0x2ef
go.elastic.co/apm/module/apmot.(*otSpan).FinishWithOptions(0xc000e90fc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:99 +0x22e
go.elastic.co/apm/module/apmot.(*otSpan).Finish(0xc000e90fc0)
/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:75 +0x4b
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tracing.(*opentracingClientReporter).PostCall(0xc0004fd8c0, 0x1fc3220, 0xc000076120, 0x10be2a)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/tracing/client.go:34 +0x90
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors.(*monitoredClientStream).RecvMsg(0xc0004fd980, 0x1c46d40, 0xc000e7c4d0, 0x10, 0x1a936c0)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/client.go:78 +0x155
github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredClientStream).RecvMsg(0xc00000d320, 0x1c46d40, 0xc000e7c4d0, 0x2, 0x2)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/client_metrics.go:160 +0x4b
github.com/thanos-io/thanos/pkg/store/storepb.(*storeSeriesClient).Recv(0xc000a42360, 0xc001185f70, 0x0, 0x1)
/app/pkg/store/storepb/rpc.pb.go:756 +0x62
github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1.2(0xc000e91170, 0xc00027c660, 0xc00027c600)
/app/pkg/store/proxy.go:394 +0x4b
created by github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1
/app/pkg/store/proxy.go:392 +0x1af
MrYueQ commented 3 years ago

image

MrYueQ commented 3 years ago

Help me.

BoringCat commented 3 years ago

Same panic in thanos@0.22.0

Thanos, Golang version used:

thanos, version 0.22.0 (branch: HEAD, revision: 3656e290c8eb034e2e09e0afc79f087be88de481)
  build user:       circleci@d99a55512da0
  build date:       20210721-15:34:46
  go version:       go1.16.6
  platform:         linux/amd64

panic log:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x143818f]

goroutine 559 [running]:
go.elastic.co/apm/module/apmot.(*otSpan).setSpanContext(0xc000677950)
    /home/circleci/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:182 +0x2ef
go.elastic.co/apm/module/apmot.(*otSpan).FinishWithOptions(0xc000677950, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/circleci/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:99 +0x22e
go.elastic.co/apm/module/apmot.(*otSpan).Finish(0xc000677950)
    /home/circleci/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:75 +0x4b
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tracing.(*opentracingClientReporter).PostCall(0xc00056ab00, 0x1fed600, 0xc00009a110, 0x2a9ac6)
    /home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/tracing/client.go:34 +0x90
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors.(*monitoredClientStream).RecvMsg(0xc00056adc0, 0x1c6b420, 0xc00040a1a0, 0x10, 0x1ab4920)
    /home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/client.go:78 +0x155
github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredClientStream).RecvMsg(0xc00098a828, 0x1c6b420, 0xc00040a1a0, 0x1b08220, 0xc00009a101)
    /home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/client_metrics.go:160 +0x4b
github.com/thanos-io/thanos/pkg/store/storepb.(*storeSeriesClient).Recv(0xc0002092a0, 0xc000a75f70, 0x0, 0x1)
    /home/circleci/project/pkg/store/storepb/rpc.pb.go:756 +0x62
github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1.2(0xc000677b00, 0xc000675380, 0xc000675320)
    /home/circleci/project/pkg/store/proxy.go:394 +0x4b
created by github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1
    /home/circleci/project/pkg/store/proxy.go:392 +0x1af

Full logs to relevant components:

Full logs with --log.level=debug
{"caller":"main.go:64","level":"debug","msg":"maxprocs: Updating GOMAXPROCS=[2]: determined from CPU quota","ts":"2021-08-17T09:31:10.272249221Z"}
{"caller":"factory.go:38","level":"info","msg":"loading tracing configuration","ts":"2021-08-17T09:31:10.272428136Z"}
{"caller":"engine.go:292","level":"debug","msg":"Lookback delta is zero, setting to default value","ts":"2021-08-17T09:31:10.27636971Z","value":"5m0s"}
{"caller":"options.go:24","level":"info","msg":"disabled TLS, key and cert must be set to enable","protocol":"gRPC","ts":"2021-08-17T09:31:10.277965106Z"}
{"caller":"query.go:640","level":"info","msg":"starting query node","ts":"2021-08-17T09:31:10.278338688Z"}
{"caller":"intrumentation.go:60","level":"info","msg":"changing probe status","status":"healthy","ts":"2021-08-17T09:31:10.278503006Z"}
{"address":"0.0.0.0:9090","caller":"http.go:63","component":"query","level":"info","msg":"listening for requests and metrics","service":"http/server","ts":"2021-08-17T09:31:10.278536659Z"}
{"caller":"tls_config.go:191","component":"query","http2":false,"level":"info","msg":"TLS is disabled.","service":"http/server","ts":"2021-08-17T09:31:10.278649952Z"}
{"caller":"intrumentation.go:48","level":"info","msg":"changing probe status","status":"ready","ts":"2021-08-17T09:31:10.278713161Z"}
{"address":"0.0.0.0:10901","caller":"grpc.go:123","component":"query","level":"info","msg":"listening for serving gRPC","service":"gRPC/server","ts":"2021-08-17T09:31:10.278743287Z"}
{"cachedStores":0,"caller":"storeset.go:409","component":"storeset","level":"debug","msg":"starting updating storeAPIs","ts":"2021-08-17T09:31:10.278776269Z"}
{"activeStores":0,"cachedStores":0,"caller":"storeset.go:412","component":"storeset","level":"debug","msg":"checked requested storeAPIs","ts":"2021-08-17T09:31:10.278829168Z"}
{"cachedStores":0,"caller":"storeset.go:409","component":"storeset","level":"debug","msg":"starting updating storeAPIs","ts":"2021-08-17T09:31:15.279829519Z"}
{"activeStores":9,"cachedStores":0,"caller":"storeset.go:412","component":"storeset","level":"debug","msg":"checked requested storeAPIs","ts":"2021-08-17T09:31:15.325354902Z"}
{"address":"172.16.0.24:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325406739Z"}
{"address":"172.16.0.250:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.32542843Z"}
{"address":"172.16.0.113:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325471391Z"}
{"address":"172.16.1.7:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325507488Z"}
{"address":"172.16.1.3:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325530953Z"}
{"address":"172.16.0.110:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325552383Z"}
{"address":"172.16.0.184:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325574765Z"}
{"address":"172.16.0.32:10901","caller":"storeset.go:463","component":"storeset","extLset":"{rule_replica=\"thanos-rule-0\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325597248Z"}
{"address":"172.16.0.116:10901","caller":"storeset.go:463","component":"storeset","extLset":"{rule_replica=\"thanos-rule-1\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325620601Z"}
{"cachedStores":9,"caller":"storeset.go:409","component":"storeset","level":"debug","msg":"starting updating storeAPIs","ts":"2021-08-17T09:31:20.282361055Z"}
{"activeStores":9,"cachedStores":9,"caller":"storeset.go:412","component":"storeset","level":"debug","msg":"checked requested storeAPIs","ts":"2021-08-17T09:31:20.284025571Z"}
{"cachedStores":9,"caller":"storeset.go:409","component":"storeset","level":"debug","msg":"starting updating storeAPIs","ts":"2021-08-17T09:31:25.282178565Z"}
{"activeStores":9,"cachedStores":9,"caller":"storeset.go:412","component":"storeset","level":"debug","msg":"checked requested storeAPIs","ts":"2021-08-17T09:31:25.283644257Z"}
{"cachedStores":9,"caller":"storeset.go:409","component":"storeset","level":"debug","msg":"starting updating storeAPIs","ts":"2021-08-17T09:31:30.282237366Z"}
{"activeStores":9,"cachedStores":9,"caller":"storeset.go:412","component":"storeset","level":"debug","msg":"checked requested storeAPIs","ts":"2021-08-17T09:31:30.284005367Z"}
{"cachedStores":9,"caller":"storeset.go:409","component":"storeset","level":"debug","msg":"starting updating storeAPIs","ts":"2021-08-17T09:31:35.279837842Z"}
{"activeStores":9,"cachedStores":9,"caller":"storeset.go:412","component":"storeset","level":"debug","msg":"checked requested storeAPIs","ts":"2021-08-17T09:31:35.281350331Z"}
{"caller":"proxy.go:659","level":"debug","msg":"Store Addr: 172.16.0.110:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.250:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.113:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"} Mint: 1624439036465 Maxt: 1629188507217 queried;Store Addr: 172.16.1.7:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"} Mint: 1624439036465 Maxt: 1629188507217 queried;Store Addr: 172.16.1.3:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.116:10901 LabelSets: {rule_replica=\"thanos-rule-1\"} Mint: 9223372036854775807 Maxt: 9223372036854775807 filtered out due to does not have data within this time period: [1629149483143,1629192683143]. Store time ranges: [9223372036854775807,9223372036854775807];Store Addr: 172.16.0.24:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.184:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.32:10901 LabelSets: {rule_replica=\"thanos-rule-0\"} Mint: 9223372036854775807 Maxt: 9223372036854775807 filtered out due to does not have data within this time period: [1629149483143,1629192683143]. Store time ranges: [9223372036854775807,9223372036854775807]","ts":"2021-08-17T09:31:38.854410793Z"}
{"caller":"proxy.go:274","component":"proxy","level":"debug","msg":"Series: started fanout streams","request":"min_time:1629192392224 max_time:1629192692224 matchers: aggregates:COUNT aggregates:SUM partial_response_disabled:true ","status":"Store Addr: 172.16.0.24:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.184:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;store Addr: 172.16.0.32:10901 LabelSets: {rule_replica=\"thanos-rule-0\"} Mint: 9223372036854775807 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1629192392224,1629192692224]. Store time ranges: [9223372036854775807,9223372036854775807];Store Addr: 172.16.1.3:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;store Addr: 172.16.0.116:10901 LabelSets: {rule_replica=\"thanos-rule-1\"} Mint: 9223372036854775807 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1629192392224,1629192692224]. Store time ranges: [9223372036854775807,9223372036854775807];Store Addr: 172.16.0.110:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.250:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;store Addr: 172.16.0.113:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"} Mint: 1624439036465 Maxt: 1629188507217 filtered out: does not have data within this time period: [1629192392224,1629192692224]. Store time ranges: [1624439036465,1629188507217];store Addr: 172.16.1.7:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"} Mint: 1624439036465 Maxt: 1629188507217 filtered out: does not have data within this time period: [1629192392224,1629192692224]. Store time ranges: [1624439036465,1629188507217]","ts":"2021-08-17T09:31:38.924669277Z"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x143818f]

goroutine 559 [running]:
go.elastic.co/apm/module/apmot.(*otSpan).setSpanContext(0xc000677950)
    /home/circleci/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:182 +0x2ef
go.elastic.co/apm/module/apmot.(*otSpan).FinishWithOptions(0xc000677950, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/circleci/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:99 +0x22e
go.elastic.co/apm/module/apmot.(*otSpan).Finish(0xc000677950)
    /home/circleci/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:75 +0x4b
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tracing.(*opentracingClientReporter).PostCall(0xc00056ab00, 0x1fed600, 0xc00009a110, 0x2a9ac6)
    /home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/tracing/client.go:34 +0x90
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors.(*monitoredClientStream).RecvMsg(0xc00056adc0, 0x1c6b420, 0xc00040a1a0, 0x10, 0x1ab4920)
    /home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/client.go:78 +0x155
github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredClientStream).RecvMsg(0xc00098a828, 0x1c6b420, 0xc00040a1a0, 0x1b08220, 0xc00009a101)
    /home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/client_metrics.go:160 +0x4b
github.com/thanos-io/thanos/pkg/store/storepb.(*storeSeriesClient).Recv(0xc0002092a0, 0xc000a75f70, 0x0, 0x1)
    /home/circleci/project/pkg/store/storepb/rpc.pb.go:756 +0x62
github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1.2(0xc000677b00, 0xc000675380, 0xc000675320)
    /home/circleci/project/pkg/store/proxy.go:394 +0x4b
created by github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1
    /home/circleci/project/pkg/store/proxy.go:392 +0x1af
stale[bot] commented 2 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

omron93 commented 2 years ago

I'm getting the same issue even after upgrading to 0.23.1

matej-g commented 2 years ago

Looks like this started to appear after https://github.com/thanos-io/thanos/pull/4619 but hasn't been addressed yet. I'm wondering what is the exact issue, to me it looks like something might be up with APM's implementation. This would need further investigation but might be worth opening issue upstream as well.

dbasden commented 2 years ago

The problem is happening at pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:182 with the s.span.Context.SetLabel call.

Everything is fine with spans from the inbound HTTP UI requests, where s.span.Context is set (likely directly in thanos/pkg/tracing/http.go)

When an intercepted GRPC span is handled by the above APM span code however (e.g. when Thanos query is connecting to a store API over GRPC to service a query) s.span.Context is nil, and the null pointer deference for SetLabel causes the SIGSEGV.

Looking at thanos/pkg/tracing/tracing/client/factory.go, it looks like elastic APM is the only library that is not passed a context when setting up a new tracer. I'm guessing that could be why the problem is only happening with APM, and only with traces using the grpc interception middleware

That's as far as I got with a bunch of time and dlv. I'm not that experienced with golang or the ecosystem so can't fix it myself but hopefully that should give a lot more direction (assuming I'm right about the root cause)

matej-g commented 2 years ago

Thanks for the investigation @dbasden! I had a bit of closer look as well, to me it seems that for some reason, in the APM implementation, the SpanData pointer is nil at the point where the program panics. SpanData has field Context, which we try to get in setSpanContext() through embedding in s.span.

According to the docs, the SpanData should be only nil if the span has been ended. This could mean that we are trying to call setSpanContext() on an already ended span, which would explain it being nil. But I wasn't able to confirm whether this is really the case yet.

stale[bot] commented 2 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

omron93 commented 2 years ago

This is still needed and would be great to have it fixed :-)

stale[bot] commented 2 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

omron93 commented 2 years ago

This is still needed and would be great to have it fixed :-)

stale[bot] commented 2 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

omron93 commented 2 years ago

This is still needed and would be great to have it fixed :-)

pjaak commented 2 years ago

Can confirm still happening and still would be great to have fixed!

matej-g commented 2 years ago

Yes, this unfortunately this still needs more investigation, it would be great if someone who's affected had the time and resources to look at this :disappointed:

stale[bot] commented 1 year ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

bck01215 commented 1 year ago

Same error 0.28