matthyx / kong-elastic-apm

Apache License 2.0
12 stars, 9 forks

external pluginserver 'elastic-apm' terminated #25

Open: zffocussss opened this issue 2 years ago

zffocussss commented 2 years ago

Hi Mat, recently I found that elastic-apm is not stable: it gets terminated after a few hours. Any ideas?

I attached a screenshot of the logs.
zffocussss commented 2 years ago

Please see https://github.com/Kong/kong/issues/9301. Could your code be panicking?

zffocussss commented 2 years ago

I do not know what caused it to terminate.

matthyx commented 2 years ago

Hi @zffocussss, does it consistently terminate after some time, or is it triggered by specific traffic? Logging isn't practical from a Kong plugin; maybe I should send the plugin's logs to OpenTelemetry :)

matthyx commented 2 years ago

I think the only fatal call is the one here: https://github.com/Kong/go-pdk/blob/5775452da4c69d9fc868afb04d3487049675592a/server/pbserver.go#L201

Unfortunately, I don't see an obvious way to redirect this log call to something like apmlogrus, which would let us capture this panic inside APM. Maybe we could log the plugin's output to a file in /tmp and retrieve it to investigate the stack trace?

matthyx commented 2 years ago

Worst case, we can make a fork and add this instrumentation to help with your debugging...

zffocussss commented 2 years ago

> Hi @zffocussss, does it consistently terminate after some time, or is it triggered by specific traffic?

Yes, it terminates after some time, but I do not know why.

zffocussss commented 2 years ago

An hour ago I tried Kong 3.0 with this Go plugin, and something worse happened: the traceparent header was missing from every outgoing request. If you have time, I suggest you try it as well.

zffocussss commented 2 years ago

> Hi @zffocussss, does it consistently terminate after some time, or is it triggered by specific traffic?

It is running inside Kong 2.8.1.

zffocussss commented 2 years ago

Kong 3.0 has a plugin that supports OpenTelemetry (https://docs.konghq.com/hub/kong-inc/opentelemetry/). Can we use it directly? Is it a better way for Kong to connect to the Elastic APM server? I am a newbie in OpenTracing/OpenTelemetry/APM, so please correct me if I have any concepts wrong.

zffocussss commented 2 years ago

https://github.com/matthyx/kong-elastic-apm/blob/c2f0552a9709b01eb3952be9010c1946ecf5be0b/main.go#L392-L395 Why do you flush and close the tracer after starting the server?

matthyx commented 2 years ago

> Kong 3.0 has a plugin that supports OpenTelemetry (https://docs.konghq.com/hub/kong-inc/opentelemetry/). Can we use it directly?

Yes, we could use it directly... I don't know whether they publish the plugin's source code anywhere, though.

matthyx commented 2 years ago

> An hour ago I tried Kong 3.0 with this Go plugin, and something worse happened: the traceparent header was missing from every outgoing request.

I will try it, thanks for the heads up!

matthyx commented 2 years ago

> https://github.com/matthyx/kong-elastic-apm/blob/c2f0552a9709b01eb3952be9010c1946ecf5be0b/main.go#L392-L395
>
> Why do you flush and close the tracer after starting the server?

server.StartServer blocks until the server shuts down. If it's a "normal" exit, I try to flush events to APM, but I should probably move that into a defer.
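A sketch of the "move that into a defer" idea. The `tracer` interface and the `startServer func() error` signature are hypothetical stand-ins, loosely modeled on a tracer with `Flush` and `Close` methods; they are not the plugin's actual types. Deferred calls run in LIFO order, so the flush happens before the close, and both run on a normal return or on a panic propagating out of the server. They do not run if the process ends via `os.Exit` or `log.Fatal`.

```go
package main

import (
	"fmt"
	"time"
)

// tracer is a hypothetical stand-in for the APM tracer.
type tracer interface {
	Flush(abort <-chan struct{})
	Close()
}

// run starts the blocking server and always flushes, then closes, the tracer.
func run(t tracer, startServer func() error) error {
	defer t.Close() // registered first, so it runs last
	defer func() {
		// Bound the flush so shutdown cannot hang on an unreachable APM server.
		abort := make(chan struct{})
		timer := time.AfterFunc(5*time.Second, func() { close(abort) })
		defer timer.Stop()
		t.Flush(abort)
	}()
	return startServer()
}

// fakeTracer records the order of calls, for demonstration only.
type fakeTracer struct{ calls []string }

func (f *fakeTracer) Flush(abort <-chan struct{}) { f.calls = append(f.calls, "flush") }
func (f *fakeTracer) Close()                      { f.calls = append(f.calls, "close") }

func main() {
	ft := &fakeTracer{}
	err := run(ft, func() error { return fmt.Errorf("server stopped") })
	fmt.Println(err, ft.calls) // server stopped [flush close]
}
```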

zffocussss commented 2 years ago

> > Kong 3.0 has a plugin that supports OpenTelemetry (https://docs.konghq.com/hub/kong-inc/opentelemetry/). Can we use it directly?
>
> Yes, we could use it directly... I don't know whether they publish the plugin's source code anywhere, though.

The source code is here: https://github.com/Kong/kong/blob/master/kong/plugins/opentelemetry/handler.lua

matthyx commented 2 years ago

> An hour ago I tried Kong 3.0 with this Go plugin, and something worse happened: the traceparent header was missing from every outgoing request.

Can you describe your use case? I tried forcing the Kong image to version 3.0 and everything seems to work... I will try configuring the Lua plugin so we can compare the output of both plugins.

zffocussss commented 2 years ago

My use case is sending trace data to the Elastic APM server to implement distributed tracing in a k8s cluster:

internet -> L4 load balancer -> Kong gateway -> application service

Maybe it is caused by my k8s cluster.

matthyx commented 2 years ago

@zffocussss BTW, I still plan to get back to you... I need a bit of time to bump the versions in docker-compose.yml.

Tai-ch0802 commented 2 years ago

Hi, all ✋

I tried the otel plugin (https://docs.konghq.com/hub/kong-inc/opentelemetry/) with the ELK stack 8.5, and it worked.


Here is a sample commit that may help: https://github.com/Tai-ch0802/docker-elk-for-kong/commit/318af9695e7cb77b242ababebe8129ac04d55e55

docker-compose up --build

Solution: Kong gateway -> otel plugin -> otel collector -> APM server -> ELK stack (8.5)

Note: apm-server does not support OTel in 7.6.



Dogrtt commented 1 year ago

Hi guys, I'm facing the same problem.

Kong: 2.8.1 (declarative config)
APM: 7.15.2

Situation 1: I specify the URL of my service in kong.yml as http://host.docker.internal:8001 and start my FastAPI service in PyCharm on the host system. Result: the plugin works perfectly.

Situation 2: I specify the URL of my service in kong.yml as http://my_server:80 and add the dockerized FastAPI service to the compose file. Result: the plugin handles the first request, then crashes with the same error the topic starter showed.

I also noticed that even though the traceparent header appears in the proxied request, I don't see elastic-apm in Kibana...

UPD: I tested it with Kong 3.0.1 and ELK 7.17.7. The output is a little different; it might help you with debugging:

```
2022/12/04 14:36:34 [warn] 1119#0: *599 [kong] pb_rpc.lua:394 [elastic-apm] closed, context: ngx.timer, client: 172.19.0.1, server: 0.0.0.0:8000
2022/12/04 14:36:34 [notice] 1116#0: signal 17 (SIGCHLD) received from 1120
2022/12/04 14:36:34 [error] 1119#0: *599 connect() to unix:/usr/local/kong/elastic-apm.socket failed (111: Connection refused), context: ngx.timer, client: 172.19.0.1, server: 0.0.0.0:8000
2022/12/04 14:36:34 [notice] 1116#0: *317 [kong] process.lua:232 external pluginserver 'elastic-apm' terminated: exit 2, context: ngx.timer
2022/12/04 14:36:34 [error] 1119#0: *599 lua entry thread aborted: runtime error: ...cal/share/lua/5.1/kong/runloop/plugin_servers/pb_rpc.lua:301: connection refused
stack traceback:
coroutine 0:
	[C]: in function 'assert'
	...cal/share/lua/5.1/kong/runloop/plugin_servers/pb_rpc.lua:301: in function 'call'
	...cal/share/lua/5.1/kong/runloop/plugin_servers/pb_rpc.lua:358: in function 'call_start_instance'
	...local/share/lua/5.1/kong/runloop/plugin_servers/init.lua:185: in function 'get_instance_id'
	...cal/share/lua/5.1/kong/runloop/plugin_servers/pb_rpc.lua:385: in function 'handle_event'
	...local/share/lua/5.1/kong/runloop/plugin_servers/init.lua:252: in function <...local/share/lua/5.1/kong/runloop/plugin_servers/init.lua:245>, context: ngx.timer, client: 172.19.0.1, server: 0.0.0.0:8000
2022/12/04 14:36:34 [notice] 1116#0: *317 [kong] process.lua:216 Starting elastic-apm, context: ngx.timer
```
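One detail in the log above is worth a hedged note: `terminated: exit 2`. The Go runtime exits with status 2 when a panic goes unrecovered, while `log.Fatal` exits with status 1, so this may point to a panic rather than the fatal log call discussed earlier. That is an inference from the exit code only. The sketch below shows a generic recover wrapper; `safeCall` is a hypothetical helper, not part of go-pdk, and it only protects the goroutine it runs in, so a panic in any other goroutine would still take down the plugin server.

```go
package main

import (
	"fmt"
	"log"
	"runtime/debug"
)

// safeCall runs fn and turns a panic into an error, logging the stack trace
// instead of letting the panic terminate the whole process.
func safeCall(name string, fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("panic in %s: %v\n%s", name, r, debug.Stack())
			err = fmt.Errorf("panic in %s: %v", name, r)
		}
	}()
	fn()
	return nil
}

func main() {
	var headers map[string]string // nil map: writing to it panics
	err := safeCall("access", func() {
		headers["traceparent"] = "x"
	})
	// The process keeps running and reports the recovered panic as an error.
	fmt.Println(err)
}
```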

matthyx commented 1 year ago

@Dogrtt and @zffocussss sorry I've been neglecting this issue... @Dogrtt can you share the reproducer in a docker-compose file? Let me see if I can reproduce locally and propose a patch, thanks for your patience!

Dogrtt commented 1 year ago

> @Dogrtt can you share the reproducer in a docker-compose file?

Hi @matthyx, first of all, thank you for your work. Here's a link to the repo where I tried to reproduce the issue; if you have any trouble starting it, please write here: https://github.com/Dogrtt/kong_elastic_apm_test

zffocussss commented 1 year ago

> I tried the otel plugin (https://docs.konghq.com/hub/kong-inc/opentelemetry/) with the ELK stack 8.5, and it worked.

Really? Is it stable?

Dogrtt commented 1 year ago

@zffocussss APM 8+ requires the security feature to be enabled, so you can't just put your ELK behind Nginx basic auth; that's why I tried it with ELK+APM 7.17.7. APM's OTel documentation says APM supports OTLP requests out of the box since 7.13, without any collector service. I tried Kong 3.0.1's opentelemetry plugin with direct pushes to http://apm_server:8200. My proxied services started receiving "traceparent" headers, but there was no kong-gateway entry point in APM. Traces weren't available, only agent reports from the Python and C# services. Then I added a collector service and it started working.

My only concern is that traces don't recognize URL templates: instead of a single trace for http://my_service:80/api/files/{file_id}, it shows each one with its real ID (http://my_service:80/api/files/123, http://my_service:80/api/files/5532, etc.).
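The URL-template concern is a cardinality problem: every concrete ID becomes its own name. A minimal sketch of the general remedy, collapsing ID-like path segments into a placeholder, is below. `normalizePath` is hypothetical and only treats all-digit segments as IDs (UUIDs or hashes would need more rules); this is not something Kong's opentelemetry plugin is known to do for you.

```go
package main

import (
	"fmt"
	"strings"
)

// isDigits reports whether every rune in s is an ASCII digit.
func isDigits(s string) bool {
	for _, c := range s {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}

// normalizePath replaces all-digit path segments with "{id}", so that
// /api/files/123 and /api/files/5532 group under one name.
func normalizePath(path string) string {
	segments := strings.Split(path, "/")
	for i, s := range segments {
		if s != "" && isDigits(s) {
			segments[i] = "{id}"
		}
	}
	return strings.Join(segments, "/")
}

func main() {
	fmt.Println(normalizePath("/api/files/123"))  // /api/files/{id}
	fmt.Println(normalizePath("/api/files/5532")) // /api/files/{id}
}
```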

zffocussss commented 1 year ago

So bad. Does it have an impact on the tracing data?

matthyx commented 1 year ago

@Dogrtt @zffocussss thanks for your patience, I have found the issue: https://github.com/matthyx/kong-elastic-apm/commit/8e28e66d84b15d064e754023458f174cdd117836

Can you try again with the latest code?

zffocussss commented 1 year ago

OK, let me give it a try.

zffocussss commented 1 year ago

> Solution: Kong gateway -> otel plugin -> otel collector -> APM server -> ELK stack (8.5)

Why do you use the otel collector? You can send metrics to the APM server directly.

zffocussss commented 1 year ago

> Why do you use the otel collector? You can send metrics to the APM server directly.

Hello, in fact you do not need to load elastic-apm anymore, since you already have the OpenTelemetry plugin working.

Tai-ch0802 commented 1 year ago

> Hello, in fact you do not need to load elastic-apm anymore, since you already have the OpenTelemetry plugin working.

Cool! How do I do that?

I just followed the docs and set up otel-collector-config.yml like this:

```yaml
receivers:
  otlp:
    protocols:
      # grpc:
      http:

processors:
  batch:

exporters:
  logging:
    loglevel: debug
  otlp/elastic:
    endpoint: {your_elastic_apm_endpoint}
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging, otlp/elastic]
    metrics:
      receivers: [otlp]
      exporters: [logging, otlp/elastic]
    logs:
      receivers: [otlp]
      exporters: [logging, otlp/elastic]
```

> Why do you use the otel collector? You can send metrics to the APM server directly.

And I assumed elastic-apm was required if I need trace logs.

zffocussss commented 1 year ago

> Cool! How do I do that?

You do not need a collector to forward your trace data, because APM can work with the otel plugin directly: just set the endpoint to the APM server address (https://{apm}/v1/traces), and add an Authorization header if authentication is required.

michbeck100 commented 1 year ago

Would you share your otel plugin config? I can't get it to work with the endpoint https://{apm}/v1/traces.

Here's what's in the Kong logs:

[error] 2303#0: *1673 [lua] handler.lua:102: process(): [otel] response error: 404, body: {"error":"404 page not found"}

jsoule6 commented 1 year ago


I think I have the plugin set up correctly, but I don't see any traces being sent to APM. My plugin config is below.

```yaml
apiVersion: configuration.konghq.com/v1
config:
  endpoint: http://apm-server-apm-http.elasticsearch.svc.cluster.local:8200/v1/traces
  headers:
    Authorization: Bearer Secret Token
kind: KongClusterPlugin
metadata:
  annotations:
    kubernetes.io/ingress.class: kong
  labels:
    global: 'true'
  name: kong-global-opentelemetry
plugin: opentelemetry
```

I can see the plugin registered in the Kong console, and that looks good. I see logs like the ones below indicating that spans are being traced within Kong. Unfortunately, nothing is being sent to APM. Any ideas?

```
2023/05/18 19:15:27 [debug] 2311#0: *265149 [lua] handler.lua:162: [otel] total spans in current request: 1
2023/05/18 19:15:27 [debug] 2311#0: *265149 [lua] instrumentation.lua:332: runloop_log_after(): [tracing] collected 1 spans: Span #1 name=root attributes={"http.status_code":200}
```

jsoule6 commented 1 year ago

> Unfortunately, nothing is being sent to APM. Any ideas?

I actually got this to work. The URL should not have /v1/traces at the end for APM. That said, I can see the traces being accepted by APM, but I don't see them in Kibana. We use Java agents as well, and I do see those services in APM, but nothing for Kong.