krakend / krakend-ce

KrakenD Community Edition: High-performance, stateless, declarative, API Gateway written in Go.
https://www.krakend.io
Apache License 2.0
1.94k stars 453 forks source link

Gateway crashes handling a request with opentelemetry enabled #858

Closed tsemczyszyn closed 6 months ago

tsemczyszyn commented 6 months ago

Environment info:

Describe the bug I have a working configuration of krakend running in a testing environment I'm experimenting with. I wanted to try out the opentelemetry stuff added in 2.6 but when I add in the opentelemetry configuration the gateway crashes when I hit a specific endpoint. If I remove the opentelemetry config and repeat the same request everything works.

Your configuration file:

Here's the opentelemetry section I added:

  "extra_config": {
    "telemetry/opentelemetry": {
      "service_name": "krakend",
      "trace_sample_rate": 1,
      "metric_reporting_period": 1,
      "exporters": {
        "otlp": [
          { 
            "name": "jaeger",
            "host": "jaeger",
            "port": 4317,
            "use_http": false,
            "disable_metrics": true
          }
        ]
      }
    },
}

Commands used How did you start the software?

FROM devopsfaith/krakend:2.6.0 as builder
COPY . .
RUN FC_ENABLE=1 \
    FC_TEMPLATES="templates" \
    FC_SETTINGS="settings/dev/" \
    FC_OUT=/tmp/krakend.json \
    krakend check -d -t -c krakend.tmpl

RUN krakend check -c /tmp/krakend.json --lint

FROM devopsfaith/krakend:2.6.0
COPY --from=builder --chown=krakend:nogroup /tmp/krakend.json .
COPY scripts /

Expected behavior The request should be routed normally without the gateway crashing

Logs

gateway-1  | [GIN] 2024/03/11 - 19:41:58 | 200 |    1.867203ms |   172.18.206.15 | GET      "/example/endpoint"
gateway-1  | panic: runtime error: invalid memory address or nil pointer dereference
gateway-1  | [signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x14c0cea]
gateway-1  |
gateway-1  | goroutine 48 [running]:
gateway-1  | github.com/krakend/krakend-otel/io.(*ioTracking).end(0xc000b70710, {0x0?, 0x0})
gateway-1  |    /go/pkg/mod/github.com/krakend/krakend-otel@v0.2.0/io/tracking.go:64 +0x1aa
gateway-1  | github.com/krakend/krakend-otel/io.(*instrumentedReader).Close(0xc000b70700)
gateway-1  |    /go/pkg/mod/github.com/krakend/krakend-otel@v0.2.0/io/reader.go:71 +0x50
gateway-1  | github.com/luraproject/lura/v2/proxy.readCloserWrapper.closeOnCancel({{0x3b525e0?, 0xc000a251a0?}, {0x3b49c80?, 0xc000b70700?}})
gateway-1  |    /go/pkg/mod/github.com/luraproject/lura/v2@v2.6.1/proxy/proxy.go:54 +0x55
gateway-1  | created by github.com/luraproject/lura/v2/proxy.NewReadCloserWrapper
gateway-1  |    /go/pkg/mod/github.com/luraproject/lura/v2@v2.6.1/proxy/proxy.go:43 +0xbf

Additional context The endpoint in question uses modifier/lua-backend to call some scripts, but so do many other endpoints I'm using and those don't crash. I don't want to reveal my complete configuration out of concern there may be sensitive information. Please let me know what other information I could supply to help.

alombarte commented 6 months ago

Hi @tsemczyszyn, Thanks for testing the feature the same day we release it! I love it!

It is clear that adding the new module is causing the crash you are having, and we need to figure out what is causing this behavior. Jaeger was one of the components we used during testing with no problems, and I am interested in knowing this particular scenario.

Aside from Lua, are there any other custom plugins in your system? Can you report a minimal configuration example that leads to this crash without revealing anything sensitive? You can for instance use the echo or the debug endpoints to not reveal your backends. Something minimal that we can use to have a look, because otherwise this might be pretty hard to find.

Here's an example of a (working) minimal configuration:

{
  "version": 3,
  "$schema": "https://www.krakend.io/schema/krakend.json",
  "host": [
    "http://localhost:8080"
  ],
  "debug_endpoint": true,
  "echo_endpoint": true,
  "endpoints": [
    {
      "endpoint": "/test",
      "backend": [
        {
          "url_pattern": "/__debug/test"
        }
      ]
    }
  ],
  "extra_config": {
    "telemetry/opentelemetry": {
      "service_name": "krakend",
      "trace_sample_rate": 1,
      "metric_reporting_period": 1,
      "exporters": {
        "otlp": [
          {
            "name": "jaeger",
            "host": "jaeger",
            "port": 4317,
            "use_http": false,
            "disable_metrics": true
          }
        ]
      }
    }
  }
}

Testable with a docker compose:

version: "3"
services:
  jaeger:
    image: jaegertracing/all-in-one:1.54
    environment:
      COLLECTOR_ZIPKIN_HOST_PORT: ":9411"
    ports:
      - "55778:5778"   # serve configs
      - "56686:16686"  # serve frontend UI
      - "55317:4317"   # otlp grpc: we remap this to be able to run other envs
      - "55318:4318"   # otlp http: we reamp this to be able to run other envs
  krakend:
    image: devopsfaith/krakend:watch
    volumes:
      - "./:/etc/krakend"
    ports:
      - "8080:8080"
    command: ["run", "-c", "krakend.json"]

Do you think that you can provide something similar that reproduces the crash?

Thank you very much for the report

tsemczyszyn commented 6 months ago

It seems that in my configuration I had "encoding" set to "no-op" on the backend, but "output_encoding" set to "json". When I changed this to "no-op" on both the crash went away. I replicated the crash using this configuration:

{
  "version": 3,
  "$schema": "https://www.krakend.io/schema/krakend.json",
  "host": [
    "http://localhost:8080"
  ],
  "debug_endpoint": true,
  "echo_endpoint": true,
  "output_encoding": "json",
  "endpoints": [
    {
      "endpoint": "/test",
      "backend": [
        {
          "encoding": "no-op",
          "url_pattern": "/__debug/test"
        }
      ]
    }
  ],
  "extra_config": {
    "telemetry/opentelemetry": {
      "service_name": "krakend",
      "trace_sample_rate": 1,
      "metric_reporting_period": 1,
      "exporters": {
        "otlp": [
          {
            "name": "jaeger",
            "host": "jaeger",
            "port": 4317,
            "use_http": false,
            "disable_metrics": true
          }
        ]
      }
    }
  }
}
kpacha commented 6 months ago

Thanks for the follow-up. It will help with the development of the feature.

The config is automatically validated when KrakenD starts and if there is an endpoint with "output_encoding": "no-op" then it checks that the endpoint have a single backend and that backend is using "encoding":"no-op" (if not, it changes it automatically). But we avoid doing the same thing for endpoints using other output_encoding because it would limit the types of services KrakenD can consume, for example: a service returning data using MessagePack.