krakend / krakend-otel

KrakenD component for OpenTelemetry
Apache License 2.0

Telemetry exporter has error in logs, traces still being captured for a route in skip_paths #26

samstride closed this issue 1 month ago

samstride commented 5 months ago

Environment info:

---- 1 ---- We have our observability config as follows:

"telemetry/opentelemetry": {
    "exporters": {
        "prometheus": [
            {
                "name": "gateway_prometheus",
                "port": 5092,
                "process_metrics": false,
                "go_metrics": false
            }
        ],
        "otlp": [
            {
                "name": "gateway_trace",
                "host": "jaeger.domain",
                "port": 4317,
                "use_http": false
            }
        ]
    },
    "layers": {
        "global": {
            "disable_metrics": false,
            "disable_traces": false,
            "disable_propagation": false
        },
        "proxy": {
            "disable_metrics": false,
            "disable_traces": false
        },
        "backend": {
            "metrics": {
                "disable_stage": false,
                "round_trip": true,
                "read_payload": true,
                "detailed_connection": true,
                "static_attributes": [
                    {
                        "key": "my_metric_attr",
                        "value": "my_middle_metric"
                    }
                ]
            },
            "traces": {
                "disable_stage": false,
                "round_trip": true,
                "read_payload": true,
                "detailed_connection": true,
                "static_attributes": [
                    {
                        "key": "my_metric_attr",
                        "value": "my_middle_metric"
                    }
                ]
            }
        }
    },
    "skip_paths": [
        "/"
    ]
}

Metrics and traces seem to be collected, but I see this error in the logs every 30 seconds (the scrape interval):

ERROR [SERVICE OpenTelemetry] failed to upload metrics: rpc error: code = Unimplemented desc = unknown service opentelemetry.proto.collector.metrics.v1.MetricsService

---- 2 ---- We use "/" as our health path and have added that to skip_paths. However, I can still see traces being collected for that path.

---- 3 ---- We have our circuit breaker config as follows:

"qos/circuit-breaker": {
    "interval": 60,
    "timeout": 10,
    "max_errors": 10,
    "name": "krakend-circuitbreaker",
    "log_status_change": true
}

However, we don't see any logs when the circuit status changes, so we're not sure whether the circuit breaker is actually working.

Are the above behaviours expected?

Thanks.

alombarte commented 3 months ago

Hi @samstride,

This is a buy 1 get 2 free offer! :joy: I will focus on the non-otel question, and my colleagues will jump in for the other 2.

To make sure that the circuit breaker is working, first of all run krakend check --lint on the configuration to confirm that the circuit breaker is in the right place and has the right attributes. If it is OK, my guess is that the thresholds are hard to reach. "max_errors": 10 means the same backend must return 10 consecutive errors, one after another. This is a pretty high value: if a single request succeeds before the count reaches 10, the streak starts over and the circuit never opens.
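The consecutive-error rule can be modelled in a few lines of Go. This is a simplified, hypothetical sketch (consecutiveTracker and trips are illustrative names, not part of KrakenD): it assumes the breaker trips as soon as the streak of back-to-back failures reaches max_errors, as described above, although whether the real component opens on the Nth or the (N+1)th failure is not settled in this thread.

```go
package main

import "fmt"

// consecutiveTracker is a hypothetical, simplified model of a breaker that
// counts back-to-back backend failures. It is not KrakenD's implementation.
type consecutiveTracker struct {
	maxErrors   int // mirrors the "max_errors" setting
	consecutive int // failures seen in a row since the last success
}

// record registers one backend result and reports whether the breaker
// would trip (open) after this result. Assumes tripping at ">= maxErrors".
func (t *consecutiveTracker) record(failed bool) bool {
	if !failed {
		// A single success resets the streak, so counting starts over.
		t.consecutive = 0
		return false
	}
	t.consecutive++
	return t.consecutive >= t.maxErrors
}

// trips replays a sequence of results (true = error) and returns the
// 1-based index of the request that trips the breaker, or 0 if it never trips.
func trips(maxErrors int, results []bool) int {
	t := &consecutiveTracker{maxErrors: maxErrors}
	for i, failed := range results {
		if t.record(failed) {
			return i + 1
		}
	}
	return 0
}

func main() {
	// 9 errors, 1 success, 9 errors: never 10 in a row, so it never trips.
	mixed := make([]bool, 0, 19)
	for i := 0; i < 9; i++ {
		mixed = append(mixed, true)
	}
	mixed = append(mixed, false)
	for i := 0; i < 9; i++ {
		mixed = append(mixed, true)
	}
	fmt.Println(trips(10, mixed)) // 0

	// 10 errors in a row trip it on the 10th request.
	allErrors := make([]bool, 10)
	for i := range allErrors {
		allErrors[i] = true
	}
	fmt.Println(trips(10, allErrors)) // 10
}
```

With an intermittently failing backend, the first case is the common one: 18 errors out of 19 requests and the circuit still stays closed.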

samstride commented 3 months ago

@alombarte ,

krakend check -c krakend.json --lint
Parsing configuration file: krakend.json
Syntax OK!

So the linter passes.

The way we tested the circuit breaker:

alombarte commented 3 months ago

When the server starts, you should see a log line like this:

KRAKEND DEBUG: [BACKEND: /something][CB] Creating the circuit breaker named 'krakend-circuitbreaker'

Could you verify this? If the line is not there (make sure your logging level is set to DEBUG), the CB has not been loaded in the backend section. If the line is there, the problem is probably a mismatch between what is being tested and what is declared in the configuration.

samstride commented 3 months ago

@alombarte, thanks. I have set the log level to DEBUG and can see the log line:

DEBUG [BACKEND: ....][CB] Creating the circuit breaker named 'krakend-circuitbreaker'

However, I can't see any logs when the state changes.

Pasting a sample config here for reference:

{
    "endpoint": "/something",
    "method": "POST",
    "output_encoding": "no-op",
    "extra_config": {
        "qos/ratelimit/router": {
            "max_rate": 500,
            "capacity": 500,
            "client_max_rate": 50,
            "client_capacity": 50,
            "every": "1s",
            "strategy": "ip"
        }
    },
    "backend": [
        {
            "encoding": "no-op",
            "host": [
                "something:5050"
            ],
            "url_pattern": "/",
            "extra_config": {
                "qos/circuit-breaker": {
                    "interval": 60,
                    "timeout": 10,
                    "max_errors": 3,
                    "name": "krakend-circuitbreaker",
                    "log_status_change": true
                },
                "qos/ratelimit/proxy": {
                    "max_rate": 500,
                    "capacity": 500
                }
            }
        }
    ]
}

alombarte commented 3 months ago

If you see the initialization line, then the CB is watching for failures. I copied and pasted your CB section into a configuration, and it works as expected:

krakend_1  | [00] 2024/05/21 07:10:38 KRAKEND INFO: [SERVICE: Gin] Listening on port: 8080
krakend_1  | [00] [GIN] 2024/05/21 - 07:10:40 | 500 |    1.695672ms |    192.168.16.1 | GET      "/test"
krakend_1  | [00] [GIN] 2024/05/21 - 07:10:41 | 500 |    1.566328ms |    192.168.16.1 | GET      "/test"
krakend_1  | [00] [GIN] 2024/05/21 - 07:10:42 | 500 |    1.144562ms |    192.168.16.1 | GET      "/test"
krakend_1  | [00] 2024/05/21 07:10:42 KRAKEND WARNING: [CB] Circuit breaker named 'krakend-circuitbreaker' went from 'closed' to 'open'
krakend_1  | [00] [GIN] 2024/05/21 - 07:10:42 | 500 |    1.262023ms |    192.168.16.1 | GET      "/test"
krakend_1  | [00] 2024/05/21 07:10:42 KRAKEND ERROR: [ENDPOINT: /test] circuit breaker is open
krakend_1  | [00] Error #01: circuit breaker is open
krakend_1  | [00] 2024/05/21 07:10:42 KRAKEND ERROR: [ENDPOINT: /test] circuit breaker is open
krakend_1  | [00] Error #01: circuit breaker is open
krakend_1  | [00] 2024/05/21 07:10:45 KRAKEND ERROR: [ENDPOINT: /test] circuit breaker is open
krakend_1  | [00] [GIN] 2024/05/21 - 07:10:45 | 500 |      78.108µs |    192.168.16.1 | GET      "/test"
krakend_1  | [00] Error #01: circuit breaker is open
krakend_1  | [00] 2024/05/21 07:11:32 KRAKEND WARNING: [CB] Circuit breaker named 'krakend-circuitbreaker' went from 'open' to 'half-open'
krakend_1  | [00] 2024/05/21 07:11:32 KRAKEND WARNING: [CB] Circuit breaker named 'krakend-circuitbreaker' went from 'half-open' to 'open'
krakend_1  | [00] [GIN] 2024/05/21 - 07:11:32 | 500 |    1.485633ms |    192.168.16.1 | GET      "/test"
krakend_1  | [00] 2024/05/21 07:11:34 KRAKEND ERROR: [ENDPOINT: /test] circuit breaker is open
krakend_1  | [00] [GIN] 2024/05/21 - 07:11:34 | 500 |      81.804µs |    192.168.16.1 | GET      "/test"
krakend_1  | [00] Error #01: circuit breaker is open
krakend_1  | [00] 2024/05/21 07:11:34 KRAKEND ERROR: [ENDPOINT: /test] circuit breaker is open

Please open an issue in the circuit breaker repository if you have a reproducible example, and let's keep this issue for the OTEL part.

samstride commented 1 month ago

Closing this issue since these problems have been fixed in v2.6.3. Traces are no longer collected for routes listed in skip_paths.