open-policy-agent / opa

Open Policy Agent (OPA) is an open source, general-purpose policy engine.
https://www.openpolicyagent.org
Apache License 2.0

W3C tracing does not work as expected #6905

Open Erates opened 1 month ago

Erates commented 1 month ago

Short description

According to the documentation, OPA supports W3C tracing, so we send the `traceparent` header to OPA when performing a REST request that evaluates policies. However, we cannot find a single place where this trace information is output. The decision log documentation says that the log should include the `trace_id` and `span_id`, but neither appears when the decision logs are written to the console, configured using `decision_logs.console=true`.
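For reference, the `traceparent` value sent to OPA can be split into its parts to see which `trace_id` should show up in the logs. The following is a minimal sketch, assuming the version `00` layout from the W3C Trace Context specification; `parse_traceparent` and `TraceParent` are hypothetical helper names, not part of OPA.

```python
import re
from typing import NamedTuple

# Version 00 layout: version "-" trace-id "-" parent-id "-" trace-flags,
# all lowercase hex (2, 32, 16 and 2 characters respectively).
_TRACEPARENT_RE = re.compile(
    r"^00-(?P<trace_id>[0-9a-f]{32})-(?P<span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    trace_id: str
    span_id: str  # the caller's span, i.e. the parent of any span the server creates
    sampled: bool


def parse_traceparent(value: str) -> TraceParent:
    """Parse a version-00 traceparent header value (hypothetical helper)."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a version-00 traceparent: {value!r}")
    trace_id, span_id = match["trace_id"], match["span_id"]
    # All-zero ids are defined as invalid by the specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero trace-id or parent-id is invalid")
    # Bit 0 of the flags byte is the "sampled" flag.
    sampled = bool(int(match["flags"], 16) & 0x01)
    return TraceParent(trace_id, span_id, sampled)
```

With the header from the reproduction steps, the `trace_id` is `4bf92f3577b34da6a3ce929d0e0e4736`. A server that joins the trace would normally log that same `trace_id`, but likely with a new `span_id` of its own rather than the caller's `00f067aa0ba902b7`.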

OPA version used: 0.67.0

Steps To Reproduce

1. Create a file `policy/example.rego` with the content
   ```rego
   package example

   import rego.v1

   result := input.message
   ```
2. Create a `docker-compose.yml` file with the content
   ```yaml
   services:
     opa:
       image: openpolicyagent/opa:0.67.0
       ports:
         - "8181:8181"
       volumes:
         - ./policy:/policy
       command:
         - "run"
         - "--server"
         - "--log-level=debug"
         - "--log-format=json"
         - "--set"
         - "decision_logs.console=true"
         - "/policy"
   ```
3. Run the docker-compose
4. Perform an http request to evaluate the policy
   ```http
   POST http://localhost:8181/v1/data/example/result
   Content-Type: application/json
   traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

   { "input": { "message": "Hello world!" } }
   ```
5. Verify the output log message, it contains something like this
   ```json
   {
     "decision_id": "c82e0a5a-0f5b-46bd-9b34-2c58673e9a0d",
     "input": {
       "message": "Hello world!"
     },
     "labels": {
       "id": "003e7aaa-d132-4980-b767-9f8f213bd478",
       "version": "0.67.0"
     },
     "level": "info",
     "metrics": {
       "counter_server_query_cache_hit": 0,
       "timer_rego_input_parse_ns": 141927,
       "timer_rego_query_compile_ns": 105805,
       "timer_rego_query_eval_ns": 165426,
       "timer_rego_query_parse_ns": 49788,
       "timer_server_handler_ns": 625141
     },
     "msg": "Decision Log",
     "path": "example/result",
     "req_id": 1,
     "requested_by": "172.19.0.1:33880",
     "result": "Hello world!",
     "time": "2024-07-31T08:15:18Z",
     "timestamp": "2024-07-31T08:15:18.532524912Z",
     "type": "openpolicyagent.org/decision_logs"
   }
   ```

Expected behavior

We expect that (at least) the decision log in console contains the trace_id and span_id as mentioned in the documentation.

It would also be good if every log statement emitted within the current span contained the trace_id and span_id.

We do not have a central OpenTelemetry collector that is able to receive requests made using the Decision Log Service API. We do have log collectors running on every pod. So outputting the log in the console is a good option for us.
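On the log collector side, the expectation above could be checked with something like the following. This is only a sketch: it assumes the console output is one JSON object per line (as with `--log-format=json`), that decision logs carry the `type` value shown in the sample output, and that `trace_id` and `span_id` would appear as top-level keys when present. That last point is an assumption based on the decision log documentation, and `decision_trace_ids` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple

# Value of the "type" field in the sample console decision log above.
DECISION_LOG_TYPE = "openpolicyagent.org/decision_logs"


def decision_trace_ids(
    lines: Iterable[str],
) -> Iterator[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """Yield (decision_id, trace_id, span_id) for each decision log line.

    trace_id and span_id are None when the entry does not contain them
    (assumed here to be top-level keys).
    """
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("type") != DECISION_LOG_TYPE:
            continue  # skip regular server log entries
        yield entry.get("decision_id"), entry.get("trace_id"), entry.get("span_id")
```

Fed the sample output above, this would yield the `decision_id` with `None` for both trace fields, which is the behavior being reported.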

ashutosh-narkar commented 1 month ago

I think you'll have to set distributed_tracing.type=grpc in the OPA config to enable this.

Erates commented 1 month ago

> I think you'll have to set distributed_tracing.type=grpc in the OPA config to enable this.

Hey @ashutosh-narkar, this works, but it also means that OPA tries to send the traces to an OpenTelemetry tracing endpoint, which we do not have. So we cannot fill in the distributed_tracing.address or service_name, which causes frequent warn log entries: `Distributed tracing: traces export: context deadline exceeded: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:4317: connect: connection refused"`.

I've also tried setting the sample_percentage to 0, but it still tries to send the spans (or at least to create the connection).

Maybe an option distributed_tracing.type=log so that the trace_id and span_id are logged, but not sent to an external service? Also, since the Decision Log Service API documentation is the only place referring to the trace_id, maybe add a link to the distributed_tracing documentation when the trace_id is required.

ashutosh-narkar commented 1 month ago

> Also, since the Decision Log Service API documentation is the only place referring to the trace_id, maybe add a link to the distributed_tracing documentation when the trace_id is required.

Sure, feel free to submit a PR with the doc updates.

> Maybe an option distributed_tracing.type=log so that the trace_id and span_id are logged, but not sent to an external service?

If you turn on console decision logging they get logged. Couldn't you just use that?

stale[bot] commented 2 weeks ago

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.