grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0

Multitenancy does not work with non-GRPC ingestion #495

Closed mnadeem closed 2 years ago

mnadeem commented 3 years ago

Describe the bug: "failed to extract org id" when auth_enabled: true

To Reproduce: steps to reproduce the behavior:

tempo-local.yaml

auth_enabled: true

server:
  http_listen_port: 3100

distributor:
  receivers:                           # this configuration will listen on all ports and protocols that tempo is capable of.
    jaeger:                            # the receivers all come from the OpenTelemetry collector.  more configuration information can
      protocols:                       # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/master/receiver
        thrift_http:                   #
        grpc:                          # for a production deployment you should only enable the receivers you need!
        thrift_binary:
        thrift_compact:
    zipkin:
    otlp:
      protocols:
        http:
        grpc:
    opencensus:

ingester:
  trace_idle_period: 10s               # the length of time after a trace has not received spans to consider it complete and flush it
  #max_block_bytes: 1_000_000           # cut the head block when it hits this size or ...
  traces_per_block: 1_000_000
  max_block_duration: 5m               #   this much time passes

compactor:
  compaction:
    compaction_window: 1h              # blocks in this time window will be compacted together
    max_compaction_objects: 1000000    # maximum number of objects in a compacted block
    block_retention: 1h
    compacted_block_retention: 10m

storage:
  trace:
    backend: local                     # backend configuration to use
    wal:
      path: E:\practices\docker\tempo\tempo\wal             # where to store the wal locally
      bloom_filter_false_positive: .05 # bloom filter false positive rate.  lower values create larger filters but fewer false positives
      index_downsample: 10             # number of traces per index record
    local:
      path: E:\practices\docker\tempo\blocks
    pool:
      max_workers: 100                 # the worker pool mainly drives querying, but is also used for polling the blocklist
      queue_depth: 10000
docker run -d --rm -p 6831:6831/udp -p 9411:9411 -p 3100:3100  --name tempo -v E:\practices\docker\tempo\tempo-local.yaml:/etc/tempo-local.yaml --network docker-tempo  grafana/tempo:0.5.0 --config.file=/etc/tempo-local.yaml
docker run -d --rm -p 16686:16686 -v E:\practices\docker\tempo\tempo-query.yaml:/etc/tempo-query.yaml  --network docker-tempo  grafana/tempo-query:0.5.0  --grpc-storage-plugin.configuration-file=/etc/tempo-query.yaml
curl -X POST http://localhost:9411 -H 'Content-Type: application/json' -H 'X-Scope-OrgID: demo' -d '[{
 "id": "1234",
 "traceId": "0123456789abcdef",
 "timestamp": 1608239395286533,
 "duration": 100000,
 "name": "span from bash!",
 "tags": {
    "http.method": "GET",
    "http.path": "/api"
  },
  "localEndpoint": {
    "serviceName": "shell script"
  }
}]'
level=error ts=2021-01-31T07:16:33.6168738Z caller=log.go:27 msg="failed to extract org id" err="no org id"


mnadeem commented 3 years ago

Same issue with grafana/tempo:latest

joe-elliott commented 3 years ago

"auth_enabled" is a bit misleading, but it is consistent with Cortex and Loki. All it does is check every incoming request for a tenant id in the X-Scope-OrgID header. Unless you are doing a multitenant Tempo install, just set auth_enabled to false.

mnadeem commented 3 years ago

I have enabled multitenancy in tempo.yaml and started the Tempo backend

auth_enabled: true

curl and Postman fail even though I am passing the X-Scope-OrgID header

curl -X POST http://localhost:9411 -H 'Content-Type: application/json' -H 'X-Scope-OrgID: demo' 

And the message is

level=error ts=2021-01-31T07:16:33.6168738Z caller=log.go:27 msg="failed to extract org id" err="no org id"

Note: I do want to enable multitenancy

Same behavior in openshift

annanay25 commented 3 years ago

This smells like a bug in the Zipkin receiver, where we are not passing our authentication middleware to the Zipkin server, but I will have to investigate further to confirm.

annanay25 commented 3 years ago

Also, for our documentation: multitenancy_enabled is probably a better name for that config label than auth_enabled.

annanay25 commented 3 years ago

Update after some preliminary digging -

joe-elliott commented 3 years ago

Although we can try, this unfortunately might not be fixable given the way we reuse the OTEL collector receivers. I've added an example in this PR that shows how you can use multitenancy in Tempo with the OTEL collector:

https://github.com/grafana/tempo/pull/497

Note the addition of an X-Scope-OrgID header in the OTEL collector config. This means that all traces passing through this collector will belong to the named tenant. You can then configure the collector to accept Zipkin.

    headers:
      x-scope-orgid: authed-tenant
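
A fuller collector sketch of that approach might look like the following; the endpoint, port, and tls stanza are assumptions for illustration, not values from the PR, and the exact fields may vary by collector version:

```yaml
# hedged sketch: OTEL collector that accepts Zipkin and forwards to Tempo over
# OTLP gRPC, attaching a fixed tenant header (endpoint, port, and tenant name
# are placeholders)
receivers:
  zipkin:

exporters:
  otlp:
    endpoint: tempo:4317
    headers:
      x-scope-orgid: authed-tenant
    tls:
      insecure: true   # assumes plaintext gRPC between collector and Tempo

service:
  pipelines:
    traces:
      receivers: [zipkin]
      exporters: [otlp]
```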

The tempo datasource in Grafana also has headers added to indicate the tenant:

  jsonData:
    httpHeaderName1: 'Authorization'
  secureJsonData:
    httpHeaderValue1: 'Bearer authed-tenant'

and finally an extra parameter passed to tempo-query:

    command:
    - --query.bearer-token-propagation=true  # required to pass auth to the grpc plugin

Using the bearer token is an unfortunate hack in a series of unfortunate hacks. @annanay25 is currently working on dropping the Jaeger-query requirement, which will at least clean up this piece. @annanay25 please remember to clean up this example when you get rid of Jaeger query.

@mnadeem Thank you for reporting this issue. I am going to change the title to reflect that the issue is that multitenancy is not supported on non-GRPC ingestion. I consider everything in this post and the linked PR a workaround. So we will leave this issue open.

annanay25 commented 3 years ago

> @annanay25 please remember to clean up this example when you get rid of Jaeger query.

👍 I will make sure. This work will start in coordination with the Grafana project team, probably in the next couple of weeks.

batwork7 commented 3 years ago

@annanay25 @mnadeem, I found this article to be helpful, but I am also running into the same issue even with auth_enabled set to false. In my case it is single-tenant; I do not want to enable multitenancy.

Environment: standalone Tempo deployment with local storage
Multitenancy: no (auth_enabled: false)
Intent: send Jaeger traces to Tempo, to link from traces to Loki logs
Tempo version: 0.7 (the latest)

batwork7 commented 3 years ago

Any help in resolving this issue would be greatly appreciated.

jjneely commented 2 years ago

I believe I'm experiencing this with Tempo 1.2.1 as well. Any updates here?

yvrhdn commented 2 years ago

No changes here. Multitenancy is only supported when ingesting over gRPC, so you should use either OTLP gRPC or Jaeger gRPC in a multitenant setup.

flokli commented 2 years ago

I'm using a proxy in front of Tempo that doesn't support gRPC. Are there any plans to support the tenant ID when going over the Tempo OTLP HTTP endpoint?

joe-elliott commented 2 years ago

We would need changes upstream to support multitenancy over this endpoint. GRPC takes all headers and automatically injects them into the context. HTTP does not.

If you have options with respect to your proxy, I'd recommend Envoy; it is an excellent HTTP/2/gRPC proxy.
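
For reference, a minimal Envoy upstream sketch for proxying gRPC; this is an assumption-laden example, not from this thread: the hostname `tempo` and OTLP gRPC port 4317 are placeholders, and field names may differ across Envoy versions:

```yaml
# hedged sketch: Envoy cluster with HTTP/2 enabled upstream so gRPC
# (e.g. OTLP on 4317) passes through; names and ports are placeholders
static_resources:
  clusters:
    - name: tempo_otlp_grpc
      type: STRICT_DNS
      connect_timeout: 5s
      http2_protocol_options: {}        # required for gRPC upstreams
      load_assignment:
        cluster_name: tempo_otlp_grpc
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: tempo
                      port_value: 4317
```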

danfromtitan commented 2 years ago

Any solution that entails setting the tenant header at the client end introduces a security flaw: all it takes is knowing the tenant header, and you can impersonate that tenant for reads and writes. The fact is clients are often independent services that send traces over external internet connections, and there is a need to authenticate those clients at the gateway.

To solve this problem (in Cortex), we have the client doing mTLS auth with the ingress-nginx gateway; nginx then sets the tenant header based on the CN in the TLS certificate presented by the client. I was looking to do something similar in Tempo, only to find out multitenancy is not supported over HTTP.

gRPC over ingress-nginx is not as trivial as this demonstration makes it sound: its success depends on the instrumentation at the server's end. So far I've got traces making it to Tempo, but these occasional logs in Grafana Agent make me think I'm losing some traces because of connection race conditions:

ts=2022-04-13T21:34:23Z level=error caller=exporterhelper/queued_retry_inmemory.go:93 msg="Exporting failed. No more retries left. Dropping data." component=traces traces_config=tempo kind=exporter name=otlp/0 error="max elapsed time expired rpc error: code = Unavailable desc = connection closed before server preface received" dropped_items=366

yvrhdn commented 2 years ago

@liguozhong suggested in https://github.com/grafana/tempo/issues/1381#issuecomment-1098877991 that we could use the Authenticator interface to extract this information out of the request coming from the OpenTelemetry receivers.

danfromtitan commented 2 years ago

> Any solution that entails setting the tenant header at the client end introduces a security flaw: all it takes is knowing the tenant header, and you can impersonate that tenant for reads and writes.
>
> To solve this problem (in Cortex), we have the client doing mTLS auth with the ingress-nginx gateway; nginx then sets the tenant header based on the CN in the TLS certificate presented by the client. I was looking to do something similar in Tempo, only to find out multitenancy is not supported over HTTP.

Managed to get gRPC working through ingress-nginx; the gist of it is to use grpc_set_header in a configuration-snippet to insert the X-Scope-OrgID header. (Luckily the read path does take the tenant name in an HTTP header; otherwise Grafana doesn't seem to have a decent gRPC data source that supports client TLS authentication.)
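
For anyone trying the same, a sketch of what that ingress-nginx annotation could look like; the tenant value is hard-coded for illustration, and deriving it from the client certificate CN as described above would need an nginx variable or map in place of the literal:

```yaml
# hedged sketch: ingress-nginx annotations that proxy gRPC to Tempo and
# inject the tenant header via grpc_set_header; "tenant-a" is a placeholder
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      grpc_set_header X-Scope-OrgID "tenant-a";
```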

liguozhong commented 2 years ago

> grpc_set_header

The grpc_set_header suggestion is great; I hadn't thought of it before. I temporarily solved it another way (by forking the OTLP collector):
https://github.com/open-telemetry/opentelemetry-collector/issues/5216
https://github.com/liguozhong/opentelemetry-collector/commit/3f6214ee68930a23a656df9ea739e6b6ef457992