
Multi-tenancy E2E test #9883

Open · carsonip opened this issue 6 months ago

carsonip commented 6 months ago

Is your feature request related to a problem? Please describe.

End-to-end test that verifies multi-tenancy support.

Describe the solution you'd like

I am not familiar with the topic, but it came up in a discussion with @AlexanderWert and @jpkrohling during KubeCon EU that it would be great to have an E2E test around multi-tenancy support, since we are not sure what is currently missing in the collector.

@jpkrohling, you mentioned that you have some initial ideas about the requirements for the tests and would be happy to share them. Please feel free to edit this issue, or open another issue to capture the requirements.

jpkrohling commented 5 months ago

I know that @dgoscn was working on some manual verifications to make sure that an end-to-end solution was possible, involving the header-setter extension and eventually reaching the Loki exporter (or OTLP HTTP). The idea is that a tenant header (X-Scope-OrgID) sent to the original receiver could be propagated down to the backends via our exporters.
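
A minimal sketch of that propagation path with the loki exporter (the endpoint is a placeholder; the full OTLP variant is in the config shared below):

extensions:
  headers_setter:
    headers:
      - action: upsert
        key: X-Scope-OrgID
        from_context: x-scope-orgid

exporters:
  loki:
    endpoint: http://localhost:3100/loki/api/v1/push
    auth:
      authenticator: headers_setter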

Let me know if there are updates, or if you need further clarification on what would be needed to perform this (manual) test.

dgoscn commented 5 months ago

Hello, @carsonip and @jpkrohling.

As Juraci mentioned above, I was working on some manual verifications with the header-setter extension a while ago. I will share what I verified below.

PS: I divided this answer into two blocks: setup and workflow.

Setup

opentelemetry-collector-contrib - 0.90.1

config.yaml

receivers:
  otlp:
    protocols:
      grpc: # default port 4317
        endpoint: localhost:4327
        include_metadata: true
      http: # default port 4318
        endpoint: 0.0.0.0:4328
        include_metadata: true
processors:

extensions:
  headers_setter:
    headers:
      - action: upsert
        key: X-Scope-OrgID
        from_context: x-scope-orgid

exporters:
  # where the data goes
  logging/info:
    verbosity: basic
  logging/debug:
    verbosity: detailed
  prometheusremotewrite:
    endpoint: "http://localhost:9009/api/v1/push"
    auth:
      authenticator: headers_setter
    tls:
      insecure: true
  otlp:
    endpoint: http://localhost:4317
    auth:
      authenticator: headers_setter
    tls:
      insecure: true 

connectors:
  spanmetrics:
    dimensions:
      - name: http.method
      - name: http.status_code
    metrics_flush_interval: 15s 

service:
  telemetry:
    metrics:
      level: "detailed"
    traces:
      propagators: [tracecontext]
  extensions: [headers_setter]
  pipelines:
    traces:
      receivers: [otlp]
      processors: []
      exporters: [spanmetrics, otlp]
    metrics:
      receivers: [spanmetrics]
      processors: []
      # exporters: [prometheusremotewrite, logging/debug, prometheus]
      exporters: [prometheusremotewrite]

Running the collector

cd ./cmd/otelcontribcol && GO111MODULE=on go run --race . --config ../../local/config.yaml
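
A quick way to smoke-test that the receiver accepts the tenant header is an empty (but valid) OTLP/JSON request against the HTTP endpoint from the config above:

curl -s -X POST http://localhost:4328/v1/traces \
  -H 'Content-Type: application/json' \
  -H 'X-Scope-OrgID: demo' \
  -d '{"resourceSpans":[]}'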

Grafana Mimir

I made use of the Play with Grafana Mimir tutorial for testing.

On the Mimir directory:

cd mimir/docs/sources/mimir/get-started/play-with-grafana-mimir

docker compose up

Grafana Mimir dashboard - http://localhost:9000/
Mimir UI - http://localhost:9009/
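
To check tenant isolation on the Mimir side, a query with the tenant header should only return series written under that tenant (assuming Mimir's default /prometheus API prefix):

curl -s -H 'X-Scope-OrgID: demo' \
  'http://localhost:9009/prometheus/api/v1/label/__name__/values'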

Grafana Tempo

I downloaded the Grafana Tempo release tarball tempo_2.2.3_linux_amd64.tar.gz from the GitHub repository

And then executed the following commands:

tar vxf tempo_2.2.3_linux_amd64.tar.gz

cd tempo

./tempo --config.file tempo.yaml --multitenancy.enabled=true
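
With multi-tenancy enabled, Tempo also requires the tenant header on the query path; a quick check against its search API (assuming the default HTTP port 3200):

curl -s -H 'X-Scope-OrgID: demo' 'http://localhost:3200/api/search'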

Grafana Dashboard

I had some problems testing some data sources with the Grafana dashboard from Play with Grafana Mimir, so I decided to run a separate Grafana instance just for this (sorry for the overhead).

Downloaded from https://grafana.com/grafana/download/9.4.3

And ran it with:

cd grafana-9.4.3 && ./bin/grafana server

Grafana UI - http://localhost:3000/
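
Instead of setting the header by hand in the UI, the data source can also be provisioned; a sketch for Tempo (URL and tenant value are assumptions matching this setup):

apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    url: http://localhost:3200
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: demo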

Workflow to validate the steps

Collector:

extensions:
  headers_setter:
    headers:
      - action: upsert
        key: X-Scope-OrgID
        from_context: x-scope-orgid
service:
  extensions:
    - headers_setter
  pipelines:
    metrics:
      receivers:
        - spanmetrics             # (3) connector
      processors: []
      exporters:
        - prometheusremotewrite   # (4) Mimir
    traces:
      receivers:
        - otlp                    # (1) telemetrygen
      processors: []
      exporters:
        - spanmetrics             # (2) connector
        - otlp                    # (2) Tempo

The objective is for the HTTP request in step (4) to carry the X-Scope-OrgID received from the client in step (1).

tracegen to generate trace telemetry:

tracegen -otlp-endpoint localhost:4317 -otlp-insecure -service cajuina -otlp-header 'X-Scope-OrgID="demo"'

telemetrygen to generate metric telemetry (note that without --otlp-endpoint it defaults to localhost:4317):

telemetrygen metrics --otlp-header='X-Scope-OrgID="123"' --otlp-insecure

Checking the outputs

In Grafana Explore, you can see the results of the tracegen run:

(screenshot)

You can also check the header X-Scope-OrgID="demo" set on the Tempo data source:

(screenshot)

It is likewise set for the Grafana Mimir data source:

(screenshot)

However, given how much time has passed, I don't remember exactly what the steps were to validate Mimir. For instance, I had some trouble re-running telemetrygen for metrics:

telemetrygen metrics --workers 1 --interval 1s --metrics 1 --rate 1 --otlp-endpoint localhost:4317 --otlp-insecure --otlp-header='X-Scope-OrgID="demo"'

2024-05-01T12:39:33.690-0300    INFO    grpc@v1.58.0/clientconn.go:1338 [core][Channel #1 SubChannel #2] Subchannel Connectivity change to READY        {"system": "grpc", "grpc_log": true}
2024-05-01T12:39:33.690-0300    INFO    grpc@v1.58.0/clientconn.go:592  [core][Channel #1] Channel Connectivity change to READY {"system": "grpc", "grpc_log": true}
2024-05-01T12:39:33.690-0300    INFO    metrics/metrics.go:94   generation of metrics is limited        {"per-second": 1}
2024-05-01T12:39:33.691-0300    FATAL   metrics/worker.go:55    exporter failed {"worker": 0, "error": "failed to upload metrics: rpc error: code = Unimplemented desc = unknown service opentelemetry.proto.collector.metrics.v1.MetricsService"}
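
The "unknown service ... MetricsService" error is consistent with the config above: this command targets localhost:4317, which here is Tempo's OTLP endpoint (traces only), while the collector's gRPC receiver listens on localhost:4327; and since the otlp receiver is only wired into the traces pipeline, the collector would not register the gRPC metrics service either. A sketch of the changes that should make this run, assuming direct OTLP metrics ingestion is actually wanted:

    metrics:
      receivers: [spanmetrics, otlp]   # also registers the OTLP metrics service
      processors: []
      exporters: [prometheusremotewrite]

telemetrygen metrics --workers 1 --interval 1s --metrics 1 --rate 1 \
  --otlp-endpoint localhost:4327 --otlp-insecure --otlp-header='X-Scope-OrgID="demo"'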

Sorry for any confusion, but I omitted some steps to avoid a wrong analysis. I hope this helps and that we can resolve the Mimir/metrics step.

jpkrohling commented 4 months ago

@carsonip , the idea is basically to:

  1. generate telemetry via telemetrygen and send an X-Scope-OrgID header with the outgoing request
  2. set include_metadata on the receiver side, so that the HTTP headers are placed in the context
  3. export the received telemetry to a backend, like Tempo, and use the header-setter authenticator to propagate the X-Scope-OrgID header
  4. configure Tempo's data source in Grafana to show only data related to the tenant set in step 1

The same steps apply to metrics, and to the span metrics generator (connector). A further step would be to use the batch processor, grouping by tenant; see the sketch below.
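
For that last step, the batch processor can keep tenants in separate batches by keying on client metadata; a sketch (the cardinality limit is an assumed safeguard):

processors:
  batch:
    # create a separate batcher instance per X-Scope-OrgID value
    metadata_keys:
      - X-Scope-OrgID
    # limit the number of distinct metadata combinations (assumed value)
    metadata_cardinality_limit: 100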