open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

loadbalancing exporter is not sending span data to the back-end in a docker-compose setup #7910

Closed · nasushkov closed this issue 2 years ago

nasushkov commented 2 years ago

Describe the bug

I'm trying to make a proof of concept for a tail-based sampling solution, which is somewhat described here. I've built two distributions and placed each in a separate Docker image. One is built with the load-balancing exporter and the other with the tail-based sampling processor. I test this setup locally with a simple docker-compose file (described below).

I send spans to the load-balancing collector (UDP port 6832), and the logging exporter shows them being exported. However, I don't see them being received by the tail-based sampling collector further down the pipeline. What am I doing wrong here?

Steps to reproduce

What did you expect to see?

Spans are shown in the Jaeger UI

What did you see instead?

I don't see any spans in the Jaeger UI, and they are not received by the tail-sampling collector.

What version did you use? Version: v0.42.0 (v0.43.0 for the tailsamplingprocessor)

What config did you use?

Load balancer configuration:

receivers:
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
        # endpoint: localhost:6831
      thrift_http:

processors:

exporters:
  logging:
    loglevel: debug
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true

    # how to get the list of backends: DNS
    resolver:
      dns:
        hostname: sampling-collector # assumes 4317 as the default port for the resolved IP addresses

service:
  pipelines:
    traces:
      receivers: [ jaeger ]
      processors: []
      exporters: [ logging, loadbalancing ]

Tail-based sampling configuration:

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  tail_sampling:
    decision_wait: 30s
    num_traces: 50000
    policies:
      [
        {
            name: composite-policy,
            type: composite,
            composite:
              {
                max_total_spans_per_second: 50000,
                policy_order: [error-catching-policy, latency-policy, ab-test-policy, default-policy],
                composite_sub_policy:
                  [
                    {
                      name: error-catching-policy,
                      type: status_code,
                      status_code: {status_codes: [ERROR]}
                    },
                    {
                      name: latency-policy,
                      type: latency,
                      latency: {threshold_ms: 1000}
                    },
                    {
                      name: ab-test-policy,
                      type: string_attribute,
                      string_attribute: { key: ab-active, values: ["true"] }
                    },
                    {
                      name: default-policy,
                      type: always_sample
                    },
                  ],
                rate_allocation:
                  [
                    {
                      policy: error-catching-policy,
                      percent: 25
                    },
                    {
                      policy: latency-policy,
                      percent: 25
                    },
                    {
                      policy: ab-test-policy,
                      percent: 25
                    },
                    {
                      policy: default-policy,
                      percent: 25
                    }
                  ]
              }
          },
      ]

exporters:
  logging:
    loglevel: debug
  jaeger:
    endpoint: "jaeger:14250"
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ tail_sampling ]
      exporters: [ logging, jaeger ]

Here is my docker-compose file:

version: "3.8"

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    restart: on-failure
    ports:
      # http://localhost:16686
      - 16686:16686

  sampling-collector:
    build: ./sampling-collector
    volumes:
      - ./sampling-collector/config:/etc/otel:ro
    expose:
      - "80"
      - "443"
      - "4317" 
    depends_on:
      - jaeger

  load-balancer-collector:
    build: ./load-balancer-collector/.
    volumes:
      - ./load-balancer-collector/config:/etc/otel:ro
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp" 
      - "5778:5778"
      - "14268:14268"
      - "9411:9411"
    depends_on: 
      - sampling-collector

Environment OS: macOS Big Sur (11.3.1)

Additional context

I use the following Dockerfile to build stuff:

ARG REGISTRY_URL=""
ARG GO_VERSION=1.17
ARG ALPINE_VERSION=3.14

FROM ${REGISTRY_URL}golang:${GO_VERSION}-alpine${ALPINE_VERSION} as build
RUN apk --no-cache add ca-certificates
RUN apk add make
RUN apk add build-base
WORKDIR /go/src/load-balancer-collector
COPY ./Makefile ./
COPY ./.otelcol-builder.yaml ./
RUN make build-prod

FROM scratch as bin
WORKDIR /load-balancer-collector
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=build /go/src/load-balancer-collector/dist/ ./

ENTRYPOINT [ "/load-balancer-collector/otelcol-lb"]
# Mount your config.yaml to /etc/otel/config.yaml
CMD ["--config", "/etc/otel/config.yaml"]

Makefile with a build-prod script:

.PHONY: init build build-prod clean

GOPATH ?= ${HOME}/go

init: 
    @GO111MODULE=on go install go.opentelemetry.io/collector/cmd/builder@latest 

build: init clean
    @$(GOPATH)/bin/builder --config=./.otelcol-builder.yaml --output-path=./dist --name=otelcol-lb

# We make sure that cgo is disabled, so the binary works with scratch image
build-prod: init clean
    @CGO_ENABLED=0 GOOS=linux $(GOPATH)/bin/builder --config=./.otelcol-builder.yaml --output-path=./dist --name=otelcol-lb

clean:
    @rm -rf ./dist
jpkrohling commented 2 years ago

Can you eliminate Docker from the equation? Most of the networking issues involving Docker are related to Docker itself, so eliminating it would help confirm whether the problem is on the collector side.

You can use this as a reference: https://github.com/jpkrohling/opentelemetry-collector-deployment-patterns/tree/main/pattern-4-load-balancing

nasushkov commented 2 years ago

@jpkrohling thanks for getting back to me. I'll try to eliminate Docker from my setup, as you suggested above. Nevertheless, I already checked your reference, and at first glance it seems to be a little outdated:

  1. When I run my distribution (v0.42.0) with this config, it fails with an error:

Error: failed to get config: cannot unmarshal the configuration: error reading exporters configuration for "loadbalancing": 1 error(s) decoding:

  * 'protocol.otlp' has invalid keys: insecure

2022/02/16 11:46:12 collector server run finished with error: failed to get config: cannot unmarshal the configuration: error reading exporters configuration for "loadbalancing": 1 error(s) decoding:

It seems that insecure: true should be nested under tls.
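For reference, a fragment with `insecure` nested under `tls`, matching what the v0.42.0 distribution accepts (this mirrors the load-balancer config at the top of this issue):

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
```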

  2. Also, my distribution fails to accept the --metrics-addr flag:

Error: unknown flag: --metrics-addr
2022/02/16 14:29:20 collector server run finished with error: unknown flag: --metrics-addr

So, any ideas on how I can configure this port? Right now I can't start two collectors because they conflict on port 8888.

jpkrohling commented 2 years ago

Yes, you can use something like this in your configuration file, close to where the pipelines are defined:

service:
  telemetry:
    metrics:
      address: ":8988"
nasushkov commented 2 years ago

@jpkrohling awesome, now it works. I managed to test my setup without Docker and the problem is gone, so you are right that the issue lies with Docker. My suspicion is that docker-compose does not wait for the services listed in depends_on to actually be ready; it only starts them in the given order. Could that be the issue, or do you have other ideas about what can go wrong?
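That startup-ordering suspicion can be worked around with a compose healthcheck. A sketch only, with several assumptions: a docker-compose version that implements the Compose Specification (long-form depends_on), a collector config that enables the health_check extension (which listens on port 13133 by default), and an image containing wget. Note that a scratch-based image like the Dockerfile below does not contain wget, so the binary would need a base such as busybox for this to work:

```yaml
services:
  sampling-collector:
    healthcheck:
      # Probe the collector's health_check extension endpoint.
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:13133"]
      interval: 5s
      timeout: 2s
      retries: 5

  load-balancer-collector:
    depends_on:
      sampling-collector:
        # Wait for a passing healthcheck, not just for container start.
        condition: service_healthy
```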

Besides that, I have some questions regarding the architecture:

  1. Can I use multiple instances of LB to eliminate a single point of failure (i.e. LB)?
  2. What was the reason to use gRPC for communication between the LB and downstream collectors? Could it be a performance issue (TCP as the transport layer)?
jpkrohling commented 2 years ago

Can I use multiple instances of LB to eliminate a single point of failure (i.e. LB)?

Yes. For highly elastic services, you might have consistency problems: one load balancer might have a different list of backends until it refreshes it again. The result is that trace IDs might end up on different backends for a short period of time. If this is critical to you, keep the TTL low. If that's still not acceptable, let me know.
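Keeping the TTL low translates, for the DNS resolver, into a short re-resolution interval. A sketch based on the dns resolver's `interval` and `timeout` fields; the values below are illustrative, not recommendations:

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: sampling-collector
        port: "4317"    # default port used for resolved IP addresses
        interval: 5s    # how often to re-resolve the backend list
        timeout: 1s     # per-resolution timeout
```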

What was the reason to use gRPC for communication between the LB and downstream collectors? Could it be a performance issue (TCP as the transport layer)?

gRPC is the default transport for OTLP. I don't see it as being a source of performance issues. Do you have a specific problem in mind?

nasushkov commented 2 years ago

Yes. For highly elastic services, you might have consistency problems: one load balancer might have a different list of backends until it refreshes it again. The result is that trace IDs might end up on different backends for a short period of time. If this is critical to you, keep the TTL low. If that's still not acceptable, let me know.

So, it means that I can schedule two or more LBs and it will work, except that I can have some temporary problems when scaling up/down, right?

gRPC is the default transport for OTLP. I don't see it as being a source of performance issues. Do you have a specific problem in mind?

We prefer UDP over TCP as a transport protocol for telemetry (at least our agents use UDP). In particular, we had a hard time with TCP in the past due to its overhead and head-of-line (HOL) blocking. However, as far as I understand, OTLP does not support UDP at the moment, so I wonder whether it can become a bottleneck, especially in case of downstream collector failures: packets can be dropped and HOL blocking can occur.

jpkrohling commented 2 years ago

So, it means that I can schedule two or more LBs and it will work, except that I can have some temporary problems when scaling up/down, right?

Correct

We prefer UDP over TCP as a transport protocol for telemetry

That's not supported with OTLP. You can have your load balancers be configured to accept data via UDP with the Jaeger receiver, for instance, but the communication between the load balancers and the backing collectors is going to use either HTTP or gRPC, as only the OTLP exporter is supported.

nasushkov commented 2 years ago

Ok, I think the next step would be to deploy this setup in k8s and test it under load. Also, would you mind if we made a PR in the future to support the Jaeger exporter, in case we run into any issues with TCP?

Regarding the issue, I'll close it for now. Thanks for your support!

jpkrohling commented 2 years ago

Also, would you mind if we made a PR in the future to support the Jaeger exporter, in case we run into any issues with TCP?

If you get into concrete problems, do open an issue and we can certainly discuss it!

gjshao44 commented 11 months ago

Yes. For highly elastic services, you might have consistency problems: one load balancer might have a different list of backends until it refreshes it again. The result is that trace IDs might end up on different backends for a short period of time. If this is critical to you, keep the TTL low. If that's still not acceptable, let me know. So, it means that I can schedule 2 or more LB-s and it will work except that I can have some temporary problems when scaling up/down, right?

Correct

This thread may be a bit dated, but I am interested in the rationale for how two or more LB instances are able to maintain the integrity of trace ID routing, short-term issues aside. What mechanism keeps each LB instance informed about the others' trace ID routing, given that the physical LB in front of the collector LB instances may send spans of the same trace to a different collector LB instance? See the flow I am describing below:

Physical LB --> Collector LB instance 1 --> load balancing to multiple collector backends
Physical LB --> Collector LB instance 2 --> load balancing to multiple collector backends

Update: I listened to Juraci's talk again, and it seems that each LB collector instance computes a hash of the trace ID to determine which backend to send to, so I suppose multiple LB instances maintain integrity by using the same algorithm, without needing to share memory.
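That is the core idea: routing is a pure function of the trace ID and the backend list, so no state is shared between LB instances. The actual loadbalancing exporter uses a consistent-hashing ring (so that adding or removing a backend remaps only a fraction of trace IDs, unlike the simple modulo used here); the Python sketch below, with entirely illustrative names and endpoints, only demonstrates the stateless-determinism property:

```python
import hashlib

def pick_backend(trace_id: str, backends: list[str]) -> str:
    """Deterministically map a trace ID to one backend.

    Any two LB instances that share the same backend list and the same
    hash function route a given trace ID to the same backend, without
    communicating with each other.
    """
    digest = hashlib.sha256(trace_id.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

backends = ["collector-1:4317", "collector-2:4317", "collector-3:4317"]
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

# Two independent "LB instances" agree on the routing decision:
assert pick_backend(trace_id, backends) == pick_backend(trace_id, backends)
```

The trade-off motivating a consistent-hashing ring instead of this modulo: when the DNS resolver changes the backend list, `% len(backends)` would reshuffle almost every trace ID, while a ring moves only the keys adjacent to the changed backend.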