open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

sending_queue is full despite large queue size #9009

Open jessberg opened 2 years ago

jessberg commented 2 years ago

Describe the bug
I am trying to run the collector with the groupbytrace processor and the Jaeger exporter. I have an application producing 500 traces per second, and I have set the sending_queue size to 10,000. In my view, 500 traces per second, even with random bursts and noise, should never overwhelm a queue of size 10,000. However, I repeatedly get the error

2022-04-01T20:42:36.975Z    error   groupbytraceprocessor@v0.37.1/processor.go:225  consume failed  {"kind": "processor", "name": "groupbytrace/custom", "error": "sending_queue is full"}

and

github.com/open-telemetry/opentelemetry-collector-contrib/processor/groupbytraceprocessor.(*groupByTraceProcessor).onTraceReleased.func1
    github.com/open-telemetry/opentelemetry-collector-contrib/processor/groupbytraceprocessor@v0.37.1/processor.go:225
2022-04-01T20:42:36.975Z    error   exporterhelper/queued_retry.go:115  Dropping data because sending_queue is full. Try increasing queue_size. {"kind": "exporter", "name": "jaeger", "dropped_items": 12}
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send
    go.opentelemetry.io/collector@v0.37.0/exporter/exporterhelper/queued_retry.go:115
go.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2
    go.opentelemetry.io/collector@v0.37.0/exporter/exporterhelper/traces.go:115
go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeTracesFunc.ConsumeTraces
    go.opentelemetry.io/collector@v0.37.0/consumer/consumerhelper/traces.go:29
go.opentelemetry.io/collector/service/internal/fanoutconsumer.(*tracesConsumer).ConsumeTraces
    go.opentelemetry.io/collector@v0.37.0/service/internal/fanoutconsumer/traces.go:75
go.opentelemetry.io/collector/processor/processorhelper.NewTracesProcessor.func1
    go.opentelemetry.io/collector@v0.37.0/processor/processorhelper/traces.go:71
go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeTracesFunc.ConsumeTraces
    go.opentelemetry.io/collector@v0.37.0/consumer/consumerhelper/traces.go:29

These errors remain when I have my application set to produce 500 traces per second, but go away when I lower the load to 300 traces per second. Because of that, and because I see in the logs:

2022-04-01T20:42:06.784Z    info    jaegerexporter@v0.37.1/exporter.go:186  State of the connection with the Jaeger Collector backend   {"kind": "exporter", "name": "jaeger", "state": "READY"}

I am fairly confident that Jaeger is configured correctly.

Steps to reproduce
I have a version of the tracegen application that produces 12-span traces. It is available here. I am running it with Jaeger and the OpenTelemetry Collector. My Jaeger and OTel config files are below.

What did you expect to see?
I expected the collector to work without throwing sending_queue full errors, because I set the sending_queue size very high.

What did you see instead?
The collector repeatedly drops data.

What version did you use?
Version: 0.37.1

What config did you use?
The OpenTelemetry config file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otelcollector
spec:
  selector:
    matchLabels:
      app: otelcollector
  template:
    metadata:
      labels:
        app: otelcollector
    spec:
      # Init container retrieves the current cloud project id from the metadata server
      # and inserts it into the collector config template
      # https://cloud.google.com/compute/docs/storing-retrieving-metadata
      initContainers:
      - name: otel-gateway-init
        image: gcr.io/google.com/cloudsdktool/cloud-sdk:326.0.0-alpine
        command:
        - '/bin/bash'
        - '-c'
        - |
           sed "s/{{PROJECT_ID}}/$(curl -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/project/project-id)/" /template/collector-gateway-config-template.yaml >> /conf/collector-gateway-config.yaml
        volumeMounts:
        - name: collector-gateway-config-template
          mountPath: /template
        - name: collector-gateway-config
          mountPath: /conf
      containers:
      - name: otelcollector
        args:
        - --config=/conf/collector-gateway-config.yaml
        image: otel/opentelemetry-collector-contrib:0.37.1
        env:
          - name: GOOGLE_APPLICATION_CREDENTIALS
            value: /var/secrets/google/key.json
        volumeMounts:
        - name: collector-gateway-config
          mountPath: /conf
        - name: google-cloud-key
          mountPath: /var/secrets/google
      volumes:
      # Simple ConfigMap volume with template file
      - name: collector-gateway-config-template
        configMap:
          items:
          - key: collector-gateway-config-template.yaml
            path: collector-gateway-config-template.yaml
          name: collector-gateway-config-template
      # Create a volume to store the expanded template (with correct cloud project ID)
      - name: collector-gateway-config
        emptyDir: {}
      - name: google-cloud-key
        secret:
          secretName: pubsub-key
---
apiVersion: v1
kind: Service
metadata:
  name: otelcollector
spec:
  type: ClusterIP
  selector:
    app: otelcollector
  ports:
  - name: "1888"
    port: 1888
    targetPort: 1888
  - name: "8888"
    port: 8888
    targetPort: 8888
  - name: "8889"
    port: 8889
    targetPort: 8889
  - name: "13133"
    port: 13133
    targetPort: 13133
  - name: "4317"
    port: 4317
    targetPort: 4317
  - name: "55670"
    port: 55670
    targetPort: 55670

---                                                                             
apiVersion: v1                                                                  
kind: ConfigMap                                                                 
metadata:                                                                       
  name: collector-gateway-config-template                                       
# Open Telemetry Collector config                                               
# https://opentelemetry.io/docs/collector/configuration/                        
data:                                                                           
  collector-gateway-config-template.yaml: |                                     
    extensions:
      zpages:
    receivers:                                                                  
      otlp:                                                                     
        protocols:                                                              
          grpc:                                                                 
    processors:
      groupbytrace/custom:
        wait_duration: 1s
        num_traces: 5000
        num_workers: 4
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 30

    exporters:                                                                  
      logging:
      jaeger:
        endpoint: "jaeger:14250"
        tls:
          insecure: true
        sending_queue:
          enabled: true
          num_consumers: 100
          queue_size: 10000

    service:                                                                    
      extensions: [zpages]
      pipelines:                                                                
        traces:                                                                 
          receivers: [otlp] # Receive otlp-formatted data from other collector instances
          processors: [groupbytrace/custom, memory_limiter]  
          exporters: [logging, jaeger]

The Jaeger config file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
spec:
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      serviceAccountName: default
      terminationGracePeriodSeconds: 5
      containers:
      - name: jaeger
        image: jaegertracing/all-in-one:1.30
        env:
        - name: COLLECTOR_ZIPKIN_HOST_PORT
          value: ":9411"
        ports:
        - containerPort: 5775
        - containerPort: 6831
        - containerPort: 6832
        - containerPort: 5778
        - containerPort: 14250
        - containerPort: 14268
        - containerPort: 14269
        - containerPort: 16686
        - containerPort: 9411
        resources:
          requests:
            cpu: 200m
            memory: 180Mi
          #limits:
          #  cpu: 300m
          #  memory: 300Mi
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger
spec:
  type: ClusterIP
  selector:
    app: jaeger
  ports:
  - name: "5775"
    port: 5775
    targetPort: 5775
    protocol: UDP
  - name: "6831"
    port: 6831
    targetPort: 6831
    protocol: UDP
  - name: "6832"
    port: 6832
    targetPort: 6832
    protocol: UDP
  - name: "5778"
    port: 5778
    targetPort: 5778
  - name: "14250"
    port: 14250
    targetPort: 14250
  - name: "14268"
    port: 14268
    targetPort: 14268
  - name: "14269"
    port: 14269
    targetPort: 14269
  - name: "9411"
    port: 9411
    targetPort: 9411
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger-frontend
spec:
  type: NodePort
  selector:
    app: jaeger
  ports:
  - name: "16686"
    port: 16686
    targetPort: 16686

Environment
I'm running these as separate containers on GCP.

Aneurysm9 commented 2 years ago

A 10,000-span queue provides only 20 seconds of runway at 500 spans per second. The default retry settings used by the queued-retry helper have a 5s initial interval and a 30s max interval, with a 5-minute maximum elapsed time. Your queue would need to be at least 150,000 to accommodate 500 inbound spans per second over a five-minute period. Realistically, though, even that isn't enough at that rate and timeout duration, since spans will not be cleared from the queue fast enough.
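
A minimal sketch of the arithmetic above, in Go, using the figures from this comment (500 spans/s, a 10,000-item queue, and the 5-minute maximum elapsed time). Treating queue_size as a simple count of spans follows the per-span framing here and is a simplification, not a statement about the exporter's internals:

package main

import "fmt"

func main() {
	const (
		arrivalRate    = 500.0   // spans/s arriving at the exporter queue
		queueSize      = 10000.0 // configured sending_queue.queue_size
		maxElapsedTime = 300.0   // default retry max_elapsed_time in seconds (5 minutes)
	)

	// With the backend stalled (exit rate ~0), the queue buys this much time.
	fmt.Printf("runway: %.0f s\n", queueSize/arrivalRate) // 20 s

	// Queue size needed to buffer a full retry window at the same arrival rate.
	fmt.Printf("queue for a %.0f s backlog: %.0f items\n",
		maxElapsedTime, arrivalRate*maxElapsedTime) // 150000
}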

jessberg commented 2 years ago

What do you mean by "seconds of runway"? Are you saying that the queue is keeping things around for 5 minutes? I thought when the backend was functional, the queue should remain drained at all times.

Also, if you are keeping them around for 5 minutes, why wouldn't 150,000 be enough? How do you know how fast the spans will be cleared from the queue?

Aneurysm9 commented 2 years ago

I mean that if the arrival rate is 500/s and the exit rate is 0/s, the queue will be full in 20s. The queue retains items until they are consumed. The consumer in this case may consume items at a rate as low as 1 every 5 minutes. That would be pathological behavior, expected only when the backend is unavailable or severely degraded.

It is, however, possible that the exit rate could be less than the arrival rate, at which point the queue will fill up:

However, because a store in reality generally has a limited amount of space, it can eventually become unstable. If the arrival rate is much greater than the exit rate, the store will eventually start to overflow, and thus any new arriving customers will simply be rejected (and forced to go somewhere else or try again later) until there is once again free space available in the store.

As for why 150,000 wouldn't be enough if the exit rate were 1 per 5 minutes, Little's Law tells us the answer. The queue would be full by the time the first item was processed if the arrival rate were 500/second. By the time the second item was processed, it would still be full and 150,000 items would have been rejected. Only when the exit rate is greater than or equal to the arrival rate will a queue not eventually fill up and reject new arrivals.

Spans will be cleared from the queue as fast as they can be sent by the exporter. In this case that would likely be governed by how fast your Jaeger service instance can accept new spans.

jessberg commented 2 years ago

I did just try it with a 150,000 queue size; it worked for 8 minutes and then returned the old error. From this, I believe that must be the case: the exit rate is probably less than the arrival rate. Is there some way to monitor the exit rate to see what the limits are? Are there any guidelines for expected throughput? I'd ideally like to be able to process thousands of traces per second; is that even possible?

Aneurysm9 commented 2 years ago

The collector emits some metrics on port :8080 by default. You can look at those, including otelcol_receiver_accepted_spans and otelcol_exporter_sent_spans, to gauge the arrival and exit rates, respectively.

It should be possible to process thousands of spans per second, but probably not with 200 millicores of CPU allocated to your Jaeger service instance. Ensure that your systems are appropriately sized and consider horizontally scaling your collector instances.

jessberg commented 2 years ago

Thank you, that's very helpful! I'll try to figure out what the rates are there then. Is there anything I have to do to enable that? I added 8080 to the list of ports in my kubernetes service but I'm getting connection refused when I try to access it.

pmm-sumo commented 2 years ago

I believe part of the issue here might be the lack of batching after the groupbytrace processor, which in this case produces an export request for each trace. It is highly recommended to have the batch processor set in the pipeline. With the Jaeger exporter, a further reduction in requests can be achieved by using the groupbyattrs processor in compaction mode. So, essentially, add something akin to the following:

    processors:
      batch:
        send_batch_size: 512
        send_batch_max_size: 1024
      groupbyattrs:
    ...
    service:                                                                    
      extensions: [zpages]
      pipelines:                                                                
        traces:                                                                 
          receivers: [otlp] # Receive otlp-formatted data from other collector instances
          processors: [memory_limiter, groupbytrace/custom, batch, groupbyattrs]  
          exporters: [logging, jaeger]

I added 8080 to the list of ports in my kubernetes service but I'm getting connection refused when I try to access it.

Actually, the port number is :8888 :)

Aneurysm9 commented 2 years ago

Actually, the port number is :8888 :)

This is why I should never trust my memory!

jpkrohling commented 2 years ago

I'm assigning this to myself, but @jessberg, please do try @pmm-sumo's suggestion and let us know how it went.

jessberg commented 2 years ago

Sorry for the late reply! I am still having a bit of trouble accessing the statistics. When I go to :8888, I get 404 Page Not Found. This is the same as I got with zpages, but I fixed that by going to specific endpoints. Is there analogous documentation on how to find the specific metrics?

jpkrohling commented 2 years ago

Try the /metrics endpoint on that port: localhost:8888/metrics (given that the service is exposed to your localhost, perhaps using kubectl port-forward otelcollector 8888 or something similar). If it doesn't work, check the logs for the collector: the port should match what's shown there. If you still can't get it working, I recommend trying to run the collector binary locally, ruling out any possible Kubernetes networking issue.
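
For anyone following along, here is a rough Go sketch (not part of the collector) of estimating the arrival and exit rates from that endpoint. The metric names are the ones quoted earlier in this thread; the localhost:8888 URL, the 10-second sampling window, and the naive line parsing are assumptions, and otelcol_exporter_sent_spans is summed across all exporters (logging and jaeger alike), so filter by label if you need a per-exporter rate:

// Rough sketch: sample the collector's Prometheus metrics endpoint twice
// and estimate arrival vs. exit rates. Errors are mostly ignored for brevity.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// scrape fetches the metrics page and sums the values of all series whose
// name starts with the given prefix.
func scrape(url, prefix string) (float64, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var total float64
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, prefix) {
			continue // skips # HELP / # TYPE lines and other metrics
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[len(fields)-1], 64); err == nil {
			total += v
		}
	}
	return total, sc.Err()
}

func main() {
	const url = "http://localhost:8888/metrics" // e.g. via kubectl port-forward
	const interval = 10 * time.Second

	in1, _ := scrape(url, "otelcol_receiver_accepted_spans")
	out1, _ := scrape(url, "otelcol_exporter_sent_spans")
	time.Sleep(interval)
	in2, _ := scrape(url, "otelcol_receiver_accepted_spans")
	out2, _ := scrape(url, "otelcol_exporter_sent_spans")

	secs := interval.Seconds()
	fmt.Printf("arrival rate: %.1f spans/s\n", (in2-in1)/secs)
	fmt.Printf("exit rate:    %.1f spans/s\n", (out2-out1)/secs)
}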

jessberg commented 2 years ago

I got it! From this, I believe the receiver is accepting a great many spans:

otelcol_receiver_accepted_spans{receiver="otlp",service_instance_id="eaa1bea0-d948-4775-9117-16e8a2ddbf0c",service_version="latest",transport="grpc"} 94975

And far fewer are being sent by the exporters:

otelcol_exporter_sent_spans{exporter="jaeger",service_instance_id="eaa1bea0-d948-4775-9117-16e8a2ddbf0c",service_version="latest"} 30008
otelcol_exporter_sent_spans{exporter="logging",service_instance_id="eaa1bea0-d948-4775-9117-16e8a2ddbf0c",service_version="latest"} 89090

This seems pretty bad; I seem to be overloading it by a factor of 3. Can you explain what exactly is happening with the sending_queue here? I understand I am overloading it, but would increasing the number of consumers help? Are the requests to Jaeger pipelined in any way?

pmm-sumo commented 2 years ago

These are relatively low numbers (assuming the collector was operating for more than a couple of seconds). It's surprising to see issues with such a volume of data. I will try to reproduce this locally. What parameters do you supply to your version of tracegen?

Also, I noticed that your collector version is rather old (v0.37.1 was released approximately 6 months ago). I am not sure whether there have been relevant bugfixes since then, but I would consider updating it. Then you will be able to use the groupbyattrs processor, which helps with compacting Jaeger data and might reduce the number of export requests.

jessberg commented 2 years ago

I have the following deployment file for tracegen; I made the image public, so this should work out of the box:

apiVersion: apps/v1                                                             
kind: Deployment                                                                
metadata:                                                                       
  name: tracegen                                                                
spec:                                                                           
  selector:                                                                     
    matchLabels:                                                                
      app: tracegen                                                             
  template:                                                                     
    metadata:                                                                   
      labels:                                                                   
        app: tracegen                                                           
    spec:                                                                       
      containers:                                                               
      - name: tracegen                                                          
        image: docker.io/jessberg/tracegenerator                                
        args: ["-otlp-insecure", "-rate", "500", "-otlp-endpoint", "otelcollector:4317", "-duration", "2000s"]

---                                                                             
apiVersion: v1                                                                  
kind: Service                                                                   
metadata:                                                                       
  name: tracegen                                                                
spec:                                                                           
  type: ClusterIP                                                               
  selector:                                                                     
    app: tracegen                                                               
  ports:                                                                        
  - name: "4317"                                                                
    port: 4317                                                                  
    targetPort: 4317                                                                               

It was only running for a few seconds, but I noticed that the logging exporter was able to keep up with the load much better than Jaeger was. Also, I might be using num_consumers incorrectly; I put in 100 for Jaeger, but I'm not sure what exactly it does. If I increase it too much, I get an out-of-memory error, even with the memory_limiter processor enabled.

As far as the groupbyattrs processor goes, won't that get rid of the fact that the spans are grouped by trace? I am relying on that grouping for another exporter I want to add later. Right now I am planning on making one request per trace for that exporter; is that an infeasible design?

pmm-sumo commented 2 years ago

@jessberg I did a test (though locally, so over the local network stack). I observed a couple of things.

First of all, you might want to check this metric:

otelcol_exporter_queue_size{exporter="jaeger",service_instance_id="28dc61a6-bb74-48ef-a77d-92bd677d6e42",service_version="latest"} 0

It describes the current queue size. TL;DR: if it grows, the backend is not able to keep up with the requests coming from the collector. This might be due to a number of reasons, though.

Secondly, I suspect the problems you're observing are due to request fragmentation. This is something that can be amplified by the Jaeger data model. The solution I mentioned a couple of days ago still holds (I slightly updated it) :) I did a test, and while I didn't observe queue issues with your setup, adding the batch processor and groupbyattrs resulted in a 3x reduction in OpenTelemetry Collector CPU usage.

You need a slightly more recent OpenTelemetry Collector version for that (v0.46.0 or newer).

jessberg commented 2 years ago

After a few seconds, I get

otelcol_exporter_queue_size{exporter="jaeger",service_instance_id="1e08b758-ec74-4e26-a19f-2f9e652d2a25",service_version="latest"} 8397

That seems to confirm it is the queue that's the problem! What is request fragmentation? I know that batching and groupbyattrs might help, but with groupbyattrs, I am still concerned about negating the effect of groupbytrace for my other exporter. I did try batching with the config you put above, along with extra millicores for Jaeger as Aneurysm9 recommended; these are the results

otelcol_exporter_sent_spans{exporter="jaeger",service_instance_id="4f4c1705-2676-4ada-b46a-d18c886c2acb",service_version="latest"} 97916
otelcol_exporter_sent_spans{exporter="logging",service_instance_id="4f4c1705-2676-4ada-b46a-d18c886c2acb",service_version="latest"} 910775

So it looks like that is enough to solve this problem; however, I'm still getting this error message in the logs:

2022-04-08T19:16:56.655Z    info    exporterhelper/queued_retry.go:215  Exporting failed. Will retry the request after interval.    {"kind": "exporter", "name": "jaeger", "error": "failed to push trace data via Jaeger exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "32.918303921s"}

This happens roughly every 15 lines of the default logging logs.

I also have

otelcol_exporter_queue_size{exporter="jaeger",service_instance_id="4f4c1705-2676-4ada-b46a-d18c886c2acb",service_version="latest"} 1566

from the run with batching.

So from this I believe that perhaps it's not enough to just batch, and maybe you need groupbyattrs as well. However, I still don't really want to get rid of groupbytrace, since I need it for my other exporter; does groupbyattrs negate the effects of groupbytrace?

More broadly, I am still a bit confused about the model for the exporters. There are consumers that try to export to the backends, but do these consumers block waiting for the backend to respond? Are they pipelined? What is consuming from the sending queue? I'm trying to figure out whether, if the backend is unable to keep up with the load, that is because the exporter is too slow to export or because the backend won't take that many requests.

Also, why is it that the groupbyattrs processor helps? Sorry for the deluge of questions; I just want to make sure I understand the system, because I'll have to run it at other loads as well :)

pmm-sumo commented 2 years ago

What is request fragmentation?

TL;DR: the Jaeger exporter creates a separate batch/request for each ResourceSpans. So when there are multiple ResourceSpans objects in the batch (even if they effectively point to the same Resource attribute values), it creates multiple requests.

By using the groupbyattrs processor, multiple ResourceSpans with matching attributes are joined together. This can significantly reduce the number of outgoing Jaeger requests.
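
To illustrate the idea, here is a conceptual Go sketch with made-up types (not the collector's pdata API and not the actual groupbyattrs implementation): ResourceSpans whose resource attributes match are merged, so an exporter that sends one request per ResourceSpans has far fewer requests to make.

// Simplified illustration of groupbyattrs-style compaction.
// ResourceSpans here is a hypothetical stand-in type.
package main

import (
	"fmt"
	"sort"
	"strings"
)

type ResourceSpans struct {
	Attributes map[string]string // resource attributes, e.g. host.name
	SpanIDs    []string
}

// key builds a stable identity for a resource from its attributes.
func key(attrs map[string]string) string {
	parts := make([]string, 0, len(attrs))
	for k, v := range attrs {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

// compact merges ResourceSpans that share identical resource attributes.
func compact(in []ResourceSpans) []ResourceSpans {
	byResource := map[string]*ResourceSpans{}
	var order []string
	for _, rs := range in {
		k := key(rs.Attributes)
		if existing, ok := byResource[k]; ok {
			existing.SpanIDs = append(existing.SpanIDs, rs.SpanIDs...)
			continue
		}
		copyRS := rs
		byResource[k] = &copyRS
		order = append(order, k)
	}
	out := make([]ResourceSpans, 0, len(order))
	for _, k := range order {
		out = append(out, *byResource[k])
	}
	return out
}

func main() {
	batch := []ResourceSpans{
		{Attributes: map[string]string{"host.name": "foo"}, SpanIDs: []string{"10"}},
		{Attributes: map[string]string{"host.name": "foo"}, SpanIDs: []string{"11"}},
		{Attributes: map[string]string{"host.name": "foo"}, SpanIDs: []string{"20"}},
	}
	// An exporter issuing one request per ResourceSpans now sends 1 instead of 3.
	fmt.Println("ResourceSpans before:", len(batch), "after:", len(compact(batch)))
}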

I am still concerned about negating the effect of groupbytrace for my other exporter

If the spans point to the same resource, then it's not going to affect them. A question, though: why do you want to have the traces grouped together? This is intended for processing them in the OpenTelemetry Collector pipeline, e.g. via the tailsamplingprocessor.

I did try batching with the config you put above

It's important to note that while batching will help a lot, it's not enough, for the reasons I mentioned above. You should use the batch + groupbyattrs processors (in that sequence).

Batching will combine a set of spans together, and groupbyattrs will ensure the data model is optimized.

jessberg commented 2 years ago

That makes sense for the general case; however, the traces in my trace generator have one span per resource, so I'm not sure it would result in any savings there. I tried anyway with the latest build; the config above gave the error:

collector server run finished with error: cannot build pipelines: error creating processor "groupbyattrs" in pipeline "traces": option 'groupByKeys' must include at least one non-empty attribute name

I added host.name as a key in the config to deal with that and ran it with just groupbyattrs, batch, and memory_limiter, and it worked :). However, when I reintroduced groupbytrace, it started lagging behind again; the stats were about 40210 spans processed by Jaeger and 731039 by logging. I've attached the new config below.

The exporter I'm creating goes to Google Cloud Storage, and I hash the structure of the trace as part of the trace's metadata. So when I store a trace, I need to have it all together in the collector to create the hash, and then I can store the spans individually. Right now, I have it creating a separate request per span, just like Jaeger does, so I thought they'd scale similarly. I got the same error on both exporters (Jaeger and my custom exporter), but I reproduced it with just Jaeger here, because I figured I should only ask about the code in this repo.

The config:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otelcollector
spec:
  selector:
    matchLabels:
      app: otelcollector
  template:
    metadata:
      labels:
        app: otelcollector
    spec:
      # Init container retrieves the current cloud project id from the metadata server
      # and inserts it into the collector config template
      # https://cloud.google.com/compute/docs/storing-retrieving-metadata
      initContainers:
      - name: otel-gateway-init
        image: gcr.io/google.com/cloudsdktool/cloud-sdk:326.0.0-alpine
        command:
        - '/bin/bash'
        - '-c'
        - |
           sed "s/{{PROJECT_ID}}/$(curl -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/project/project-id)/" /template/collector-gateway-config-template.yaml >> /conf/collector-gateway-config.yaml
        volumeMounts:
        - name: collector-gateway-config-template
          mountPath: /template
        - name: collector-gateway-config
          mountPath: /conf
      containers:
      - name: otelcollector
        args:
        - --config=/conf/collector-gateway-config.yaml
        image: otel/opentelemetry-collector-contrib:0.37.1
        env:
          - name: GOOGLE_APPLICATION_CREDENTIALS
            value: /var/secrets/google/key.json
        volumeMounts:
        - name: collector-gateway-config
          mountPath: /conf
        - name: google-cloud-key
          mountPath: /var/secrets/google
      volumes:
      # Simple ConfigMap volume with template file
      - name: collector-gateway-config-template
        configMap:
          items:
          - key: collector-gateway-config-template.yaml
            path: collector-gateway-config-template.yaml
          name: collector-gateway-config-template
      # Create a volume to store the expanded template (with correct cloud project ID)
      - name: collector-gateway-config
        emptyDir: {}
      - name: google-cloud-key
        secret:
          secretName: pubsub-key
---
apiVersion: v1
kind: Service
metadata:
  name: otelcollector
spec:
  type: ClusterIP
  selector:
    app: otelcollector
  ports:
  - name: "1888"
    port: 1888
    targetPort: 1888
  - name: "8080"
    port: 8080
    targetPort: 8080
  - name: "8888"
    port: 8888
    targetPort: 8888
  - name: "8889"
    port: 8889
    targetPort: 8889
  - name: "13133"
    port: 13133
    targetPort: 13133
  - name: "4317"
    port: 4317
    targetPort: 4317
  - name: "55670"
    port: 55670
    targetPort: 55670

---                                                                             
apiVersion: v1                                                                  
kind: ConfigMap                                                                 
metadata:                                                                       
  name: collector-gateway-config-template                                       
# Open Telemetry Collector config                                               
# https://opentelemetry.io/docs/collector/configuration/                        
data:                                                                           
  collector-gateway-config-template.yaml: |                                     
    extensions:
      zpages:
    receivers:                                                                  
      otlp:                                                                     
        protocols:                                                              
          grpc:                                                                 
    processors:
      groupbytrace/custom:
        wait_duration: 1s
        num_traces: 5000
        num_workers: 4
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 30
      batch:
        send_batch_size: 512
        send_batch_max_size: 1024
      groupbyattrs:
        keys:
          - host.name

    exporters:                                                                  
      logging:
      jaeger:
        endpoint: "jaeger:14250"
        tls:
          insecure: true
        sending_queue:
          enabled: true
          num_consumers: 100
          queue_size: 150000

    service:                                                                    
      extensions: [zpages]
      pipelines:                                                                
        traces:                                                                 
          receivers: [otlp] # Receive otlp-formatted data from other collector instances
          processors: [groupbytrace/custom, groupbyattrs, memory_limiter, batch]  
          exporters: [logging, jaeger]

jpkrohling commented 2 years ago

However when I reintroduced groupbytrace, it started lagging behind again

@pmm-sumo, do you also see this? I'm focusing on flaky tests this week, but if you are also seeing issues with the groupbytrace, I'll add this to my queue.

pmm-sumo commented 2 years ago

(...) the traces in my trace generator have one span per resource. I'm not sure it would result in any savings there

Yep, it is still relevant, even more so when used in conjunction with the groupbytrace processor, which will break the batches down into smaller chunks.

I've tried anyway with the latest build; the config above gave the error (...)

This must have been an older build. The change has been there since v0.46.0. Could you double-check? I have been using it for some time and did not see this error.

(config example) processors: [groupbytrace/custom, groupbyattrs, memory_limiter, batch]

It's tricky, but the above is in the wrong order. memory_limiter should always be first, and groupbyattrs should come after batch (that way, it will optimize the whole batch). So it should be:

processors: [memory_limiter, groupbytrace/custom, batch, groupbyattrs]  

Can you try this config @jessberg?

However when I reintroduced groupbytrace, it started lagging behind again

@pmm-sumo, do you also see this? I'm focusing on flaky tests this week, but if you are also seeing issues with the groupbytrace, I'll add this to my queue

I think we have one more configuration iteration to try here. I believe this problem is not groupbytrace-related. I will try to do some more tests later this week if the config change does not help.

jessberg commented 2 years ago

I just tried that config; it worked!! Just so I understand what's going on here, was the issue that having a separate ResourceAttribute for every span took too much memory, and that clogged up the sending_queue? What exactly was the problem?

pmm-sumo commented 2 years ago

Great to hear that, @jessberg! What I believe was happening is that there were a lot of small requests due to data fragmentation and Jaeger exporter specifics. Regardless of the sending_queue size, there were just too many small requests.

To give you some numbers, let's assume that you were generating 500 traces per second, each with 10 spans on average (so 5000 spans/s). Typically, with the batch processor set to, say, 500 spans per batch and the otlp exporter employed, one could anticipate 10 requests per second, each with 500 spans.

If there were one ResourceSpans per such batch (which could be the case if there's only one underlying Resource generating the data), the number would be the same for the Jaeger exporter. However, if the spans were broken down into more ResourceSpans, the Jaeger exporter would create more requests accordingly. In the extreme case, this might mean 5000 requests/s rather than the 10 requests/s we saw before. Making many small requests is much less efficient than making fewer large requests.
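
A tiny Go sketch of that arithmetic, using the illustrative numbers from this comment (500 traces/s, 10 spans per trace, 500-span batches); the one-span-per-ResourceSpans worst case mirrors the fragmentation described above:

package main

import "fmt"

func main() {
	const (
		tracesPerSecond = 500.0
		spansPerTrace   = 10.0
		batchSizeSpans  = 500.0
	)
	spansPerSecond := tracesPerSecond * spansPerTrace // 5000 spans/s

	// One request per batch (a single ResourceSpans per batch):
	fmt.Printf("batched:    %.0f requests/s of ~%.0f spans each\n",
		spansPerSecond/batchSizeSpans, batchSizeSpans) // 10 req/s

	// One request per ResourceSpans, one span per ResourceSpans (worst case):
	fmt.Printf("fragmented: %.0f requests/s of 1 span each\n", spansPerSecond) // 5000 req/s
}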

What I believe happens here is the following:

1. The receiver gets the data from the trace generator. It might look like the following (I'm skipping InstrumentationScope and a few other things for brevity):

   Batch
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=1, span_id=10}
       Span {trace_id=2, span_id=20}
       Span {trace_id=3, span_id=30}

   Batch
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=1, span_id=11}
       Span {trace_id=2, span_id=21}
       Span {trace_id=3, span_id=31}

2. So far so good. Now let's go through groupbytrace. I think the outcome of that might be the following:

   Batch
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=1, span_id=10}
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=1, span_id=11}

   Batch
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=2, span_id=20}
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=2, span_id=21}

   Batch
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=3, span_id=30}
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=3, span_id=31}

3. Now, let's do the batching. Will it help? Not very much for the Jaeger exporter:

   Batch
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=1, span_id=10}
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=1, span_id=11}
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=2, span_id=20}
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=2, span_id=21}
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=3, span_id=30}
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=3, span_id=31}

4. To avoid making 6 separate requests, we now apply groupbyattrs and voilà:

   Batch
     ResourceSpans
       Resource {host.name="foo"}
       Span {trace_id=1, span_id=10}
       Span {trace_id=1, span_id=11}
       Span {trace_id=2, span_id=20}
       Span {trace_id=2, span_id=21}
       Span {trace_id=3, span_id=30}
       Span {trace_id=3, span_id=31}

pmm-sumo commented 2 years ago

Having said that, the example discussed in this issue might be an extreme one (only one Resource present). I think in very large environments it might be tricky, and the number of unique Resources per trace might be comparable to the number of spans in it.

jessberg commented 2 years ago

This makes a lot of sense! Thank you for explaining it! I guess, as a meta point, one thing I didn't really understand before is that it isn't really feasible to have a request per span. And the reason this works so much better is that Jaeger does a request per ResourceSpans, so it makes sense to batch them all together into one ResourceSpans in order to cut down on the number of requests. Is that correct?

pmm-sumo commented 2 years ago

Yeah, I think that's a fair way to look at it.

jessberg commented 2 years ago

Thanks so much for your help!! I will mark this as resolved.

jessberg commented 2 years ago

@pmm-sumo Sorry, just one last question. Why is it so infeasible to have so many requests? Is it a limit on the amount of time we keep the requests hanging around? Surely if you can keep traces around for 10 seconds in the groupbytrace processor, you should be able to keep them around while you wait for a response from the storage backend.

jpkrohling commented 2 years ago

@pmm-sumo, is there something we should be doing as part of the Jaeger exporter here natively? Or perhaps something we should recommend as part of its readme?

pmm-sumo commented 2 years ago

@pmm-sumo Sorry, just one last question. Why is it so infeasible to have so many requests? Is it a limit on the amount of time we keep the requests hanging around? Surely if you can keep traces around for 10 seconds in the groupbytrace processor, you should be able to keep them around while you wait for a response from the storage backend.

@jessberg the problem is the volume of new requests each second. In the extreme example I put above, we had 10 large requests vs. 5000 very small requests per second.

IIRC, in your exporter you had 100 consumers set. This means it has a budget of 100 request-seconds per second (within one second, the consumers can handle up to 100 requests taking 1s each, or 1000 requests taking 100ms each).

Let's say that the overall (network + server) latency for handling a request is 100ms. Of course, for large requests it will be higher, but even if it were 10s, that's still sustainable: 100 consumers can keep up with 10 reqs/s * 10s.

For the smaller requests, this is a much bigger problem: 5000 reqs/s * 100ms is 5x above the consumers' capacity. Each second they will handle only 1000 of them, so they will not be able to keep up with all the export requests, and eventually the queue will fill up.
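
Putting that budget argument into a short Go sketch (the 100 consumers, 100 ms small-request latency, and 10 s large-request latency are the figures from this comment; the model ignores retries and network variance):

package main

import "fmt"

// sustainable reports whether a pool of consumers can keep up with a given
// request rate, comparing the work arriving per second (request-seconds)
// against the pool's budget of consumer-seconds per second.
func sustainable(consumers int, reqPerSec, latencySec float64) bool {
	return reqPerSec*latencySec <= float64(consumers)
}

func main() {
	// 10 large requests/s at 10 s each: 100 request-seconds vs a 100-consumer budget.
	fmt.Println("10 req/s @ 10s:     ", sustainable(100, 10, 10)) // true (just at the limit)
	// 5000 small requests/s at 100 ms each: 500 request-seconds vs 100 -- 5x over budget.
	fmt.Println("5000 req/s @ 100ms: ", sustainable(100, 5000, 0.1)) // false
}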

@pmm-sumo, is there something we should be doing as part of the Jaeger exporter here natively? Or perhaps something we should recommend as part of its readme?

Good call @jpkrohling. I was thinking about this, and I think we could either add a recommendation for the groupbyattrs processor to the Jaeger exporter docs (and maybe to groupbytrace, since fragmentation can be amplified by using it), or we could think about including this functionality out of the box. Maybe this code should land in, e.g., exporterhelper. What do you think? Maybe we should bring it to the SIG?

jpkrohling commented 2 years ago

Maybe this code should land in e.g. exporterhelper or so. What do you think?

I think it's a good idea.

Maybe we should bring it to the SIG?

I have a feeling that this is somewhat specific to Jaeger, but we can certainly start with Jaeger and share the findings with the rest of the SIG.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


jpkrohling commented 1 year ago

I previously linked two issues here as potentially related, but they are not.

jpkrohling commented 1 year ago

Do I read it right that this issue is actually now a documentation one? We should provide users with information on how to use the groupbyattrs processor to compact the number of requests made to the underlying exporter, right?

jessberg commented 1 year ago

Yes. My issue was that, from the "sending_queue is full" message, it was very difficult to figure out that the groupbytrace processor was interacting badly with the Jaeger exporter and that I needed groupbyattrs to fix it.

I think a note on the Jaeger exporter recommending that it always be used with groupbyattrs would fix it.

