Open jessberg opened 2 years ago
A 10,000-span queue will provide 20 seconds of runway at 500 spans per second. The default retry settings used by the queued retry helper have a 5s initial interval and a 30s max interval, with a 5-minute maximum elapsed time. Your queue should be at least 150,000 to accommodate 500 inbound spans per second over a five-minute period. Realistically, though, even that isn't enough at that rate and timeout duration, since spans will not be cleared from the queue fast enough.
What do you mean by "seconds of runway"? Are you saying that the queue is keeping things around for 5 minutes? I thought when the backend was functional, the queue should remain drained at all times.
Also, if you are keeping them around for 5 minutes, why wouldn't 150,000 be enough? How do you know how fast the spans will be cleared from the queue?
I mean that if the arrival rate is 500/s and the exit rate is 0/s, the queue will be full in 20s. The queue will retain items until they are consumed. The consumer in this case may only consume items at a rate of 1 every 5 minutes. That would be pathological behavior, expected only when the backend is unavailable or severely degraded.
It is, however, possible that the exit rate could be less than the arrival rate, at which point the queue will fill up:
However, because a store in reality generally has a limited amount of space, it can eventually become unstable. If the arrival rate is much greater than the exit rate, the store will eventually start to overflow, and thus any new arriving customers will simply be rejected (and forced to go somewhere else or try again later) until there is once again free space available in the store.
As for why 150,000 wouldn't be enough if the exit rate were 1 per 5 minutes, Little's Law tells us the answer. The queue would be full by the time the first item was processed if the arrival rate were 500/second. By the time the second item was processed, it would still be full, and 150,000 items would have been rejected. Only when the exit rate is greater than or equal to the arrival rate will a queue not eventually fill up and reject new arrivals.
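The fill-up dynamic described above can be sketched with a toy simulation. The rates and sizes below are the hypothetical ones from this discussion (500 spans/s in, one item drained per 5-minute retry window), not measurements:

```python
# Toy model of a bounded sending queue: count how many arrivals are
# rejected when the exit rate is far below the arrival rate.

def simulate(queue_size, arrival_rate, exit_interval_s, duration_s):
    queue = 0
    rejected = 0
    for t in range(duration_s):
        # one item drains every exit_interval_s seconds (pathological backend)
        if exit_interval_s and t % exit_interval_s == 0 and queue > 0:
            queue -= 1
        space = queue_size - queue
        accepted = min(space, arrival_rate)
        queue += accepted
        rejected += arrival_rate - accepted
    return queue, rejected

# A 150,000-item queue fills within 300s at 500 spans/s; after the first
# 5-minute retry window, nearly every later arrival is rejected.
print(simulate(150_000, 500, 300, 600))   # -> (150000, 149999)
# A 10,000-item queue is full within 20s, matching the "runway" above.
print(simulate(10_000, 500, 300, 60)[0])  # -> 10000
```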
Spans will be cleared from the queue as fast as they can be sent by the exporter. In this case that would likely be governed by how fast your Jaeger service instance can accept new spans.
I did just try it with 150,000 queue size, and it worked for 8 minutes, then returned the old error. From this I believe that has to be the case; the exit rate is probably less than the arrival rate. Is there some way to monitor the exit rate to see what the limits are? Are there any guidelines for expected throughput? I'd ideally like to be able to process thousands of traces per second; is that even possible?
The collector emits some metrics on port :8080 by default. You can look at those, including otelcol_receiver_accepted_spans and otelcol_exporter_sent_spans, to gauge the arrival and exit rates, respectively.
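Since those metrics are cumulative counters, the rates have to be derived from two scrapes. A minimal sketch (the counter values are made up for illustration):

```python
# Per-second rate from two samples of a cumulative counter, e.g. two
# scrapes of otelcol_receiver_accepted_spans taken 10 seconds apart.

def rate(sample_old, sample_new, interval_s):
    """Per-second rate between two cumulative counter samples."""
    return (sample_new - sample_old) / interval_s

arrival = rate(94_975, 99_975, 10)  # receiver counter -> 500.0 spans/s
exit_ = rate(30_008, 31_508, 10)    # exporter counter -> 150.0 spans/s
# A persistent gap between the two means the queue is growing.
print(arrival, exit_)
```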
It should be possible to process thousands of spans per second, but probably not with 200 millicores of CPU allocated to your Jaeger service instance. Ensure that your systems are appropriately sized and consider horizontally scaling your collector instances.
Thank you, that's very helpful! I'll try to figure out what the rates are there then. Is there anything I have to do to enable that? I added 8080 to the list of ports in my kubernetes service but I'm getting connection refused when I try to access it.
I believe part of the issue here might be the lack of batching after the groupbytrace processor, which in this case produces an export request for each trace. It is highly recommended to have the batch processor set in the pipeline. With the Jaeger exporter, a further reduction of requests might be achieved by using the groupbyattrs processor in compaction mode. So essentially adding something akin to the following:
processors:
  batch:
    send_batch_size: 512
    send_batch_max_size: 1024
  groupbyattrs:
...
service:
  extensions: [zpages]
  pipelines:
    traces:
      receivers: [otlp] # Receive otlp-formatted data from other collector instances
      processors: [memory_limiter, groupbytrace/custom, batch, groupbyattrs]
      exporters: [logging, jaeger]
I added 8080 to the list of ports in my kubernetes service but I'm getting connection refused when I try to access it.
Actually, the port number is :8888 :)
Actually, the port number is :8888 :)
This is why I should never trust my memory!
I'm assigning this to myself, but @jessberg, please do try @pmm-sumo's suggestion and let us know how it went.
Sorry for the late reply! I am still having a bit of trouble accessing the statistics. When I go to :8888, I get 404 Page Not Found. This is the same as I got with zpages, but I fixed that by going to specific endpoints. Is there analogous documentation on how to find the specific metrics?
Try the /metrics endpoint on that port: localhost:8888/metrics (given that the service is exposed to your localhost, perhaps using kubectl port-forward otelcollector 8888 or something similar). If it doesn't work, check the logs for the collector: the port should match what's shown there. If you still can't get it working, I recommend running the collector binary locally to rule out any possible Kubernetes networking issue.
I got it! I believe from this that there are many, many spans received by the receiver:
otelcol_receiver_accepted_spans{receiver="otlp",service_instance_id="eaa1bea0-d948-4775-9117-16e8a2ddbf0c",service_version="latest",transport="grpc"} 94975
And far fewer sent by the exporters:
otelcol_exporter_sent_spans{exporter="jaeger",service_instance_id="eaa1bea0-d948-4775-9117-16e8a2ddbf0c",service_version="latest"} 30008
otelcol_exporter_sent_spans{exporter="logging",service_instance_id="eaa1bea0-d948-4775-9117-16e8a2ddbf0c",service_version="latest"} 89090
This seems pretty bad; I seem to be overloading it by a factor of 3. Can you explain what exactly is happening with the sending_queue here? I understand I am overloading it, but would increasing the number of consumers help? Are the requests to Jaeger pipelined in any way?
These are relatively low numbers (assuming the collector was operating for more than a couple of seconds). It's surprising to see issues with such a volume of data. I will try to reproduce this locally. What are the parameters that you supply to your version of tracegen?
Also, I noticed that your collector version is rather old (v0.37.1 was released approx. 6 months ago). I am not sure if there were any bugfixes since then, but I would consider updating it. Then you will be able to use the groupbyattrs processor, which helps with compacting Jaeger data and might reduce the number of export requests.
I have the following for my deployment file for tracegen; I made the image public so this should work out of the box
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tracegen
spec:
  selector:
    matchLabels:
      app: tracegen
  template:
    metadata:
      labels:
        app: tracegen
    spec:
      containers:
      - name: tracegen
        image: docker.io/jessberg/tracegenerator
        args: ["-otlp-insecure", "-rate", "500", "-otlp-endpoint", "otelcollector:4317", "-duration", "2000s"]
---
apiVersion: v1
kind: Service
metadata:
  name: tracegen
spec:
  type: ClusterIP
  selector:
    app: tracegen
  ports:
  - name: "4317"
    port: 4317
    targetPort: 4317
It was only working for a few seconds, but I noticed that the logging exporter was able to keep up with the load much more than Jaeger was. Also I might be using the num_consumers incorrectly; I put 100 in for Jaeger, but I'm not sure what exactly that does. If I increase it too much, I get an out of memory error, even with the memory limiter processor enabled.
As far as the groupbyattrs processor goes, won't that get rid of the fact that they are grouped by trace? I am using the fact they are grouped by trace for another exporter I want to add later. Right now I am planning on making one request per trace for that exporter; is this an infeasible design?
@jessberg I did a test (though locally, so local network stack connection). I observed a couple of things
First of all, you might want to check this metric:
otelcol_exporter_queue_size{exporter="jaeger",service_instance_id="28dc61a6-bb74-48ef-a77d-92bd677d6e42",service_version="latest"} 0
It describes the current queue size. TL;DR: if it grows, the backend is not able to keep up with the requests coming from the collector. This might be due to a number of reasons, though.
Secondly, I suspect the problems you're observing are due to request fragmentation. This is something which can be amplified by the Jaeger data model. The solution I mentioned a couple of days ago still holds (I slightly updated it) :) I did a test, and while I haven't observed queue issues with your setup, adding the batch processor and groupbyattrs resulted in a 3x reduction of OpenTelemetry Collector CPU usage.
You need a slightly more recent OpenTelemetry Collector version for that (v0.46.0 or newer).
After a few seconds, I get
otelcol_exporter_queue_size{exporter="jaeger",service_instance_id="1e08b758-ec74-4e26-a19f-2f9e652d2a25",service_version="latest"} 8397
That seems to confirm it is the queue that's the problem! What is request fragmentation? I know that batching and groupbyattrs might help, but with groupbyattrs, I am still concerned about negating the effect of groupbytrace for my other exporter. I did try batching with the config you put above, along with extra millicores for Jaeger as Aneurysm9 recommended; these are the results
otelcol_exporter_sent_spans{exporter="jaeger",service_instance_id="4f4c1705-2676-4ada-b46a-d18c886c2acb",service_version="latest"} 97916
otelcol_exporter_sent_spans{exporter="logging",service_instance_id="4f4c1705-2676-4ada-b46a-d18c886c2acb",service_version="latest"} 910775
So it looks like that is enough to solve this problem, however I'm still getting this error message in the logs
2022-04-08T19:16:56.655Z info exporterhelper/queued_retry.go:215 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "jaeger", "error": "failed to push trace data via Jaeger exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "32.918303921s"}
This happens roughly every 15 lines of the default logging logs.
I also have
otelcol_exporter_queue_size{exporter="jaeger",service_instance_id="4f4c1705-2676-4ada-b46a-d18c886c2acb",service_version="latest"} 1566
from the run with batching.
So I believe from this that perhaps it's not enough to just batch, and maybe you need groupbyattrs as well. However, I still don't really want to get rid of groupbytrace, since I need it for my other exporter; does groupbyattrs negate the effects of groupbytrace?
I guess more broadly I am still a bit confused on the model for the exporters. There are consumers that try to export to the backends, but do these consumers block on the backend to respond? Are they pipelined? What is consuming from the sending queue? I guess I'm trying to figure out that if the backend is unable to keep up with load, is that because the exporter is too slow to export, or is it because the backend won't take that many requests.
Also, why is it that the groupbyattr processor helps? Sorry for the deluge of questions, just want to make sure I understand the system because I'll have to run it at other loads as well :)
What is request fragmentation?
TL;DR: the Jaeger exporter creates a separate batch/request for each ResourceSpans. So when there are multiple ResourceSpans objects in the batch (even if they effectively point to the same resource attribute values), it will create multiple requests.
By using the groupbyattrs processor, multiple ResourceSpans with matching attributes are joined together. This might significantly reduce the number of outgoing Jaeger requests.
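As an illustration only (not the actual processor code), the compaction can be sketched as grouping ResourceSpans entries by their resource attributes:

```python
# Sketch of groupbyattrs-style compaction: ResourceSpans entries with
# identical resource attributes are merged, so a per-ResourceSpans
# exporter makes one request per unique resource instead of one per entry.

def compact(resource_spans):
    """Merge (resource_attrs, spans) entries whose attributes match."""
    merged = {}
    for resource, spans in resource_spans:
        key = tuple(sorted(resource.items()))
        merged.setdefault(key, []).extend(spans)
    return [(dict(key), spans) for key, spans in merged.items()]

# Six single-span ResourceSpans for the same host collapse into one entry.
fragmented = [({"host.name": "foo"}, ["span-%d" % i]) for i in range(6)]
compacted = compact(fragmented)
print(len(fragmented), "->", len(compacted))  # 6 -> 1
```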
I am still concerned about negating the effect of groupbytrace for my other exporter
If the spans are pointing to the same resource, then it's not going to affect them. Question, though: why do you want to have the traces grouped together? This is intended for processing them in the OpenTelemetry Collector pipeline, e.g. via tailsamplingprocessor.
I did try batching with the config you put above
It's important to note that while batching will help a lot, it's not enough, due to the reasons I mentioned above. You should use batching + the groupbyattrs processor (in that sequence). Batching will combine a set of spans together, and groupbyattrs will ensure the model is optimized.
That makes sense for the general case; however, the traces in my trace generator have one span per resource. I'm not sure it would result in any savings there. I've tried anyway with the latest build; the config above gave the error
collector server run finished with error: cannot build pipelines: error creating processor "groupbyattrs" in pipeline "traces": option 'groupByKeys' must include at least one non-empty attribute name
I added host.name as a key in the config to deal with that and ran it with just groupbyattrs, batch, and memory limiter, and it worked :). However, when I reintroduced groupbytrace, it started lagging behind again, and the stats were about 40,210 spans processed by Jaeger and 731,039 by logging. I've attached the new config below.
The exporter I'm creating goes to Google Cloud Storage, and I am hashing the structure of the trace as a part of the metadata of the trace. So when I store a trace, I need to have it all together in the collector to create the hash, and then I can store the spans individually. Right now, I have it creating a separate request per span, just like Jaeger does, so I thought they'd scale similarly. I got the same error on both exporters (Jaeger and my custom exporter), but I reproduced it with just Jaeger here, because I figured I should only ask about the code here on this repo.
The config:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otelcollector
spec:
  selector:
    matchLabels:
      app: otelcollector
  template:
    metadata:
      labels:
        app: otelcollector
    spec:
      # Init container retrieves the current cloud project id from the metadata server
      # and inserts it into the collector config template
      # https://cloud.google.com/compute/docs/storing-retrieving-metadata
      initContainers:
      - name: otel-gateway-init
        image: gcr.io/google.com/cloudsdktool/cloud-sdk:326.0.0-alpine
        command:
        - '/bin/bash'
        - '-c'
        - |
          sed "s/{{PROJECT_ID}}/$(curl -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/project/project-id)/" /template/collector-gateway-config-template.yaml >> /conf/collector-gateway-config.yaml
        volumeMounts:
        - name: collector-gateway-config-template
          mountPath: /template
        - name: collector-gateway-config
          mountPath: /conf
      containers:
      - name: otelcollector
        args:
        - --config=/conf/collector-gateway-config.yaml
        image: otel/opentelemetry-collector-contrib:0.37.1
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /var/secrets/google/key.json
        volumeMounts:
        - name: collector-gateway-config
          mountPath: /conf
        - name: google-cloud-key
          mountPath: /var/secrets/google
      volumes:
      # Simple ConfigMap volume with template file
      - name: collector-gateway-config-template
        configMap:
          items:
          - key: collector-gateway-config-template.yaml
            path: collector-gateway-config-template.yaml
          name: collector-gateway-config-template
      # Create a volume to store the expanded template (with correct cloud project ID)
      - name: collector-gateway-config
        emptyDir: {}
      - name: google-cloud-key
        secret:
          secretName: pubsub-key
---
apiVersion: v1
kind: Service
metadata:
  name: otelcollector
spec:
  type: ClusterIP
  selector:
    app: otelcollector
  ports:
  - name: "1888"
    port: 1888
    targetPort: 1888
  - name: "8080"
    port: 8080
    targetPort: 8080
  - name: "8888"
    port: 8888
    targetPort: 8888
  - name: "8889"
    port: 8889
    targetPort: 8889
  - name: "13133"
    port: 13133
    targetPort: 13133
  - name: "4317"
    port: 4317
    targetPort: 4317
  - name: "55670"
    port: 55670
    targetPort: 55670
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: collector-gateway-config-template
  # Open Telemetry Collector config
  # https://opentelemetry.io/docs/collector/configuration/
data:
  collector-gateway-config-template.yaml: |
    extensions:
      zpages:
    receivers:
      otlp:
        protocols:
          grpc:
    processors:
      groupbytrace/custom:
        wait_duration: 1s
        num_traces: 5000
        num_workers: 4
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 30
      batch:
        send_batch_size: 512
        send_batch_max_size: 1024
      groupbyattrs:
        keys:
          - host.name
    exporters:
      logging:
      jaeger:
        endpoint: "jaeger:14250"
        tls:
          insecure: true
        sending_queue:
          enabled: true
          num_consumers: 100
          queue_size: 150000
    service:
      extensions: [zpages]
      pipelines:
        traces:
          receivers: [otlp] # Receive otlp-formatted data from other collector instances
          processors: [groupbytrace/custom, groupbyattrs, memory_limiter, batch]
          exporters: [logging, jaeger]
However when I reintroduced groupbytrace, it started lagging behind again
@pmm-sumo, do you also see this? I'm focusing on flaky tests this week, but if you are also seeing issues with the groupbytrace, I'll add this to my queue.
(...) the traces in my trace generator have one span per resource. I'm not sure it would result in any savings there
Yep, still relevant, even more so when used in conjunction with the groupbytrace processor, which will break down the batches into smaller chunks.
I've tried anyway with the latest build; the config above gave the error (...)
This must have been some older build. The change has been there since v0.46.0. Could you double-check? I was using it for some time and did not see this error.
(config example)
processors: [groupbytrace/custom, groupbyattrs, memory_limiter, batch]
It's tricky, but the above is in the wrong order. memory_limiter should always be first. groupbyattrs should come after batch (that way, it will optimize the whole batch). So it should be:
processors: [memory_limiter, groupbytrace/custom, batch, groupbyattrs]
Can you try this config @jessberg?
However when I reintroduced groupbytrace, it started lagging behind again
@pmm-sumo, do you also see this? I'm focusing on flaky tests this week, but if you are also seeing issues with the groupbytrace, I'll add this to my queue
I think we have one more iteration of configuration to update here. I believe this problem is not groupbytrace-related. I will try to do some more tests later this week if the config change does not help.
I just tried that config; it worked!! Just so I understand what's going on here, was the issue that having a separate ResourceAttribute for every span took too much memory, and that clogged up the sending_queue? What exactly was the problem?
Great to hear that @jessberg! What I believe was happening is that there were a lot of small requests due to data fragmentation and Jaeger exporter specifics. Regardless of sending_queue size, there were just too many small requests.
To give you some numbers, let's assume that you were generating 500 traces per second, each with 10 spans on average (so 5,000 spans/s). Typically, with the batch processor set to, say, 500 spans per batch and the otlp exporter employed, one could anticipate 10 requests per second, each with 500 spans.
If there were one ResourceSpans per such batch (which could be the case if there's only one underlying Resource generating the data), the number would be the same for the Jaeger exporter. However, if the spans were broken down into more ResourceSpans, the Jaeger exporter would create more requests accordingly. In the extreme case, this might mean 5,000 requests/s rather than the 10 requests/s we saw before. Making many small requests is much less efficient than making fewer large requests.
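The arithmetic above can be sanity-checked in a few lines (same assumed numbers: 500 traces/s, 10 spans each, 500-span batches):

```python
# Request-rate comparison: one ResourceSpans per batch vs one per span,
# given the Jaeger exporter's one-request-per-ResourceSpans behavior.

spans_per_second = 500 * 10  # 500 traces/s x 10 spans per trace

batch_size = 500
requests_compacted = spans_per_second / batch_size  # one ResourceSpans per batch
requests_fragmented = spans_per_second / 1          # one ResourceSpans per span

print(requests_compacted, "vs", requests_fragmented)  # 10.0 vs 5000.0
```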
What happens here is, I believe, the following:
Batch
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=1, span_id=10}
    Span {trace_id=2, span_id=20}
    Span {trace_id=3, span_id=30}

Batch
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=1, span_id=11}
    Span {trace_id=2, span_id=21}
    Span {trace_id=3, span_id=31}

Batch
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=1, span_id=10}
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=1, span_id=11}

Batch
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=2, span_id=20}
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=2, span_id=21}

Batch
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=3, span_id=30}
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=3, span_id=31}

Batch
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=1, span_id=10}
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=1, span_id=11}
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=2, span_id=20}
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=2, span_id=21}
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=3, span_id=30}
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=3, span_id=31}

Batch
  ResourceSpans
    Resource {host.name="foo"}
    Span {trace_id=1, span_id=10}
    Span {trace_id=1, span_id=11}
    Span {trace_id=2, span_id=20}
    Span {trace_id=2, span_id=21}
    Span {trace_id=3, span_id=30}
    Span {trace_id=3, span_id=31}
Having said that, the example discussed in this issue might be an extreme one (only one Resource present). I think in very large environments it might be tricky, and the number of unique Resources per trace might be comparable to the number of spans in it.
This makes a lot of sense! Thank you for explaining it! I guess as a meta point, one thing I didn't really understand before is that it isn't super feasible to have a request per span. And then the reason this works so much better is that Jaeger does a request per ResourceSpans, so it makes sense to batch them all together into one ResourceSpans in order to cut down on the number of requests. Is that correct?
Yeah, I think that it's a fair way to look at this
Thanks so much for your help!! I will mark this as resolved.
@pmm-sumo Sorry, just one last question. Why is it so infeasible to have so many requests? Is it a limit on the amount of time we keep the requests hanging around? Surely if you can keep traces around for 10 seconds in the groupbytrace processor, you should be able to keep them around while you wait for a response from the storage backend.
@pmm-sumo, is there something we should be doing as part of the Jaeger exporter here natively? Or perhaps something we should recommend as part of its readme?
@pmm-sumo Sorry, just one last question. Why is it so infeasible to have so many requests? Is it a limit on the amount of time we keep the requests hanging around? Surely if you can keep traces around for 10 seconds in the groupbytrace processor, you should be able to keep them around while you wait for a response from the storage backend.
@jessberg the problem is with the volume of new requests each second. In the extreme example I put above, we had 10 large requests vs 5,000 very small requests per second.
IIRC, in your exporter you had 100 consumers set. This means it has a budget of 100 request-seconds per second (within one second they can handle up to 100 requests taking 1s each, or 1,000 requests taking 100ms each).
Let's say that the overall (network + server) latency for handling a request is 100ms. Of course, for a large request it will be higher, but even if it were 10s, it would still be sustainable: 100 consumers would be able to keep up with 10 reqs/s * 10s.
For the smaller requests, this is a much bigger problem. 5,000 reqs/s * 100ms is 5x above the consumers' capacity; each second they can handle only 1,000 of them. So they will not be able to keep up with all the export requests, and eventually the queue will fill up.
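The capacity argument can be written out explicitly (assuming, as above, blocking consumers and a fixed per-request latency):

```python
# With N blocking consumers and per-request latency L seconds, sustained
# throughput is at most N / L requests per second (Little's Law).

def max_request_rate(num_consumers, latency_s):
    return num_consumers / latency_s

capacity = max_request_rate(100, 0.1)  # 100 consumers, 100ms each
print(capacity)                        # 1000.0 req/s

# 10 large requests/s is sustainable even at 10s latency each ...
print(10 * 10 <= 100)                  # True: needs 100 request-seconds/s
# ... but 5000 small requests/s is 5x over the consumers' capacity.
print(5000 > capacity)                 # True: the queue must fill up
```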
@pmm-sumo, is there something we should be doing as part of the Jaeger exporter here natively? Or perhaps something we should recommend as part of its readme?
Good call @jpkrohling. I was thinking about this, and I think we could either add a recommendation for the groupbyattrs processor to the Jaeger docs (and maybe to groupbytrace, since fragmentation can be emphasized by using it), or we could think about building this functionality in. Maybe this code should land in e.g. exporterhelper or so. What do you think? Maybe we should bring it to the SIG?
Maybe this code should land in e.g. exporterhelper or so. What do you think?
I think it's a good idea.
Maybe we should bring it to the SIG?
I have a feeling that this is somewhat specific to Jaeger, but we can certainly start with Jaeger and share the findings with the rest of the SIG.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers
. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
I previously linked two issues here as potentially related, but they are not.
Do I read it right that this issue is actually now a documentation one? We should provide users with information on how to use the groupbyattrs processor to compact the number of requests made to the underlying exporter, right?
Yes, my issue started from the "sending_queue is full" message; it was very difficult to figure out that the groupbytrace processor was interacting badly with the Jaeger exporter and that I needed groupbyattrs to fix it.
I think a note on the Jaeger exporter to always use it with groupbyattrs would address this.
Describe the bug
I am trying to run the collector with the groupbytrace processor and the Jaeger exporter. I have an application that is producing 500 traces per second, and I have set the sending_queue to a size of 10,000. In my view, that should never overwhelm the queue, because 500 traces per second, even with random bursts and noise, shouldn't overwhelm a queue of size 10,000. However, I repeatedly get the error
and
These errors remain when I have my application set to produce 500 traces per second, but go away when I lower the load to 300 traces per second. Because of that, and because I see in the logs:
I am fairly confident that Jaeger is configured correctly.
Steps to reproduce
I have a version of the tracegen application that produces 12-span traces. It is available here. I am running that with Jaeger and the OpenTelemetry collector. My Jaeger and OTel files are below.
What did you expect to see?
I expected the collector to work without throwing sending_queue full errors, because I have set the sending_queue size very high.
What did you see instead?
The collector repeatedly throws out data.
What version did you use?
Version: 0.37.1
What config did you use?
The OpenTelemetry config file:
The Jaeger config file:
Environment
I'm running these as separate containers on GCP.