open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Implement support for `max_event_buffer_size`: the maximum number of bytes from the final events payload sent #35299

Open kebroad opened 1 month ago

kebroad commented 1 month ago

Component(s)

exporter/splunkhec

Describe the issue you're reporting

While using this exporter, we noticed that when one of the max_content_length_ configurations is set (e.g., max_content_length_logs), the limit is applied to a different quantity depending on the disable_compression setting.

When disable_compression is true, the uncompressed write function returns an error if the buffer is over capacity, whereas the gzip write function appears to accept the data as long as its compressed size is under max_content_length.

So, if I understand correctly, when max_content_length_logs is 1MB, the uncompressed writer ensures the raw size is under 1MB, but the gzip writer ensures the compressed size is under 1MB.

Is this intentional? We are exporting to an endpoint that limits the size of the uncompressed data. We still want to compress the payload, but because compression ratios vary, it's hard to determine which content length limit we should configure.
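To make the discrepancy concrete, here is a minimal Go sketch that applies the same 1 MiB limit in the two ways described above: against the raw bytes, and against the gzip-compressed bytes. The helper names (`gzipSize`, `fitsUncompressed`, `fitsCompressed`) are hypothetical and only model the behavior reported in this issue; they are not the exporter's actual buffer writers.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"strings"
)

// gzipSize returns how many bytes payload occupies after gzip compression,
// i.e. roughly what would be sent on the wire.
func gzipSize(payload []byte) (int, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(payload); err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// fitsUncompressed models the disable_compression: true path (hypothetical):
// the raw payload itself must be within the limit.
func fitsUncompressed(payload []byte, limit int) bool {
	return len(payload) <= limit
}

// fitsCompressed models the gzip path as reported in the issue (hypothetical):
// only the compressed size is compared against the limit.
func fitsCompressed(payload []byte, limit int) (bool, error) {
	n, err := gzipSize(payload)
	if err != nil {
		return false, err
	}
	return n <= limit, nil
}

func main() {
	const limit = 1 << 20 // 1 MiB, standing in for max_content_length_logs

	// About 2 MiB of highly repetitive log lines, which compress very well.
	line := `{"event":"user login","severity":"INFO"}` + "\n"
	payload := []byte(strings.Repeat(line, (2<<20)/len(line)))

	okCompressed, err := fitsCompressed(payload, limit)
	if err != nil {
		panic(err)
	}
	fmt.Println("raw bytes:", len(payload))
	fmt.Println("fits raw limit:", fitsUncompressed(payload, limit))
	fmt.Println("fits compressed limit:", okCompressed)
}
```

With repetitive log lines, roughly 2 MiB of raw data fails the raw check but passes the compressed check, which is why a limit on compressed bytes says little about the uncompressed size the destination will receive.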

github-actions[bot] commented 1 month ago

Pinging code owners:

bderrly commented 1 month ago

The most surprising part was that the documentation led us to believe that compression would happen just before the data was sent over the wire. We therefore expected the data buffer to be capped at max_content_length_logs (5 MB in our case) and only then compressed and sent. As mentioned above, our log destination (Logscale) expects a maximum uncompressed payload of 5 MB.

From README.md:

disable_compression (default: false): Whether to disable gzip compression over HTTP.

atoulme commented 1 month ago

The Content-Length header value is the size of the payload on the wire, not the uncompressed content.

If you want to configure the size of the uncompressed payload, we will need more work to support that.

kebroad commented 1 month ago

If you want to configure the size of the uncompressed payload, we will need more work to support that.

Yes, this is essentially what we want. I wouldn't mind working on a PR for this. Would you prefer to add another configuration field for this use case, change the existing behavior of the max_content_length_ fields in the buffer writers, or something else?

atoulme commented 1 month ago

I think that'd be something like "payload_size". Note that we also have a "max_event_size" configuration key; those settings might interact. Do you want different sizes per signal, such as one key each for logs, metrics, and traces? Or just one?

kebroad commented 1 month ago

We are currently only using this exporter for logs, so one setting for all three would be fine.

I'm thinking we could add a field like max_content_length_type, which could be compressed or raw. compressed would match the current behavior: with max_content_length_logs: 5000000, the compressed payload would have to be under 5MB. With raw and max_content_length_logs: 5000000, the raw payload would have to be under 5MB; it would then be compressed to some smaller size and sent over the wire.

max_content_length_type would have no effect if disable_compression: true.

crobert-1 commented 1 month ago

Removing the needs triage label based on the code owner's response. From what I understand, this is a valid enhancement request, but it may still need more discussion to iron out configuration and implementation details.

atoulme commented 1 month ago

No, please do not use content_length in your field name. Content-Length is an HTTP header that represents the size of the payload in bytes over the wire. This is important for middleware like Nginx. This enhancement request is not tied to that HTTP header.

kebroad commented 1 month ago

Ah, I apologize; I did not make the connection that content_length literally refers to the Content-Length header, i.e., the content length on the wire.

You mentioned max_event_size, which is documented as follows:

max_event_size (default: 5242880): Maximum raw uncompressed individual event size in bytes. Maximum allowed value is 838860800 (~ 800 MB).

What do you think about a max_event_buffer_size setting, which would look something like this:

max_event_buffer_size: Maximum raw uncompressed event total buffer size in bytes. Default value is 2097152 bytes (2 MiB). Maximum allowed value is 838860800 (~ 800 MB)

This would be adopting the defaults/max values from the other content_length fields.
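A minimal Go sketch of how a max_event_buffer_size limit might be enforced, assuming events are already serialized and are concatenated without separators: events are grouped into batches whose total raw size stays within the limit, and each batch would then be gzip-compressed before sending, so the compressed size on the wire would presumably still be governed separately by the max_content_length_ settings. `batchByRawSize` is a hypothetical name, and dropping events that exceed max_event_size (or that could never fit in any buffer, e.g. when max_event_size is larger than max_event_buffer_size) is an assumption about how oversized events would be handled.

```go
package main

import "fmt"

// batchByRawSize is a hypothetical sketch of the proposed max_event_buffer_size
// semantics, not exporter code. It groups serialized events into batches whose
// total uncompressed size stays within maxBufferSize bytes. Events larger than
// maxEventSize, or larger than maxBufferSize (they could never fit in any
// batch), are dropped and counted (an assumption).
func batchByRawSize(events [][]byte, maxEventSize, maxBufferSize int) (batches [][][]byte, dropped int) {
	var current [][]byte
	currentSize := 0
	for _, ev := range events {
		if len(ev) > maxEventSize || len(ev) > maxBufferSize {
			dropped++
			continue
		}
		// Flush the current batch if this event would push it over the limit.
		if currentSize+len(ev) > maxBufferSize {
			batches = append(batches, current)
			current, currentSize = nil, 0
		}
		current = append(current, ev)
		currentSize += len(ev)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches, dropped
}

func main() {
	// Raw event sizes in bytes; the contents don't matter for batching.
	sizes := []int{400, 400, 400, 900, 50}
	events := make([][]byte, 0, len(sizes))
	for _, n := range sizes {
		events = append(events, make([]byte, n))
	}

	// maxEventSize=800, maxBufferSize=1000 (small values for illustration).
	batches, dropped := batchByRawSize(events, 800, 1000)
	for i, b := range batches {
		total := 0
		for _, ev := range b {
			total += len(ev)
		}
		fmt.Printf("batch %d: %d events, %d raw bytes\n", i, len(b), total)
	}
	fmt.Println("dropped:", dropped)
	// batch 0: 2 events, 800 raw bytes
	// batch 1: 2 events, 450 raw bytes
	// dropped: 1
}
```

Capping on raw bytes before compression is what makes the limit predictable for a destination with an uncompressed size limit: whatever the compression ratio turns out to be, no batch exceeds the configured raw size.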

atoulme commented 1 day ago

We can use that setting to complete the approach; initially, it should not have a default value, so as not to introduce a breaking change.
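One way to express "no default value" in Go, sketched under assumptions: leave the field at its zero value when it is omitted and treat zero as "not set", so existing configurations behave exactly as before. The `Config` struct below is a hypothetical, trimmed-down stand-in for the exporter's configuration; the `mapstructure` tag follows the collector's usual config convention, and the upper bound reuses the 838860800 value proposed above.

```go
package main

import "fmt"

// maxAllowedEventBufferSize mirrors the ~800 MB cap proposed in the thread.
const maxAllowedEventBufferSize = 838860800

// Config is a hypothetical stand-in for the exporter config; only the
// proposed field is shown.
type Config struct {
	// MaxEventBufferSize caps the raw (uncompressed) batch size in bytes.
	// Zero means "not set": no raw-size cap is applied, so configurations
	// that omit the key keep their current behavior.
	MaxEventBufferSize uint `mapstructure:"max_event_buffer_size"`
}

// Validate rejects values above the proposed maximum.
func (c *Config) Validate() error {
	if c.MaxEventBufferSize > maxAllowedEventBufferSize {
		return fmt.Errorf("max_event_buffer_size must be <= %d, got %d",
			maxAllowedEventBufferSize, c.MaxEventBufferSize)
	}
	return nil
}

// rawLimitEnabled reports whether batching should enforce the raw-size cap.
func (c *Config) rawLimitEnabled() bool {
	return c.MaxEventBufferSize != 0
}

func main() {
	var unset Config // zero value: the key was omitted from the YAML
	fmt.Println(unset.rawLimitEnabled(), unset.Validate()) // false <nil>

	tooBig := Config{MaxEventBufferSize: maxAllowedEventBufferSize + 1}
	fmt.Println(tooBig.Validate() != nil) // true
}
```

Using the zero value as the "unset" marker avoids a pointer field, at the cost of making an explicit `max_event_buffer_size: 0` indistinguishable from omitting the key.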