open-telemetry / opentelemetry-collector


[processors/batch] Size calculations inconsistent, causing unbounded batch size in bytes #3262

Open timcosta opened 3 years ago

timcosta commented 3 years ago

Describe the bug
The batch processor limits batch size by item count rather than by bytes. I believe this is due to confusing nomenclature; let me try to explain.

In config.go we define SendBatchSize and SendBatchMaxSize, neither of which specifies the unit being counted, i.e. whether the "size" is bytes or items.
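
For reference, the relevant fields look roughly like this (a paraphrased sketch, not the exact source; the struct tags are inferred from the send_batch_size / send_batch_max_size YAML keys):

```go
// Approximate sketch of the relevant fields in the batch processor's Config;
// nothing in the names or docs says whether "size" means bytes or items.
type Config struct {
	// SendBatchSize is the "size" that triggers a send.
	SendBatchSize uint32 `mapstructure:"send_batch_size"`
	// SendBatchMaxSize is the upper "size" bound used when splitting.
	SendBatchMaxSize uint32 `mapstructure:"send_batch_max_size"`
}
```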

In batch_processor.go, size() explicitly refers to bytes, and itemCount() is explicitly the number of items, according to their documentation.
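
Paraphrasing the batch interface (method names from the source, signatures approximated), the documentation draws the distinction clearly:

```go
// Paraphrased batch interface from batch_processor.go; signatures approximate.
type batch interface {
	// itemCount returns the number of spans, metric data points, or log
	// records in the batch.
	itemCount() int
	// size returns the size of the batch in bytes.
	size() int
}
```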

The problem begins in processItem(), where the check bp.batch.itemCount() >= bp.sendBatchSize starts to mix the definitions of count and size by comparing the number of items against the configured batch size.
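
Roughly (paraphrased and simplified, not the exact source), the flush check looks like this:

```go
// Paraphrased from processItem(): the left side is an item count, while the
// config field on the right is named as if it were a byte size.
if bp.batch.itemCount() >= bp.sendBatchSize {
	bp.sendItems() // flush triggered by item count, not bytes
}
```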

When we later call sendItems(), we record two metrics, statBatchSendSize and statBatchSendSizeBytes, which are correctly computed using itemCount() and size() respectively.
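
Paraphrased and simplified (the real code may gate the bytes metric behind a telemetry level), the metric recording uses the matching unit for each metric:

```go
// Paraphrased from sendItems(): each metric uses the unit its name implies.
stats.Record(bp.exportCtx,
	statBatchSendSize.M(int64(bp.batch.itemCount())), // number of items
	statBatchSendSizeBytes.M(int64(bp.batch.size())), // bytes
)
```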

In the export() functions we double down on the mix-up with if sendBatchMaxSize > 0 && bl.logCount > sendBatchMaxSize, again mixing count and size terms even though the config and function documentation show that they are distinct values.
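
Paraphrased, the logs path looks roughly like this:

```go
// Paraphrased from the logs export() path: logCount counts records, yet it is
// compared to (and reduced by) sendBatchMaxSize, a config value named "size".
if sendBatchMaxSize > 0 && bl.logCount > sendBatchMaxSize {
	req = splitLogs(sendBatchMaxSize, bl.logData) // the splitter receives an item count as its "size"
	bl.logCount -= sendBatchMaxSize
}
```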

In split_logs.go the comparison conflates count and size once again.
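
Paraphrased, the splitter's guard clause shows the same conflation, since the size parameter is compared against a record count:

```go
// Paraphrased guard from splitLogs(): the parameter is named size, but it is
// compared against the number of log records.
if src.LogRecordCount() <= size {
	return src // under the "size" (really item-count) limit, nothing to split
}
```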

The tests for this appear to be misleading, because they are explicitly counting bytes for the size, but then use a logsPerRequest variable when calculating the expected number of batches.

This causes logs/metrics/traces to be batched based on item count rather than byte size, which means we have no way to account for payload size limits when using the otlphttpexporter or any other exporter that can be payload size sensitive.

Steps to reproduce
Any configuration will reproduce this issue.

What did you expect to see?
I expected setting send_batch_size to limit the size of a batch by bytes, not by item count. I would even go as far as to point out that the default of 8192, a common power-of-two memory size (8 KiB, 8 MiB, 8 GiB depending on units), creates a super misleading experience.

What did you see instead?
The batch processor batches by item count, which causes us to hit payload size limits in common reverse proxies or frameworks where we cannot set a custom payload size limit, and we have no way to restrict the batch size in bytes.

What version did you use?
Version: main branch

What config did you use?

receivers:
  fluentforward:
    endpoint: 0.0.0.0:8006

processors:
  batch:

exporters:
  logging:
    loglevel: debug
  otlphttp:
    endpoint: https://redacted
    headers:
      authorization: Basic redacted

service:
  pipelines:
    logs:
      receivers: [fluentforward]
      processors: [batch]
      exporters: [otlphttp, logging]

Environment
OS: otel/opentelemetry-collector-contrib:0.27.0 on AWS EKS
Compiler (if manually compiled): N/A, Docker image

Additional context
I was considering putting together a PR for this, but decided that given the pervasiveness of the problem (affects logs, traces, metrics, and processor telemetry) it would be better to open an issue first to discuss the desired state.

I consider this a pretty big issue, potentially even a blocker for 1.0, because payload size limits are a common configuration and hitting them can produce undeliverable batches that are unrecoverable.

I would propose that we deprecate the configurations send_batch_size and send_batch_max_size, replace them with two new values, send_batch_min_size_bytes and send_batch_max_size_bytes, and temporarily map the old names to the new names internally with a warning log so we don't break anything. Then we would need to update all code to look at size() rather than itemCount() when determining whether or not to split the batch. This serves two purposes: it clarifies the difference between the two configuration variables, and it clarifies the units and purpose of each.
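
To make the proposal concrete, a deprecation shim could look something like the sketch below; the *Bytes fields, the helper function, and the warning text are all hypothetical and do not exist in the collector today.

```go
// Hypothetical sketch of the proposed deprecation path; nothing here exists yet.
package batchprocessor

import "go.uber.org/zap"

type Config struct {
	// Deprecated names, temporarily kept so existing configs keep working.
	SendBatchSize    uint32 `mapstructure:"send_batch_size"`
	SendBatchMaxSize uint32 `mapstructure:"send_batch_max_size"`

	// Proposed replacements with the unit spelled out.
	SendBatchMinSizeBytes uint32 `mapstructure:"send_batch_min_size_bytes"`
	SendBatchMaxSizeBytes uint32 `mapstructure:"send_batch_max_size_bytes"`
}

// applyDeprecatedFields carries old values over to the new fields as-is and
// logs a warning, as described in the proposal above.
func applyDeprecatedFields(cfg *Config, logger *zap.Logger) {
	if cfg.SendBatchSize != 0 && cfg.SendBatchMinSizeBytes == 0 {
		logger.Warn("send_batch_size is deprecated, use send_batch_min_size_bytes")
		cfg.SendBatchMinSizeBytes = cfg.SendBatchSize
	}
	if cfg.SendBatchMaxSize != 0 && cfg.SendBatchMaxSizeBytes == 0 {
		logger.Warn("send_batch_max_size is deprecated, use send_batch_max_size_bytes")
		cfg.SendBatchMaxSizeBytes = cfg.SendBatchMaxSize
	}
}
```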

I can take a stab at this if the maintainers agree with my proposed approach, but I am new to Go, so it may take some time, and given the implications it may be wiser to have someone more familiar take this on.

timcosta commented 3 years ago

There's also the added question of how the batch processor should handle a single unsplittable record that exceeds send_batch_max_size_bytes. This isn't an issue in the current item-count-based implementation, but may become one if/when the code is changed to count bytes instead of items.

bogdandrutu commented 3 years ago

Counting bytes is unfortunately very exporter-protocol specific, and there is no easy way for us to do it in a way that works for all protocols. Even for OTLP it is pretty expensive to calculate the size, so it would be better if this were an additional component that splits by size rather than the core batch processor.

timcosta commented 3 years ago

Hey Bogdan, thanks for your response! If counting size isn't desired in this batch processor, I think the config variable names should still be updated to something like send_batch_min_item_count and send_batch_max_item_count to avoid confusion. A "size" with a default value of 8192 reads as if it refers to bytes/memory rather than an item count, and the tests also count bytes for size rather than items.

I totally understand that counting bytes is pretty exporter specific and can be expensive to do, so adding it to this module may not be desired. I think there's still a use case for a core processor that batches by size, though, as many of the core exporters (otlp, otlphttp, jaeger, etc.) may send data to an endpoint behind a reverse proxy like NGINX or AWS API Gateway that limits payload sizes. It wouldn't be perfect, and if the limit on the gateway is 5 MB then people may have to set their batch size limit to 4.5 MB, but I think a central size-based batcher is still helpful, even necessary, given how many exporters are likely to face some sort of payload size limitation, and it would be easier to build a central size-based batch processor than to add size/splitting logic to all of the relevant exporters.

maitdaddy1 commented 3 years ago

If I'm not mistaken, the payload limit for API Gateway is 10 MB and for Lambda it's 6 MB.

jack-berg commented 2 years ago

Can a byte-size-limiting option be implemented by counting the bytes of the in-memory representation of the records? Obviously, being a generic component, it's not possible for the batch processor to know the final serialized size for an unknown downstream exporter, but each exporter likely has a semi-predictable translation ratio between the size of the in-memory representation and the serialized representation. E.g. maybe, on average, the OTLP representation is 1.2x the size of the in-memory representation.

This would give a better, although still imperfect, way of limiting the size of the exported payload.
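
As a rough sketch of that idea (the 1.2 ratio, the helper name, and the usage note after the block are assumptions for illustration, not collector code):

```go
// Illustrative sketch of the ratio-based estimate described above.
package main

import "fmt"

// assumedOTLPSizeRatio is a guessed average serialized/in-memory size ratio.
const assumedOTLPSizeRatio = 1.2

// estimateExportedBytes turns an in-memory byte count into a rough estimate
// of the serialized payload size.
func estimateExportedBytes(inMemoryBytes int) int {
	return int(float64(inMemoryBytes) * assumedOTLPSizeRatio)
}

func main() {
	// e.g. a batch whose in-memory representation is 4 MiB
	fmt.Println(estimateExportedBytes(4 << 20)) // ~5 MB estimated on the wire
}
```

A byte-based flush trigger could then compare estimateExportedBytes(batch.size()) against a configured byte limit instead of comparing item counts.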

danielgblanco commented 2 years ago

I think being able to batch by in-memory bytes, even though it wouldn't be exact without knowing the exporter's wire representation, would give a more accurate way of defining batch limits than item count when the receiving end expects payloads of a given size.

Systems that behave better with a more constant payload size would also benefit from this, regardless of payload limits, as currently the distribution of payload sizes can vary tremendously depending on the nature of the spans within them. Sending batches that are small in byte size can also hurt performance.

ying-jeanne commented 3 months ago

I'm considering 3 approaches for this task:

The second idea comes from the assumption that the cost of calling bm.sizer.MetricsSize(req) to get the request size is the biggest concern in this thread, and we would like to minimise this while still having a good enough solution.

Do we actually have a preference on the 2 approaches?