Open dmitryax opened 8 months ago
I added emojis for voting but feel free to suggest other options as well
I think that having the queue size in MiB would be awesome and makes it easier to understand (especially for first time users). Also the operator could utilize it to automatically create Persistent Volumes based on these values.
No emoji option was given, I don't like either option:

> 👍 Yes, migrate the default queue measure to items
> 👎 Keep the requests as measure by default, even if it's not consistent with the batcher.
The third option, not listed (must be 👀), is to wait for option 3 and migrate the default queue measure to `size_limit_mib`. This behavior is the most consistent with the batch processor, since both items and requests can vary in size. For the queue to have a fixed-size memory footprint, which is my preference, we should migrate to a byte-count measure.
Thanks @jmacd, added
I think a unit smaller than MiB should be used for batch sizes; most common export APIs have limits near the 1 MB mark. For instance, CloudWatch's max batch size is 1 MiB. Google Cloud Logging's max batch size is 10 MiB. Promtail limits requests to 4 MiB. Splunk has a max batch size of 2 MiB. DataDog's limit is 3.2 MiB.
I would suggest using KiB or even just bytes.
This is also something that should probably be controllable by the individual exporter; I would expect the max batch size to be limited to an exporter-controlled max byte size, since there's no point in trying to send 20 MiB batches to CloudWatch.
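To make the suggestion concrete, a byte-denominated limit might look like the sketch below. Note that both `max_size_bytes` fields and the exporter-side cap are hypothetical, sketched purely for illustration; neither exists in the collector today:

```yaml
exporters:
  otlp:
    endpoint: backend.example.com:4317
    sending_queue:
      enabled: true
      # Hypothetical byte-based queue limit (instead of today's
      # request-count queue_size): 64 MiB expressed in bytes.
      max_size_bytes: 67108864
    # Hypothetical byte-based batch limit; an exporter could clamp
    # this to its backend's hard cap (e.g. ~1 MiB for CloudWatch).
    batcher:
      max_size_bytes: 262144  # 256 KiB
```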
Agree with @jmacd and @quentinmit, `size_limit_mib` (or rather, `size_limit_bytes`) would be preferable from a UX perspective. Users usually have no idea how big their batches or spans are, and are more comfortable making decisions when there's a clear correlation with cost: "I need X GiB for this queue, which means I need a machine that costs me $y."
> I think a unit smaller than MiB should be used for batch sizes;
Agree with this 100%. Smaller batches generally scale better as well. For Splunk Cloud we try to encourage 256 KB as a starting point, well below the 1 MB soft limit we recommend.
Following the discussion https://github.com/open-telemetry/opentelemetry-collector/pull/8853#discussion_r1435671202
As part of https://github.com/open-telemetry/opentelemetry-collector/issues/8122, the batching capability is moved to the exporter side. The batch sizes are configured in terms of items by default. The current queue can be configured only in terms of requests. The different measures can be confusing to the end user. At least, we need to provide a way to configure the queue in terms of items that would be consistent with the batching.
This issue is intended to answer the following questions: which measures the queue should support, how they should be configured, and what the default measure should be.
Currently, the `sending_queue` config group in the exporter configuration has only one field to control the queue capacity, called `queue_size`, which means the maximum number of requests the queue can hold. The `queue` prefix can potentially be dropped as redundant during the migration to the new interface.

The following measures can additionally be supported by the new exporter queue. Options 1 and 2 are already implemented, just not exposed to the user until this issue is resolved; option 3 can be added in the future:

1. Number of requests: the current behavior, exposed via the `queue_size` field. We should keep it as an option.
2. Number of items: consistent with the new batching configuration.
3. Size in MiB: a byte-count measure in the spirit of the memory_limiter's `limit_mib` and `spike_limit_mib`.

We have the following options for how the different queue measurements can be configured:
A. Provide a different field for each measure and validate that only one is set. The following fields could be added under the `sending_queue` config group in the exporter configuration:
   - `max_size_items`, `max_size_requests` and `max_size_mib` (vote with 😄)
   - `size_limit_items`, `size_limit_requests` and `size_limit_mib` (vote with 🎉)
   - `items_limit`, `requests_limit` and `mib_limit` (vote with 😕)

B. Keep `queue_size` (or choose another name like `size`) and introduce another field, `queue_measure` (or `measure`), an enum which can be `items`, `requests` or `MiB`. (vote with ❤️)

C. Keep `queue_size` (or choose another name like `size`) and support suffixes like `10requests`, `10items`, `10[B|KB|MB..]`. (vote with 🚀)

Keep in mind that the batching interface has to be aligned with whatever we come up with. https://github.com/open-telemetry/opentelemetry-collector/pull/8685 currently suggests `min_size_items` and `max_size_items`, so it's aligned with option A at the moment. Potentially, we can provide different measures for batching as well, but initially, the batching will be items-based only.

If we decide to migrate from requests to items by default in the `sending_queue` group, we need a migration plan. This cannot be done under the hood since it'll affect users. Option A requires deprecating `queue_size` and moving to another config option, so we can align that change with this migration and mention the default measure change in the field deprecation warning. However, options B and C don't require a config option change if we keep using `queue_size`, so it's better to change the name to something else like `size` instead, to provide the same migration experience.

Emojis for voting:
- 👍 Yes, migrate the default queue measure to items
- 👎 Keep requests as the default measure, even if it's not consistent with the batcher
- 👀 Hold off on the migration until we have a byte-count measure, which can eventually become the new default
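For concreteness, the three configuration styles might look like this in an exporter's YAML (all field names are the proposals from this issue, none of them implemented yet; the values are arbitrary examples):

```yaml
# Option A: one field per measure; validation rejects setting more than one
sending_queue:
  max_size_items: 10000
---
# Option B: a single size field plus a measure enum
sending_queue:
  size: 10000
  measure: items
---
# Option C: a single size field with a unit suffix
sending_queue:
  size: 256MB
```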
It's important to mention that any decision we make for this issue won't take place immediately. It'll be applied only after https://github.com/open-telemetry/opentelemetry-collector/issues/8122 is ready and stable by components moving to the new exporter helper.
@open-telemetry/collector-approvers please share your thoughts.