jmacd opened this issue 1 year ago
I'm not sure I fully understand the ask here. Enabling the queue introduces asynchronous behavior by design. Why do you need the queue if there is a requirement for the blocking/synchronous pipeline? What's the purpose of the queue in that case?
I am more-or-less requesting a feature that I would expect from the batchprocessor.
As we expect to move batchprocessor functionality into the exporterhelper, I expect the queue will be the place where requests wait to be batched and sent. When I disable the queue, there is no limit on the number of concurrent senders and no possibility of batching.
When I enable the queue, I expect to get batching. More importantly, I put a limit on the number of outgoing RPCs (i.e., `NumConsumers`), which can be important in controlling load balance.
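For reference, a minimal sketch of the queue settings I'm describing; the field names follow the exporterhelper `QueueSettings` of this era, and the concrete values are only illustrative:

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

func main() {
	// Sketch only: with the queue enabled, NumConsumers bounds the number of
	// concurrent outgoing export RPCs, and QueueSize bounds how many requests
	// can wait (and, once batching moves here, be batched) before being sent.
	qCfg := exporterhelper.QueueSettings{
		Enabled:      true,
		NumConsumers: 4,
		QueueSize:    1000,
	}
	fmt.Printf("queue settings: %+v\n", qCfg)
}
```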
I will put this another way. I believe the batching facility should be independent of the facility for backpressure.
When the exporterhelper has the backpressure option I'm looking for and queueing is enabled, the caller sticks around until each data item is processed and gets back the resulting error (or blocks until the caller's context timeout expires).
I've been working on the OTel Arrow project, which has a gRPC-streaming exporter that does the same sort of thing. For each incoming request, it allocates a `make(chan error, 1)` to receive the response, places that channel into the queue along with the data, and then falls into a `select` awaiting either the response or the timeout.
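For illustration, a minimal sketch of that pattern in Go (this is not the actual otelarrow exporter code; the `item` type and `sendAndWait` helper are placeholders):

```go
package sketch

import "context"

// item pairs the data to be sent with a channel on which the background
// sender reports the downstream response.
type item struct {
	data  any
	errCh chan error
}

// sendAndWait hands data to a background sender goroutine and blocks until
// the downstream response arrives or the caller's context is done.
func sendAndWait(ctx context.Context, queue chan<- item, data any) error {
	errCh := make(chan error, 1) // buffered so the sender never blocks on reply

	// Enqueue the data together with its reply channel. If the queue is full,
	// this blocks (backpressure) until there is room or the deadline expires.
	select {
	case queue <- item{data: data, errCh: errCh}:
	case <-ctx.Done():
		return ctx.Err()
	}

	// Wait for the downstream receiver's response or the caller's timeout.
	select {
	case err := <-errCh:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}
```

A consumer goroutine on the other side of `queue` performs the export and writes the downstream result (error or nil) into `errCh`.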
I am happy with the latest developments in the batching support. We still require the queue sender to enable concurrency in the batch processor, so we have backpressure but with unacceptable concurrency loss. Therefore, I will leave this open. @moh-osman3 is looking into a concurrency sender option that would let us replace the `concurrentbatchprocessor`.
Is your feature request related to a problem? Please describe.
For an exporter that uses exporterhelper, I have the following requirements:

1. The caller blocks until the downstream receiver has processed the data (backpressure).
2. The caller receives the error, if any, returned by the downstream receiver.

These two requirements need to be met whether or not retries are enabled, and whether or not there is a queue, meaning whether or not `QueueSettings.Enabled` is true. The problem is that when `QueueSettings.Enabled`
is true, which enables parallelism (i.e., `NumConsumers > 1`), the caller will not get the error from the downstream receiver and has no way to block until the receiver returns.

Here is where today's exporterhelper with `QueueSettings.Enabled==false` will block until the call returns and return the caller's error: https://github.com/open-telemetry/opentelemetry-collector/blob/ebed8dd0943a98447d7f0e265cf58ec197bff93a/exporter/exporterhelper/queued_retry.go#L291

Here is where, when `QueueSettings.Enabled==true` and the queue is full, the code will not block and instead returns a generic error immediately: https://github.com/open-telemetry/opentelemetry-collector/blob/ebed8dd0943a98447d7f0e265cf58ec197bff93a/exporter/exporterhelper/queued_retry.go#L312

Here is where, when `QueueSettings.Enabled==true` and the queue is not full, the code returns nil without waiting to hear from the downstream receiver: https://github.com/open-telemetry/opentelemetry-collector/blob/ebed8dd0943a98447d7f0e265cf58ec197bff93a/exporter/exporterhelper/queued_retry.go#L316

As it stands, the only way for an exporter to block on the downstream service and receive its responses synchronously is to disable queueing. When queueing is enabled, there are two problems: (a) no option to block when the queue is full, and (b) no option to wait for the response.
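To summarize, here is a paraphrased sketch of the shape of those three code paths (the types are stand-ins; see the linked queued_retry.go lines for the real implementation):

```go
package sketch

import "errors"

type request any

type sender interface {
	send(req request) error
}

type boundedQueue interface {
	// Produce returns false when the queue is full.
	Produce(req request) bool
}

type queuedSender struct {
	enabled    bool
	queue      boundedQueue
	nextSender sender
}

var errSendingQueueIsFull = errors.New("sending queue is full")

func (s *queuedSender) send(req request) error {
	if !s.enabled {
		// Queue disabled: the caller blocks on the export and sees its error.
		return s.nextSender.send(req)
	}
	if !s.queue.Produce(req) {
		// Queue full: return a generic error immediately, without blocking.
		return errSendingQueueIsFull
	}
	// Enqueued: return nil right away; the caller never learns whether the
	// downstream receiver eventually accepted or rejected the data.
	return nil
}
```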
Describe the solution you'd like
Optional behavior that enables blocking when the queue is full and waiting for a response.
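As a sketch only, such options could hang off the existing queue settings; the `Blocking` and `WaitForResult` names below are invented for illustration and are not an existing API:

```go
package sketch

// Hypothetical extension of the exporterhelper queue settings.
type QueueSettings struct {
	Enabled      bool `mapstructure:"enabled"`
	NumConsumers int  `mapstructure:"num_consumers"`
	QueueSize    int  `mapstructure:"queue_size"`

	// Blocking: when the queue is full, block the caller (subject to its
	// context deadline) instead of returning an error immediately (problem (a)).
	Blocking bool `mapstructure:"blocking"`

	// WaitForResult: hold the caller until a consumer finishes exporting the
	// item and return the downstream error, if any (problem (b)).
	WaitForResult bool `mapstructure:"wait_for_result"`
}
```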
Describe alternatives you've considered
No alternatives come to mind.
Additional context
The OTel-Arrow project has developed a bridging mode for OpenTelemetry Collectors that enables high compression on the network link between two collectors. For this setup to work reliably under high load, we need the `otelarrow` exporter to be configured with a blocking pipeline. We would recommend both of the optional behaviors described in this issue to `otelarrow` users.