open-telemetry / otel-arrow

Protocol and libraries for sending and receiving OpenTelemetry data using Apache Arrow
Apache License 2.0
68 stars 13 forks source link

[otel-arrow/receiver] Receiver concurrency fixes; readability improvements & restructuring #205

Closed jmacd closed 3 months ago

jmacd commented 3 months ago

Restructure receiver code to improve readability. There are a number of metrics that are incremented when a batch starts being processed and are decremented when the batch is finished, but the control flow that maintained the balance of these updates was convoluted.

The root-cause of #204 is that Arrow batches meant for a consumer to be processed in order were processed out-of-order. There was a large function body which served two purposes: consume Arrow data of the appropriate kind, enter data for the pipeline to consume next. This had to be split into two parts and should have been done as part of #181. (I, as reviewer, missed this and find, in hindsight, that the code is not easy to follow.)

This improves the code structure by moving all stateful aspects of starting/finishing a request into a new inFlightData object which has a deferrable method to finish the request. Here, we keep:

  1. The inFlightWG done count
  2. The active requests metric
  3. The active items metric
  4. The active bytes metric
  5. The bytes-acquired from the semaphore
  6. A per-request span covering Arrow decode
  7. Netstat-related instrumentation

Authorization now happens before acquiring from the semaphore.

A number of fmt.Errorf() calls are replaced with status.Errorf(...) and a specific error code. The tests are updated to be more specific. Several Arrow tests were accidentally canceling the test before an expected error condition was actually tested, they have been audited and improved.

One new concurrent-receiver test was added.

Fixes #204.

jmacd commented 3 months ago

@kristinapathak Proposing to add you as a reviewer in this repository. This change is a substantial rewrite but not a substantial change of behavior, so it's worth a careful review and should help you learn the receiver code very well. Thank you! 😁 🚀

jmacd commented 3 months ago

Reviewers, please take another look.

jmacd commented 3 months ago

For the test flake above, https://github.com/open-telemetry/otel-arrow/issues/206