Closed CodeBlanch closed 3 months ago
All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 86.34%. Comparing base (6250307) to head (5dbdced). Report is 285 commits behind head on main.
@reyang I made some changes to the design after our initial offline discussion. Please take a look when you have a moment and let me know how you feel about it now.
Renamed from `WithProcessor` -> `Transform`
Switched the design from executing the transformation as items were read to doing all the work upfront. I liked the previous design better, but my concern was that it could lead to a case where the batch returns a count which differs from the number of items actually read out (in the case of filtering). That probably wouldn't be an issue for most exporters, but it could be for some, so I went with a safe-by-default design.
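To make the concern above concrete, here is a hypothetical sketch (not code from this PR) of how a lazily filtered batch's reported count can disagree with what an exporter actually enumerates:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        int[] items = { 1, 2, 3, 4, 5 };

        // Lazy design: the batch reports the original item count...
        int reportedCount = items.Length;

        // ...but a filter applied during enumeration yields fewer items.
        IEnumerable<int> filtered = items.Where(i => i % 2 == 1);
        int enumeratedCount = filtered.Count();

        // An exporter that trusts reportedCount (e.g. to pre-size a
        // payload) would be surprised by the mismatch.
        Console.WriteLine($"{reportedCount} vs {enumeratedCount}"); // 5 vs 3
    }
}
```

Doing the filtering upfront means the count the batch reports is always the count that will be read out.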
In order to do the work upfront I utilized the `ArrayPool` to store the transformed/filtered batch. The rented array has to be returned when the batch processing is done. We could have relied on users calling `Dispose` to do that, but I felt most users would miss it. To make it easy and safe, I switched the design from returning a new batch to mutating the existing batch. This way users performing transformation/filtering don't need to worry about disposing anything; the batch processor which created the batch initially will handle that.
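As an illustration of the rent-and-return pattern described above (this is a minimal sketch, not the PR's actual implementation; the strings stand in for pooled records):

```csharp
using System;
using System.Buffers;

class Program
{
    static void Main()
    {
        string[] source = { "keep-1", "drop-2", "keep-3" };

        // Do all the filtering work upfront into an array rented from
        // the shared pool, so the resulting batch count always matches
        // the number of items that will actually be read out.
        string[] rented = ArrayPool<string>.Shared.Rent(source.Length);
        int count = 0;
        foreach (string item in source)
        {
            if (item.StartsWith("keep"))
            {
                rented[count++] = item;
            }
        }

        // The mutated batch would now be backed by (rented, count). The
        // batch processor that owns the batch is responsible for
        // returning the rented array; users never dispose anything.
        Console.WriteLine(count);
        ArrayPool<string>.Shared.Return(rented, clearArray: true);
    }
}
```

The key point is that ownership of the rented array stays with whoever created the batch, which is why mutating in place is safer than handing users a new batch they would have to remember to dispose.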
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.
@CodeBlanch, the feature looks good, but after a quick verification against https://github.com/open-telemetry/opentelemetry-specification/tree/v1.35.0 there is no such transformer. I think it should be defined at the specification level first; then we could implement it. For now, it could be hidden behind the unstable features.
@pellared, FYI as you are working on a similar API for the Go SDK.
This looks like a violation of the SDK specification, which does not cover filtering. CC @open-telemetry/technical-committee
@pellared I hear you, but I don't think it is that cut and dried. This issue is caused by implementation details in .NET. Do other languages have a batch implementation based on a circular buffer backed by records from a pool? 😄 I looked at Java a bit. It seems the batch there is a simple ArrayList or Collection. If users want to massage that batch and then delegate the processing work to some other exporter, they may filter, transform, or augment the items, no? This will not work as expected in .NET due to the implementation. The spec doesn't really say much at all about what shape the batch should have. Do we need it to be more opinionated? IMO the situation is that we made some perf optimizations which led to surprising behavior in code which would work with other SDKs. This API is to give users a way to do it safely. IMO it is well within our purview to do that as it is to specialize our implementation(s) in the first place.
PS: Just to be clear this isn't a hypothetical exercise. There are users doing this today and they are currently experiencing data loss 😢
The Go batching log processor also has a circular buffer. I will do my best to take a look later and compare the implementations. If you want, you can look as well: https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/log/batch.go and https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/log/exporter.go
then delegate the processing work to some other exporter they may filter, transform, or augment the items, no?
These are scenarios not supported by the specification (sic!) and probably not by many SDKs either. For a little reference: https://github.com/open-telemetry/opentelemetry-specification/pull/4067#issuecomment-2221593582
This API is to give users a way to do it safely. IMO it is well within our purview to do that as it is to specialize our implementation(s) in the first place.
Might there be other ways to do it?
The biggest concern is that you are not only adding "transform" capabilities but also "filtering", which is not covered by the specification. Why wouldn't you like to filter the records before they reach the batching processor?
I assume that adding a "transform" (without filtering capabilities) to the batching exporter will be totally acceptable given the implementation specifics (as you are using a pool).
@pellared
Why wouldn't you like to filter the records before they reach the batching processor?
I kind of touched on this in the description:
Customer specifically does this over the batch at the time of export to not block the application when it is creating the telemetry.
But let me expand on it.
Users could perform transformation via a processor before things hit the batch. That is actually much easier to do than filtering 🤣 But it is done synchronously, before the "Log" call returns. In this case the transformation is complex: lots of scanning, regex execution, and who-knows-what-else. The user decided to do this over the batch instead because, in the case of the batch export processor, the work is done lazily on the dedicated thread. The idea is to defer the expensive processing so the impact is not felt every time a log is emitted. Perhaps the spec is a bit naive in its prescription of where this type of thing must be done?
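To illustrate the trade-off being described (a hypothetical sketch; `Redact` and the regex are stand-ins for the customer's actual logic), the same work can run on the emit path or on the export path:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Stand-in for the customer's expensive redaction logic.
    static readonly Regex CreditCard =
        new Regex(@"\b\d{4}-\d{4}-\d{4}-\d{4}\b");

    static string Redact(string message) =>
        CreditCard.Replace(message, "****-****-****-****");

    static void Main()
    {
        // Eager (processor) approach: this cost is paid synchronously
        // on every emit, before the "Log" call returns.
        string eager = Redact("card 1111-2222-3333-4444 charged");

        // Deferred (batch) approach: the same work runs later, over the
        // whole batch, on the batch export processor's dedicated thread.
        string[] batch = { "card 1111-2222-3333-4444 charged", "ok" };
        for (int i = 0; i < batch.Length; i++)
        {
            batch[i] = Redact(batch[i]);
        }

        Console.WriteLine(eager);
        Console.WriteLine(batch[0]);
    }
}
```

Either path produces the same redacted output; the difference is purely where the latency lands.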
I assume that adding a "transform" (without filtering capabilities) to the batching exporter will be totally acceptable given the implementation specifics (as you are using a pool).
This particular user actually does not need filtering; they just need transformation. I added the filtering bit because, why not? 😄 In my mind expensive filtering can also benefit from being performed lazily against the batch. If we are going to add this API, it felt natural to also let it filter. If we come back later and need a separate API to filter, that makes it (potentially) two calls, plus some awkwardness in that we would have a transform API and a filter API, but the filter API could always transform anyway given `LogRecord` is mutable. I would be willing to drop filtering, but I would prefer to be pragmatic (help users accomplish their scenarios) vs dogmatic (not have it just because the spec doesn't say we must/may/should).
Might there be other ways to do it?
Thinking out of the box: we could expose `LogRecord.AddReference`. If correctly called, that would prevent the `LogRecord`s from going back to the pool when taken from one batch and put into another. But I do feel that is a worse design. It isn't super friendly and it leaks implementation details; currently none of the pooling is exposed.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
Changes

- `Transform` API on `Batch<T>` which can be used to modify and/or filter records contained in a batch

Details
Ran into an issue with a customer doing this (more or less): `ExecuteRedactionEngine` is a bunch of logic for inspecting the log record and removing sensitive information. The customer specifically does this over the batch at the time of export so as not to block the application when it is creating the telemetry.

The issue here is that while processing `foreach (var logRecord in batch)`, the completed `LogRecord`s may be returned to the pool (if they were rented from it).

What this PR proposes to solve this is a new API on `Batch<T>` which can be used to do this processing safely by mutating an instance of `Batch<T>`:
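For illustration, usage of the proposed mutate-in-place API might look roughly like the following. Everything here is a hypothetical sketch with stand-in types: the delegate shape, the `Records` property, and the list-backed `Batch` are assumptions, not the PR's actual implementation (the real `Batch<T>` is backed by pooled records).

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for a log record; real records come from a pool.
class LogRecord
{
    public string Body = "";
}

// Minimal stand-in for Batch<T> with a hypothetical Transform method.
class Batch
{
    private List<LogRecord> records;

    public Batch(List<LogRecord> records) => this.records = records;

    public int Count => records.Count;

    public IEnumerable<LogRecord> Records => records;

    // Hypothetical Transform: mutates the batch in place, keeping only
    // records for which the delegate returns true.
    public void Transform(Func<LogRecord, bool> transformer) =>
        records = records.FindAll(r => transformer(r));
}

class Program
{
    static void Main()
    {
        var batch = new Batch(new List<LogRecord>
        {
            new LogRecord { Body = "password=hunter2" },
            new LogRecord { Body = "this contains secret stuff" },
            new LogRecord { Body = "hello" },
        });

        batch.Transform(record =>
        {
            // Mutate in place (transform)...
            record.Body = record.Body.Replace("hunter2", "****");
            // ...and/or drop the record entirely (filter).
            return !record.Body.Contains("secret");
        });

        Console.WriteLine(batch.Count);
        foreach (var r in batch.Records) Console.WriteLine(r.Body);
    }
}
```

Because the batch is mutated rather than replaced, the caller never owns a new batch and has nothing extra to dispose; after `Transform` runs, `Count` and enumeration always agree.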
Merge requirement checklist

- `CHANGELOG.md` files updated for non-trivial changes