Open mic-max opened 1 year ago
For retry, we need to consider the following cases:
Confirming the end result here:
- OTLP Exporter Options: Opt-in.
All of the work planned for this issue stays as opt-in, correct?
That is correct. This will be an opt-in feature until we have a spec.
Is there related spec work for these options in-progress already?
If a retryable error is returned: log the error (info level) and create a blob.
A retry policy is useful independently from persistent storage. That is, the gRPC client (or Polly if using HTTP) could be configured with a retry policy which can handle transient network errors. This handling would be opaque to the OTLP exporter.
Do you plan to implement retry w/o also requiring the use of persistent storage?
We should do this; it is required by the spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md#retry
Persistent storage is an optional, opt-in feature.
Just wanted to share a thought. @alanwest kind of scratched at this on the SIG yesterday, I think. What does "persistent storage" mean? In the scope of this work, it seems to me we persist the data for retries. I'm working on an exporter for some client teams some (maybe all) of which seem to want "persistent storage" but of the always & up-front variety. Meaning exporter just writes to disk and then some other thread tries to ship off the disk data on its own cadence.
Only sharing so we can be clear what kind of "persistent storage" we aim to support in OTLP and make sure the documentation is also clear 😄
Are there any plans still in motion regarding some kind of persistence support? Once OpenTelemetry.PersistentStorage is released, is the intention to integrate it into the OTLP exporter or to create a separate persisted OTLP exporter?
https://opentelemetry.io/docs/specs/otel/library-guidelines/#protocol-exporters
if an application’s telemetry data must be delivered to a remote backend that has no guaranteed availability the end user may choose to use a persistent local queue and an Exporter to retry sending on failures.
This is more or less the scenario I would like to cater for, similar to what @CodeBlanch mentioned. Similar to how the collector has this supported https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md#persistent-queue .
Right now the issue I experience is that if I create my own BatchExportProcessor using a persisted local queue and try to re-use the OtlpTraceExporter.Export function, I then need to serialize/deserialize System.Diagnostics.Activity to store it on some persistent queue somewhere. Serializing the Activity object does not really seem like a viable option. The only other option I am left with, unless I am missing something, is to implement the entire OTLP exporter myself, do the mapping to the proto types, and store the serialization of that format on my local file queue.
Am I correct that there is no retry by default? We would need to inject a retry policy by ourselves in the HttpClientFactory (if using Http) with e.g. Polly? If so we would need to catch the error codes described in the spec? tks
True. See https://github.com/open-telemetry/opentelemetry-dotnet/issues/1779 . Some PRs are in-flight now to make this happen automatically, so you don't have to manually deal with it.
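For anyone who wants to handle this manually in the meantime, here is a minimal sketch of the kind of handler that could be plugged into a custom HttpClient. It is not the exporter's implementation. `TransientRetryHandler`, its defaults (4 attempts, 1 second initial backoff), and `DelayFor` are hypothetical names chosen for illustration. The status codes are the ones the OTLP/HTTP spec lists as retryable (429, 502, 503, 504). The sketch assumes the request content can be serialized more than once, and it does not retry on exceptions such as connection failures.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical type for illustration; not part of any OpenTelemetry package.
public sealed class TransientRetryHandler : DelegatingHandler
{
    private readonly int maxAttempts;
    private readonly TimeSpan initialBackoff;

    public TransientRetryHandler(HttpMessageHandler inner, int maxAttempts = 4, TimeSpan? initialBackoff = null)
        : base(inner)
    {
        this.maxAttempts = Math.Max(1, maxAttempts);
        this.initialBackoff = initialBackoff ?? TimeSpan.FromSeconds(1);
    }

    // Status codes the OTLP/HTTP specification treats as retryable.
    public static bool IsRetryable(HttpStatusCode code) =>
        code is HttpStatusCode.TooManyRequests or HttpStatusCode.BadGateway
            or HttpStatusCode.ServiceUnavailable or HttpStatusCode.GatewayTimeout;

    // Prefer the server's Retry-After delta; otherwise double the backoff on each attempt.
    public TimeSpan DelayFor(int attempt, HttpResponseMessage response) =>
        response.Headers.RetryAfter?.Delta
        ?? TimeSpan.FromMilliseconds(this.initialBackoff.TotalMilliseconds * Math.Pow(2, attempt - 1));

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        for (var attempt = 1; ; attempt++)
        {
            var response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
            if (attempt >= this.maxAttempts || !IsRetryable(response.StatusCode))
            {
                return response; // success, non-retryable failure, or out of attempts
            }

            var delay = this.DelayFor(attempt, response);
            response.Dispose();
            await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
        }
    }

    // HttpClient.Send (available on .NET 5+) does not go through SendAsync,
    // so the synchronous path is overridden as well.
    protected override HttpResponseMessage Send(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        for (var attempt = 1; ; attempt++)
        {
            var response = base.Send(request, cancellationToken);
            if (attempt >= this.maxAttempts || !IsRetryable(response.StatusCode))
            {
                return response;
            }

            var delay = this.DelayFor(attempt, response);
            response.Dispose();
            if (cancellationToken.WaitHandle.WaitOne(delay))
            {
                cancellationToken.ThrowIfCancellationRequested();
            }
        }
    }
}
```

With the HTTP/protobuf protocol this could be wired up through the exporter options, for example `options.HttpClientFactory = () => new HttpClient(new TransientRetryHandler(new HttpClientHandler()));`. When you supply your own factory you will probably want to set the client's `Timeout` yourself as well.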
Feature Request
Is your feature request related to a problem?
When a transient server issue prevents an export request from being processed correctly, the data is lost. Likewise, when a program is shut down, any data not yet exported before the process terminates is lost.
Describe the solution you'd like:
The export should be attempted again when the error is considered retryable. On program shutdown, data that has not yet been exported should first be saved to disk and then exported, in case the transmission fails or does not have enough time to complete. On the next program execution, the telemetry saved to disk will be exported. This will reduce the amount of lost telemetry.
Additional Context
Add the ability for OTLP exporters to retry exports that fail in a defined way. This includes retrying across program restarts by persisting the data to disk upon failure. This will help improve the reliability of OTel from the client's end.
Original GitHub Issue: https://github.com/open-telemetry/opentelemetry-dotnet/issues/1278
The first set of PRs will focus on a single, to-be-decided section of the following matrix; follow-up PRs will enable the others, reusing as much code as is reasonable.
src/OpenTelemetry.Exporter.OpenTelemetryProtocol
PR Roadmap
- FileBlobProvider: Storage folder
- section of https://github.com/open-telemetry/opentelemetry-dotnet/issues/1278
- persistentBlobProvider.TryCreateBlob(data, RetryAfter.Milliseconds, out var blob);
- foreach (var blobItem in persistentBlobProvider.GetBlobs()) { ... }
- blob.TryLease(1000); blob.TryRead(out var data);
- blob.TryDelete();
- FileBlobProvider and the above Retry scenario.
- Guards to FileBlobProvider interface. Ref
Retryable Errors
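As a rough illustration of how the blob calls in the roadmap could fit together for a retryable failure, here is a minimal sketch against the OpenTelemetry.PersistentStorage.Abstractions and OpenTelemetry.PersistentStorage.FileSystem packages. It is not the planned implementation. `RetryStorageSketch`, `Save`, `Replay`, the `send` callback, the folder name, and the placeholder payload are all hypothetical. A real exporter would store the serialized export request and call its own transmission code.

```csharp
using System;
using System.IO;
using System.Text;
using OpenTelemetry.PersistentStorage.Abstractions;
using OpenTelemetry.PersistentStorage.FileSystem;

// Hypothetical helper for illustration; not part of the exporter.
public static class RetryStorageSketch
{
    // Called when an export fails with a retryable error: park the payload on disk.
    // A positive retryAfterMilliseconds is passed as the initial lease period,
    // mirroring the TryCreateBlob(data, RetryAfter.Milliseconds, ...) call in the roadmap.
    public static bool Save(PersistentBlobProvider provider, byte[] payload, int retryAfterMilliseconds) =>
        retryAfterMilliseconds > 0
            ? provider.TryCreateBlob(payload, retryAfterMilliseconds, out _)
            : provider.TryCreateBlob(payload, out _);

    // Walks the stored blobs and hands each payload to `send`, a hypothetical
    // callback that returns true when the backend accepted the data.
    public static int Replay(PersistentBlobProvider provider, Func<byte[], bool> send)
    {
        var delivered = 0;
        foreach (var blob in provider.GetBlobs())
        {
            // Lease before reading so that another reader of the same folder
            // does not process this blob at the same time.
            if (blob.TryLease(1000) && blob.TryRead(out var data) && send(data))
            {
                blob.TryDelete(); // delivered, so the copy on disk is no longer needed
                delivered++;
            }
        }

        return delivered;
    }

    public static void Main()
    {
        // Assumed storage folder; the real option name and default are still to be decided.
        var folder = Path.Combine(Path.GetTempPath(), "otlp-retry-sketch");
        using var provider = new FileBlobProvider(folder);

        Save(provider, Encoding.UTF8.GetBytes("serialized export request"), 0);
        Console.WriteLine($"Delivered {Replay(provider, _ => true)} stored request(s).");
    }
}
```

Blobs that fail to send are left on disk and can be picked up by a later replay, for example on the next program start.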
Testing Strategy
Make use of the test/OpenTelemetry.Exporter.OpenTelemetryProtocol.Tests/MockCollectorIntegrationTests.cs class. Some examples can be seen in this closed PR; the changes made to that file in that PR should be reusable.
References
- PersistentStorage API
- Retry section