Feature Request
Is your feature request related to a problem?
Transient server errors result in complete data loss.
Describe the solution you'd like:
Prevent the data loss by temporarily saving the data in persistent store so that it can be retried at a later time.
Describe alternatives you've considered.
An alternative is to keep the data in memory and retry it during transient errors.
This does not always result in successful processing by the server and can still lose data, which is not desirable.
Additional Context
Here is the proposed set of steps for adding retry support via persistent storage.
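Each step will be one or more PRs.
1) Fork OpenTelemetry.PersistentStorage.Abstractions and OpenTelemetry.PersistentStorage.FileSystem into the OTLP exporter package.
This will be done in order to avoid adding an external dependency.
All the APIs will be kept internal.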
2) Add the retry mechanism via persistent storage.
The general idea is to have a background thread scan for any new items in storage and re-attempt to send them to the OTLP endpoint.
The APIs pertaining to persistent storage are explained here. The following is basic pseudo code for the background thread:
    while (_blobProvider.TryGetBlob(out var blob) && blob.TryLease(120000))
    {
        // Skip blobs that cannot be read
        if (!blob.TryRead(out var data))
        {
            continue;
        }

        var request = new OtlpCollector.ExportTraceServiceRequest();
        request.MergeFrom(data);
        try
        {
            // Try re-sending the request
            this.ExportClient.SendExportRequest(request);

            // Delete the blob from storage if the request was successful
            blob.TryDelete();
        }
        catch
        {
            // Check whether the error is retryable based on the exception/response codes.
            // If it is, keep the blob so it can be retried; otherwise delete it from storage.
        }
    }
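The catch branch in the loop above leaves the keep-or-delete decision open. The following is a minimal, self-contained sketch of one way a single pass could handle it. `IRetryBlob`, `InMemoryBlob`, and `RetryWorker` are hypothetical stand-ins written for illustration and are not the PersistentStorage API; `send` is assumed to throw on failure, and `isRetryable` is an assumed callback that classifies the failure.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a stored blob; not the PersistentStorage API.
public interface IRetryBlob
{
    bool TryLease(int leasePeriodMilliseconds);
    bool TryRead(out byte[] data);
    bool TryDelete();
}

// In-memory blob used only to exercise the loop; lease expiry is not modeled.
public sealed class InMemoryBlob : IRetryBlob
{
    private readonly byte[] _payload;
    public bool Leased { get; private set; }
    public bool Deleted { get; private set; }

    public InMemoryBlob(byte[] payload) => _payload = payload;

    public bool TryLease(int leasePeriodMilliseconds)
    {
        if (Leased || Deleted) return false;
        Leased = true;
        return true;
    }

    public bool TryRead(out byte[] data)
    {
        data = _payload;
        return !Deleted;
    }

    public bool TryDelete()
    {
        if (Deleted) return false;
        Deleted = true;
        return true;
    }
}

public static class RetryWorker
{
    // Makes one pass over the blobs and returns how many were sent successfully.
    public static int DrainOnce(
        IEnumerable<IRetryBlob> blobs,
        Action<byte[]> send,
        Func<Exception, bool> isRetryable)
    {
        int sent = 0;
        foreach (var blob in blobs)
        {
            if (!blob.TryLease(120000) || !blob.TryRead(out var data))
            {
                continue;
            }

            try
            {
                send(data);
                blob.TryDelete(); // success: the blob is no longer needed
                sent++;
            }
            catch (Exception ex)
            {
                if (!isRetryable(ex))
                {
                    blob.TryDelete(); // permanent failure: drop the blob
                }
                // Retryable failure: the blob is kept for a later pass.
            }
        }
        return sent;
    }
}
```

In this sketch a blob that fails with a retryable error simply stays in storage; with real persistent storage it would presumably be picked up again once its lease expires.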
3) Wire up individual exporters to allow retries via persistent storage. Pseudo code:
    public ExportResult Export(...)
    {
        try
        {
            // Do the export
            return ExportResult.Success;
        }
        catch (RpcException ex)
        {
            // Check whether the error is retryable based on the exception/response codes
            if (IsRetryable(ex, out var retryAfterTime))
            {
                // Save the request to persistent storage; the background thread retries it later
                _blobProvider.TryCreateBlob(request.ToByteArray(), retryAfterTime);
            }
            else
            {
                // Log the exception
            }

            return ExportResult.Failure;
        }
    }
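`IsRetryable` is not defined in this issue. As a rough sketch, the classification could follow the retryable gRPC status codes listed in the OTLP specification. `GrpcCode` and `RetryPolicy` are hypothetical names that mirror the standard gRPC code numbers so the example stays self-contained; a real implementation would more likely inspect `RpcException.StatusCode`. The 5000 ms default delay is an arbitrary assumption.

```csharp
using System;

// Hypothetical mirror of the standard gRPC status code numbers used below.
public enum GrpcCode
{
    Ok = 0,
    Cancelled = 1,
    InvalidArgument = 3,
    DeadlineExceeded = 4,
    ResourceExhausted = 8,
    Aborted = 10,
    OutOfRange = 11,
    Unavailable = 14,
    DataLoss = 15,
}

public static class RetryPolicy
{
    // Assumed default delay when the server gives no hint; not from the issue.
    public const int DefaultRetryAfterMilliseconds = 5000;

    // serverRetryDelay: delay taken from the server's retry hint, or null when absent.
    public static bool IsRetryable(
        GrpcCode code,
        TimeSpan? serverRetryDelay,
        out int retryAfterMilliseconds)
    {
        retryAfterMilliseconds = 0;
        switch (code)
        {
            case GrpcCode.Cancelled:
            case GrpcCode.DeadlineExceeded:
            case GrpcCode.Aborted:
            case GrpcCode.OutOfRange:
            case GrpcCode.Unavailable:
            case GrpcCode.DataLoss:
                retryAfterMilliseconds = ToMilliseconds(serverRetryDelay);
                return true;

            case GrpcCode.ResourceExhausted:
                // Retryable only when the server signals that recovery is possible.
                if (serverRetryDelay.HasValue)
                {
                    retryAfterMilliseconds = ToMilliseconds(serverRetryDelay);
                    return true;
                }
                return false;

            default:
                return false;
        }
    }

    private static int ToMilliseconds(TimeSpan? delay) =>
        delay.HasValue ? (int)delay.Value.TotalMilliseconds : DefaultRetryAfterMilliseconds;
}
```

The returned delay plays the role of `retryAfterTime` in the pseudo code: passing it as the lease period when the blob is created keeps the background thread from picking the blob up before the server is ready for another attempt.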
NOTE - For now, this will only be enabled via an experimental feature flag to allow feedback
from users.
None of the above steps involve adding new public APIs or external dependencies. More details
will be included in the individual PRs.
Reference issue: https://github.com/open-telemetry/opentelemetry-dotnet/issues/1278