open-telemetry / opentelemetry-dotnet

The OpenTelemetry .NET Client
https://opentelemetry.io
Apache License 2.0

Add retry support to otlp exporter via persistent storage #4791

Open vishweshbankwar opened 1 year ago

vishweshbankwar commented 1 year ago

Feature Request

Is your feature request related to a problem?

Transient server errors result in complete data loss.

Describe the solution you'd like:

Prevent data loss by temporarily saving the data to persistent storage so that the export can be retried later.

Describe alternatives you've considered.

An alternative is to retry during transient errors while keeping the data in memory. This does not always result in the server successfully processing the data, and it can still lead to data loss, which is undesirable.

Additional Context

Here is the proposed set of steps for adding retry support via persistent storage. Each step will be one or more PRs.

1) Fork OpenTelemetry.PersistentStorage.Abstractions and OpenTelemetry.PersistentStorage.FileSystem into the OTLP exporter package.

2) Add the retry mechanism via persistent storage.

The general idea is to have a background thread scan storage for new items and re-attempt sending them to the OTLP endpoint. The APIs pertaining to persistent storage are explained here. Here is basic pseudocode for the background thread:

// Lease each blob for 120000 ms (2 minutes) while it is being processed
while (_blobProvider.TryGetBlob(out var blob) && blob.TryLease(120000))
{
    // Skip blobs whose contents cannot be read
    if (!blob.TryRead(out var data))
    {
        continue;
    }

    var request = new OtlpCollector.ExportTraceServiceRequest();
    request.MergeFrom(data);

    try
    {
        // Try re-sending the request
        this.ExportClient.SendExportRequest(request);

        // Delete the blob from storage if the request was successful
        blob.TryDelete();
    }
    catch
    {
        // Check if the error is retryable based on the exception/response codes.
        // If yes, do not delete the blob; otherwise delete it from storage.
    }
}

3) Wire up individual exporters to allow retries via persistent storage. Pseudocode:

public ExportResult Export(...)
{
    try
    {
        // Do the export
    }
    catch (RpcException ex)
    {
        // Check if the error is retryable based on exception/response codes
        if (IsRetryable(ex, out var retryAfterTime))
        {
            // Save the request to persistent storage; the background thread will retry it later
            _blobProvider.TryCreateBlob(request.ToByteArray(), retryAfterTime);
        }
        else
        {
            // Log the exception
        }
    }
}
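
The `IsRetryable` check in the pseudocode above is not spelled out in this issue. Below is a minimal sketch of one possible classification, assuming the retryable gRPC status codes listed in the OTLP specification. The `StatusCode` enum, the `RetryClassifier` class, the `serverRetryDelay` parameter, and the 5-second default delay are all hypothetical stand-ins defined here so the example is self-contained; they are not the exporter's actual types or values.

```csharp
using System;

// Hypothetical stand-in for the gRPC status codes used below
// (numeric values follow the standard gRPC numbering).
public enum StatusCode
{
    Cancelled = 1,
    InvalidArgument = 3,
    DeadlineExceeded = 4,
    PermissionDenied = 7,
    ResourceExhausted = 8,
    Aborted = 10,
    OutOfRange = 11,
    Unavailable = 14,
    DataLoss = 15,
}

public static class RetryClassifier
{
    // Assumed fallback delay when the server gives no hint; not a value from this issue.
    private static readonly TimeSpan DefaultDelay = TimeSpan.FromSeconds(5);

    // Returns true when the failed export may be stored and retried later.
    // serverRetryDelay models a delay hint from the server, if one was sent.
    public static bool IsRetryable(StatusCode code, TimeSpan? serverRetryDelay, out TimeSpan retryAfter)
    {
        switch (code)
        {
            case StatusCode.Cancelled:
            case StatusCode.DeadlineExceeded:
            case StatusCode.Aborted:
            case StatusCode.OutOfRange:
            case StatusCode.Unavailable:
            case StatusCode.DataLoss:
                retryAfter = serverRetryDelay ?? DefaultDelay;
                return true;

            case StatusCode.ResourceExhausted when serverRetryDelay.HasValue:
                // Retry only if the server signalled that recovery is possible.
                retryAfter = serverRetryDelay.Value;
                return true;

            default:
                // Everything else (e.g. InvalidArgument) is treated as permanent.
                retryAfter = TimeSpan.Zero;
                return false;
        }
    }
}
```

In the `Export` sketch, the `catch` block would pass the status code from the caught exception to a check like this and write the request to storage only when it returns true.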

NOTE - For now, this will be enabled only via an experimental feature flag, to allow feedback from users.

None of the above steps involve adding new public APIs or external dependencies. More details will be included in the individual PRs.

Reference issue: https://github.com/open-telemetry/opentelemetry-dotnet/issues/1278

github-actions[bot] commented 1 month ago

This issue was marked stale due to lack of activity and will be closed in 7 days. Commenting will instruct the bot to automatically remove the label. This bot runs once per day.