Is your feature request related to a problem? Please describe.
For pull-based sources that perform bulk reading, such as S3 scan or the OpenSearch source that is currently in PR, I would like a mechanism to track which data has been read and processed. This could include cases where data is dropped or where a node in my Data Prepper cluster becomes unresponsive.
Describe the solution you'd like
An audit log comes to mind. This log would contain a list of data processing events related to documents, indices, or other metadata determined by the source. These logs could be used to determine the exact time frame in which a set of data was pulled into Data Prepper.
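A minimal sketch of what one audit record could look like. The field names (`sourceId`, `action`) and action values are hypothetical illustrations, not an agreed schema; rendering each record as a single log line is one way the time frame of a scan could later be reconstructed.

```java
import java.time.Instant;

// Hypothetical audit record for a pull-based source; field names and
// action values are illustrative, not an agreed Data Prepper schema.
public class AuditRecord {
    final String sourceId;   // e.g. an S3 object key or an OpenSearch index
    final String action;     // e.g. READ_STARTED, READ_COMPLETED, DROPPED
    final Instant timestamp;

    AuditRecord(String sourceId, String action, Instant timestamp) {
        this.sourceId = sourceId;
        this.action = action;
        this.timestamp = timestamp;
    }

    // Render as a single log line so the exact time frame of a scan
    // can be reconstructed by filtering the audit log.
    String toLogLine() {
        return String.format("AUDIT source=%s action=%s time=%s",
                sourceId, action, timestamp);
    }

    public static void main(String[] args) {
        AuditRecord r = new AuditRecord(
                "s3://bucket/logs/2023-01-01.gz",   // hypothetical object key
                "READ_COMPLETED",
                Instant.parse("2023-01-01T00:05:00Z"));
        System.out.println(r.toLogLine());
    }
}
```

Such records could be emitted per document, per index, or per scanned object, depending on the granularity the source decides to audit.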
Describe alternatives you've considered (Optional)
Metrics tracking the completion percentage of a scan.
Improving existing logs by adding an Audit tag to messages that record relevant data processing events.
Not including audit logs at all. I am not sure this makes sense in Data Prepper; audit logging would be a new requirement that we might have to enforce on every plugin if we wanted to track data through a pipeline.
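The log-tag alternative above could be as lightweight as prefixing existing messages with a marker that downstream tooling can filter on. Plain stdout stands in here for whatever logging framework a plugin actually uses, and the tag format is a guess:

```java
// Sketch of the Audit-tag alternative: prefix relevant messages with a
// marker so they can be filtered out of ordinary logs later. The [AUDIT]
// format is hypothetical, not an existing Data Prepper convention.
public class AuditTagDemo {
    static String audit(String message) {
        return "[AUDIT] " + message;
    }

    public static void main(String[] args) {
        System.out.println(audit("scan of index web-logs-2023 completed"));
    }
}
```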
This idea is still vague, and the alternatives leave a lot of ambiguity. We need to tighten down the requirements and figure out exactly what we want to support.