Fixes read mode when the sincedb already stores a reference for a file that was not completely consumed.
What does this PR do?
Updates the file pointer of a file in read mode to the maximum of the bytes already read and the sincedb reference for the same file.
This solves a problem with pipeline restarts: the plugin can now recover from the last known reference instead of restarting from the beginning and reprocessing lines that were already processed.
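As an illustration of the resume rule (a minimal sketch only, not the plugin's actual code; `resume_position` is a hypothetical helper):

```ruby
# Sketch of the rule this PR applies when opening a file in read mode:
# resume from whichever reference is further along, so lines processed
# before a pipeline restart are not emitted again.
def resume_position(bytes_read, sincedb_position)
  [bytes_read, sincedb_position.to_i].max   # a missing sincedb entry counts as 0
end

resume_position(0, 1024)    # => 1024  restart: sincedb already tracked 1024 bytes
resume_position(2048, 1024) # => 2048  normal progress: keep the larger offset
```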
Why is it important/What is the impact to the user?
When a pipeline with a file input in read mode is restarted, this lets the plugin resume from where it left off, provided that information is present in the sincedb store.
Checklist
[x] My code follows the style guidelines of this project
[x] I have commented my code, particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding changes to the default configuration files (and/or docker env variables)
[x] I have added tests that prove my fix is effective or that my feature works
Author's Checklist
[x] Verified with the steps used in the bug report #240, using the following test file:
sample_fixture.csv.txt
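Pipeline definition (the original configuration is not reproduced here; this is a minimal sketch with placeholder paths, sincedb location, and index name):

```
input {
  file {
    path => "/tmp/sample_fixture.csv"          # placeholder path to the test file
    mode => "read"
    sincedb_path => "/tmp/read_mode.sincedb"   # placeholder sincedb location
    file_completed_action => "log"             # keep the file so the pipeline can be reloaded against it
    file_completed_log_path => "/tmp/read_mode_completed.log"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "read_mode_test"                  # placeholder index name
  }
}
```

Some curls to configure the ES output index and an aggregation query to verify (also a sketch: the index name matches the pipeline above, and the `bucket` field is a placeholder for whichever CSV column separates the two groups):

```sh
# Create the index up front, leaving mappings to Elasticsearch defaults.
curl -X PUT "http://localhost:9200/read_mode_test"

# After running the pipeline and reloading it mid-file, count documents per
# group; with the fix both buckets should hold the same number of documents.
curl -X GET "http://localhost:9200/read_mode_test/_search?size=0" \
  -H 'Content-Type: application/json' \
  -d '{"aggs": {"per_bucket": {"terms": {"field": "bucket.keyword"}}}}'
```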
The expectation is to have 2 buckets, equally sized. Without the fix, one bucket contains more documents, which means some rows were reprocessed on a pipeline reload.
Release notes
Fixes read mode when the sincedb already stores a reference for a file that was not completely consumed.
How to test this PR locally
Follow the steps in #290.
Related issues
Use cases
Screenshots
Logs