Description of change
This PR fixes an S3 file race condition: when a file's `last_modified` timestamp is updated while an extraction is in progress, the bookmark advances beyond the start of the current execution.
Resolution:
1. Record `sync_start_time` at the beginning of the extraction.
2. Check whether the file's `last_modified` is greater than `sync_start_time`.
3. If it is, store `sync_start_time` as the bookmark in the state.
4. Otherwise, store the file's `last_modified` timestamp as the bookmark in the state.

In other words, the stored bookmark is the earlier of `last_modified` and `sync_start_time`.
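The bookmark rule above can be sketched in Python. This is a minimal illustration, not the code from this PR. The helper names `bookmark_for_file` and `update_state`, the stream name `my_table`, and the state layout with a `modified_since` key are all hypothetical. Both timestamps are assumed to be timezone-aware `datetime` objects, which is what boto3 returns for S3 `LastModified` values.

```python
from datetime import datetime, timezone


def bookmark_for_file(last_modified: datetime, sync_start_time: datetime) -> datetime:
    """Return the bookmark to persist after syncing one file (hypothetical helper).

    If the file was modified after the sync began (e.g. it was rewritten
    while the extraction was running), fall back to sync_start_time so the
    bookmark never advances past the start of this run.
    """
    if last_modified > sync_start_time:
        return sync_start_time
    return last_modified


def update_state(state: dict, stream: str, last_modified: datetime,
                 sync_start_time: datetime) -> dict:
    # Hypothetical state layout: {"bookmarks": {stream: {"modified_since": "<ISO 8601>"}}}
    bookmark = bookmark_for_file(last_modified, sync_start_time)
    stream_state = state.setdefault("bookmarks", {}).setdefault(stream, {})
    stream_state["modified_since"] = bookmark.isoformat()
    return state


# In a real run this would be captured once, e.g. datetime.now(timezone.utc),
# at the beginning of the extraction; a fixed value keeps the example deterministic.
sync_start_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# One file last modified before the run, one rewritten while the run was in progress.
before = datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)
during = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)

print(bookmark_for_file(before, sync_start_time).isoformat())  # 2024-01-01T11:30:00+00:00
print(bookmark_for_file(during, sync_start_time).isoformat())  # 2024-01-01T12:00:00+00:00
print(update_state({}, "my_table", during, sync_start_time))
```

`min(last_modified, sync_start_time)` is equivalent; the explicit comparison mirrors the listed steps. Assuming the next sync selects files whose `last_modified` is later than the stored bookmark, a file rewritten mid-run is picked up again instead of being skipped.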
Manual QA steps
Risks
Rollback steps