singer-io / tap-pipedrive

A Singer.io tap for extracting data from the Pipedrive API
GNU Affero General Public License v3.0

Incremental replication repeatedly emits last record #108

Closed JohannesRudolph closed 1 year ago

JohannesRudolph commented 2 years ago

I'm running into an issue with repeated incremental loads using this tap, e.g. for the dealflow stream. Every run of the tap re-emits the latest record in a stream (assuming no newer records were created in Pipedrive in between).

For example, see this screenshot, where the exact same dealflow id entry was extracted and loaded into my target repeatedly: [image]

The cause of this behavior is that the comparison used is "newer or equal" (>=) rather than a strict "newer" (>):

https://github.com/singer-io/tap-pipedrive/blob/0a7d6cea9b852d8a33bef83443cc298e5ca9e0fa/tap_pipedrive/stream.py#L108-L129
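To illustrate the effect, here is a minimal sketch. It is not the tap's actual code: `RECORDS`, `sync`, and the `update_time` field name are hypothetical stand-ins. With an inclusive comparison, the record whose timestamp equals the saved bookmark is emitted again on every run, while a strict comparison emits nothing.

```python
from datetime import datetime

# Hypothetical records; the field names are assumptions for illustration only.
RECORDS = [
    {"id": 1, "update_time": datetime(2022, 1, 1, 10, 0, 0)},
    {"id": 2, "update_time": datetime(2022, 1, 1, 11, 0, 0)},
]


def sync(records, bookmark, inclusive=True):
    """Return (emitted_records, new_bookmark) for one incremental run."""
    emitted = []
    new_bookmark = bookmark
    for record in records:
        ts = record["update_time"]
        # Inclusive uses >=, strict uses >.
        is_new = ts >= bookmark if inclusive else ts > bookmark
        if is_new:
            emitted.append(record)
            new_bookmark = max(new_bookmark, ts)
    return emitted, new_bookmark


start = datetime(2022, 1, 1, 0, 0, 0)
first_run, bookmark = sync(RECORDS, start)
# No new records arrive between runs.
second_run, _ = sync(RECORDS, bookmark)                   # inclusive: >=
strict_run, _ = sync(RECORDS, bookmark, inclusive=False)  # strict: >

print([r["id"] for r in first_run])   # [1, 2]
print([r["id"] for r in second_run])  # [2]
print([r["id"] for r in strict_run])  # []
```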

This means I have to filter these duplicate records in later stages of my pipeline, which is possible but a bit of a chore.
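A downstream filter of this kind could look roughly like the sketch below. The `dedupe` helper and the `id` key are hypothetical; it assumes each record carries a unique primary key and that a later copy of a record may safely replace an earlier one.

```python
def dedupe(records, key="id"):
    """Keep one record per key; a later duplicate overwrites an earlier one."""
    latest = {}
    for record in records:
        latest[record[key]] = record
    # Dicts preserve the order in which each key was first inserted.
    return list(latest.values())


# Hypothetical rows accumulated over three runs with no new source data.
rows = [
    {"id": 1, "stage": "lead"},
    {"id": 2, "stage": "won"},
    {"id": 2, "stage": "won"},  # re-emitted on the second run
    {"id": 2, "stage": "won"},  # re-emitted on the third run
]

unique_rows = dedupe(rows)
print(unique_rows)
```

A target that upserts on the stream's key properties would make this step unnecessary, assuming the target in use supports that.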

dsprayberry commented 1 year ago

This is expected functionality across Singer taps using incremental replication to ensure data quality despite potential race conditions.
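One way to picture the kind of race condition meant here is the sketch below. The scenario is an assumption for illustration, not taken from the tap: two records share the same timestamp, and the second becomes visible only after a run has already saved that timestamp as its bookmark. The `fetch` helper and the `update_time` field are hypothetical.

```python
from datetime import datetime

T = datetime(2022, 1, 1, 12, 0, 0)


def fetch(source, bookmark, inclusive):
    """Return records at or after the bookmark (inclusive) or strictly after it."""
    if inclusive:
        return [r for r in source if r["update_time"] >= bookmark]
    return [r for r in source if r["update_time"] > bookmark]


# Run 1 sees only record A, so the bookmark becomes T.
source = [{"id": "A", "update_time": T}]
bookmark = max(r["update_time"] for r in source)

# Record B, carrying the same timestamp T, becomes visible after run 1.
source.append({"id": "B", "update_time": T})

strict_run = fetch(source, bookmark, inclusive=False)
inclusive_run = fetch(source, bookmark, inclusive=True)

print([r["id"] for r in strict_run])     # [] (B is never extracted)
print([r["id"] for r in inclusive_run])  # ['A', 'B'] (B captured, A repeated)
```

Under these assumptions the inclusive comparison trades a repeated record for the guarantee that a late record with an equal timestamp is not skipped.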