singer-io / tap-mambu

GNU Affero General Public License v3.0
14 stars 21 forks source link

Replication key on gl_journal_entries causes incomplete ledger #19

Open robindebois opened 3 years ago

robindebois commented 3 years ago

I am not 100% sure if the behaviour displayed by the tap is intentional or accidental; please feel free to discard issue is this was by design.

Expected behaviour

When loading journal entries with the tap I expected to (eventually) end up with all journal entries that are available in Mambu.

Actual behaviour

The tap does an incremental load with the booking_date as a replication key. However, there are multiple reasons why the choice of key leads to an incomplete ledger:

This also means for accounting purposes this data becomes useless, as the sum of all general ledger accounts doesn't sum to 0 anymore.

Steps to reproduce

How to fix:

As far as I can tell changing the replication key to creation_date will resolve it.

tomekzbrozek commented 3 years ago

@robindebois I just came across the same issue when working with Mambu GL Journal Entries endpoint and it seems to me that creation_date can't be used as a replication key because value of that replication key is eventually passed to API request parameters which are:

from | Start date - The first booking date you want to retrieve journal entries from. Required
-- | --
to | End date - The last booking date you want to retrieve journal entries from. Required

I'm not entirely sure, but I guess that booking date == entry_date which forces us to use this column as a replication key, otherwise we'd be comparing apples to oranges (i.e. passing creation_date to API request which is then translated into booking date by mambu API).

I came across this because I realized that every day, a couple of journal entries were not loaded into my database when using INCREMENTAL replication, exactly because the difference between their creation_date and entry_date is bigger than the width of my incremental replication strategy.

tomekzbrozek commented 3 years ago

@robindebois I can see that this tap uses neither creation_date or entry_date but booking_date: https://github.com/singer-io/tap-mambu/blob/master/tap_mambu/schema.py#L95-L99 which seems to be most relevant. Seems to me that the issue can be closed.

tomekzbrozek commented 3 years ago

I tried replicating gl_journal_entries data incrementally with the up-to-date mambu tap but it misses all journal entries which have booking_date more than 1 day earlier than creation_date. So the ledger is still incomplete.

The only solution I can think of for now is swapping replication method from INCREMENTAL to FULL.

robindebois commented 3 years ago

Hi Tom, The tap uses the v1 POST /gljournalentries:search endpoint (not the GET one, which you seem to suggest).. which in its body has the following filter criterium (see https://github.com/singer-io/tap-mambu/blob/master/tap_mambu/sync.py#L680):

{
                "filterConstraints": [
                    {
                        "filterSelection": "CREATION_DATE",
                        "filterElement": "BETWEEN",
                        "value": '{gl_journal_entries_from_dt_str}',
                        "secondValue": "{now_date_str}"
                    }
                ]
            }

As far as my ticket was concerned -- let's step over the intense confusion in which the Mambu documentation leaves me on value/booking and entry dates -- I think both booking_date and entry_date or value_date might not be the fields to replicate on incrementally.

Basically every search iteration, it should always get all the entries made between the most recently captured creation_date and now; and the reason is that all the other dates can be in the past, so occurring before the bookmark in the state file.