singer-io / tap-freshdesk

GNU Affero General Public License v3.0

Error iterating pages #29

Open mcouto-sossego opened 5 years ago

mcouto-sossego commented 5 years ago

When iterating over pages, some tickets may be missed because the Freshdesk dataset keeps changing while we read it.

Each page request may be made several minutes after the previous one. In that time agents can update tickets, so the results shift between requests.

Example:

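page 1 = ticket1, ticket2, ticket3, ..., ticket98, ticket99, ticket100
(wait 3 minutes)
page 2 = ticket104, ticket105, ticket106, ...

In this example, agents update 3 arbitrary tickets among 1 to 100 during the 3-minute gap. Those tickets move to later positions in the result set, every later ticket shifts forward three places, and tickets 101, 102 and 103 end up on page 1, which has already been read. They are never returned.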
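A small simulation makes the arithmetic of the example concrete. It is a toy model, not tap code: it assumes (hypothetically) that results are ordered by updated_at ascending, so an edited ticket moves to the end of the list. The helper names get_page, touch and simulate are illustrative only.

```python
PER_PAGE = 100


def get_page(tickets, page, per_page=PER_PAGE):
    """Return the 1-indexed page of ticket ids from the current ordering."""
    start = (page - 1) * per_page
    return tickets[start:start + per_page]


def touch(tickets, ticket_id):
    """Simulate an agent edit: the ticket moves to the end of the ordering."""
    tickets.remove(ticket_id)
    tickets.append(ticket_id)


def simulate(updated_ids, total=300):
    """Page through `total` tickets, applying the `updated_ids` edits after page 1."""
    tickets = list(range(1, total + 1))  # ids in updated_at ascending order
    seen = list(get_page(tickets, 1))
    # The "3 minute" gap: agents edit some tickets before page 2 is requested.
    for ticket_id in updated_ids:
        touch(tickets, ticket_id)
    page = 2
    while True:
        rows = get_page(tickets, page)
        seen.extend(rows)
        if len(rows) < PER_PAGE:
            break
        page += 1
    missing = sorted(set(range(1, total + 1)) - set(seen))
    return seen, missing


if __name__ == "__main__":
    seen, missing = simulate([5, 40, 77])
    print("page 2 starts at ticket", seen[100])  # 104
    print("missing:", missing)  # [101, 102, 103]
```

The edited tickets themselves are not lost (they are read twice, once on page 1 and again at the end); the loss falls on the unread tickets that slide onto the already-read page.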

Proposed resolution: download the data from all pages first, before iterating over conversations and other child streams.

The same applies to conversations and any other paged data.

Source: tap_freshdesk/__init__.py
Function: gen_request
Proposal: make all requests inside the while loop without any yield, appending each page's data to a temporary variable. Only yield rows from that variable after the loop has finished.
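The proposal for gen_request could look roughly like the sketch below. fetch_page is a hypothetical stand-in for the tap's HTTP request (it takes the query params and returns one page of rows as a list), and PER_PAGE = 100 mirrors the page size in the example; neither is copied from the tap's source.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

PER_PAGE = 100

# Hypothetical stand-in for the tap's HTTP call: query params in, one page of rows out.
FetchPage = Callable[[Dict[str, Any]], List[Dict[str, Any]]]


def gen_request_buffered(
    fetch_page: FetchPage, params: Optional[Dict[str, Any]] = None
) -> Iterator[Dict[str, Any]]:
    """Download every page first, then yield the rows (the approach proposed here)."""
    params = dict(params or {})  # copy so the caller's dict is not mutated
    params["per_page"] = PER_PAGE
    buffered: List[Dict[str, Any]] = []
    page = 1
    while True:
        params["page"] = page
        data = fetch_page(dict(params))
        buffered.extend(data)  # no yield inside the loop
        if len(data) == PER_PAGE:
            page += 1
        else:
            break
    # Rows are emitted only after the last page has been fetched.
    yield from buffered


if __name__ == "__main__":
    fake_rows = [{"id": i} for i in range(250)]

    def fake_fetch(query: Dict[str, Any]) -> List[Dict[str, Any]]:
        start = (query["page"] - 1) * query["per_page"]
        return fake_rows[start:start + query["per_page"]]

    print(sum(1 for _ in gen_request_buffered(fake_fetch)))  # 250
```

Two caveats apply to this shape. Buffering only shortens the time between page requests (no conversation or other per-ticket calls run in between), so an edit that lands between two quick requests can still shift the list. And every row is held in memory until the last page arrives.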

KAllan357 commented 5 years ago

I don't think this strategy would work well in practice, because of the memory usage it would impose: every page has to be held in memory before a single record is emitted.

Is there an alternative way to query this data using a min/max combination? That would let us impose a "window" on the data we paginate and move the window only after the iteration over it has completed.
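The window idea can be approximated with only a lower bound: instead of asking for page 2, 3, ... of a fixed query, always ask for the first page and move the lower bound up to the last updated_at seen. This is keyset (cursor) pagination, a swapped-in technique rather than something proposed in this thread. Tickets edited mid-sync get a newer updated_at and simply reappear later, instead of pushing unread tickets onto an already-read page. The sketch runs against a hypothetical in-memory FakeTicketStore, not the Freshdesk API. It assumes the lower bound is inclusive and results are sorted by updated_at ascending; whether Freshdesk's filters and timestamp precision support this in practice is unverified here.

```python
from typing import Callable, Dict, Iterator, List, Optional, Set, Tuple

PER_PAGE = 100
Row = Tuple[int, int]  # (ticket_id, updated_at)


class FakeTicketStore:
    """Hypothetical in-memory stand-in for a tickets endpoint (not the Freshdesk API)."""

    def __init__(self, n: int) -> None:
        self.clock = n
        self.updated_at: Dict[int, int] = {i: i for i in range(1, n + 1)}

    def touch(self, ticket_id: int) -> None:
        # An agent edit gives the ticket a newer updated_at than any existing one.
        self.clock += 1
        self.updated_at[ticket_id] = self.clock

    def list_since(self, since: int, per_page: int = PER_PAGE) -> List[Row]:
        # Assumed semantics: inclusive lower bound, ordered by updated_at ascending.
        rows = sorted(
            ((tid, ts) for tid, ts in self.updated_at.items() if ts >= since),
            key=lambda row: (row[1], row[0]),
        )
        return rows[:per_page]


def iter_by_cursor(
    store: FakeTicketStore,
    start: int,
    between_pages: Optional[Callable[[FakeTicketStore], None]] = None,
) -> Iterator[Row]:
    """Always request the first page, moving the lower bound to the last updated_at seen."""
    since = start
    boundary: Set[Row] = set()  # rows at exactly `since` that were already yielded
    while True:
        rows = store.list_since(since)
        for row in rows:
            if row not in boundary:
                yield row
        if len(rows) < PER_PAGE:
            return
        next_since = rows[-1][1]
        if next_since == since:
            # A full page shares one timestamp: the cursor cannot advance safely.
            raise RuntimeError("more than PER_PAGE rows share the same updated_at")
        boundary = {row for row in rows if row[1] == next_since}
        since = next_since
        if between_pages is not None:
            between_pages(store)  # simulate agents editing tickets between requests


if __name__ == "__main__":
    store = FakeTicketStore(300)
    edited: List[int] = []

    def agents_edit(s: FakeTicketStore) -> None:
        if not edited:  # edit three tickets once, after the first page
            for ticket_id in (5, 40, 77):
                s.touch(ticket_id)
                edited.append(ticket_id)

    ids = [tid for tid, _ in iter_by_cursor(store, start=1, between_pages=agents_edit)]
    print(sorted(set(range(1, 301)) - set(ids)))  # []
```

The RuntimeError guard matters: if more than one page of tickets share the same updated_at (for example after a bulk update), the cursor cannot move, and a real implementation would need a tie-breaker that the API may not offer.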

mcouto-sossego commented 5 years ago

The Freshdesk API has no option like that (https://developers.freshdesk.com/api/#list_all_tickets).

We are using tap-freshdesk, and from the debug logs we found that 3 to 10 tickets are missing from each 100-ticket page. That is roughly a 7% failure rate for an ETL process, where the acceptable rate is zero.

luandy64 commented 5 years ago

@mcouto-sossego Are you able to make a PR with your idea?

dpnsh commented 4 years ago

@mcouto-sossego were you able to find any workaround for this?

We are also using tap-freshdesk with Stitch in production, and this behaviour (tickets missed while iterating over pages) is significantly affecting our system.