Open MeltyBot opened 3 years ago
This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.
Still relevant
Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/2436
Originally created by @rahul168 on 2020-11-09 03:20:43
Problem to solve
We need to download all objects from the source using a tap, retrying on errors, before we start the incremental data flow.
Target audience
Data engineers who are trying to download 100s of objects, where the pipeline sometimes throws an unexpected error on one object after many others have already been downloaded. The lack of retry functionality makes it harder to achieve a clean full-refresh run before we run the incremental data flow.
Further details
We are trying to download many (100s of) objects from Salesforce. Sometimes an object download fails due to server-side issues. tap-salesforce throws an error and the full refresh completes without downloading all objects. The state is saved, so we can start again, but this second execution will again start downloading all data: incremental data for the objects already downloaded and full data for the objects that failed.
Proposal
Ideally, Meltano could be run with a retry setting that retries only the failed objects, making it possible to complete the full data download before we start the incremental flow.
It would be even better if Meltano retried each failing object before giving up with an error during the first full data download attempt itself (as a configurable setting, of course).
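To make the proposed behavior concrete, here is a rough Python sketch of per-object retries. It is not Meltano or tap-salesforce code: the function name download_with_retries, the download callable, and the max_attempts and delay_seconds parameters are all hypothetical, chosen only to illustrate the idea of retrying a failed object without re-downloading the ones that already succeeded.

```python
import time
from typing import Callable, Dict, Iterable, List


def download_with_retries(
    objects: Iterable[str],
    download: Callable[[str], None],  # hypothetical per-object download function
    max_attempts: int = 3,            # hypothetical configurable retry limit
    delay_seconds: float = 0.0,       # pause between attempts on the same object
) -> Dict[str, List[str]]:
    """Try each object up to max_attempts times and report the outcome."""
    succeeded: List[str] = []
    failed: List[str] = []
    for name in objects:
        for attempt in range(1, max_attempts + 1):
            try:
                download(name)
                succeeded.append(name)
                break  # this object is done; move on to the next one
            except Exception:
                if attempt == max_attempts:
                    # Give up on this object only, not on the whole run.
                    failed.append(name)
                else:
                    time.sleep(delay_seconds)
    return {"succeeded": succeeded, "failed": failed}
```

With something along these lines, a transient server-side error on one object would cost only a repeat of that object, and the returned "failed" list would show which objects still need attention before the incremental flow starts.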
What does success look like, and how can we measure that?
We generally do the first run with the --full-refresh switch and would like to configure Airflow to retry the Meltano flow if there is any failure. However, a failure on any one object causes Airflow to retry everything (because of the --full-refresh switch). It would be good to have another switch, --retry-on-error, that retries an object's download if it fails. This would help us do a clean full refresh before starting the incremental flow.
Regression test
(Ensure the feature doesn't cause any regressions)