singer-io / tap-github

A Singer tap for extracting data from the GitHub API
GNU Affero General Public License v3.0

Suggestion: mark "starting repository" in bookmark data #46

Open sminnee opened 5 years ago

sminnee commented 5 years ago

If you are fetching a very large number of repositories, say 50-100, you may never get through everything before you hit the limit of 5000 requests per hour. I haven't hit this issue myself yet (well, I'd want to see where we get to with #43), so it's a little hypothetical, but I think it could become an issue in a future use case (I'm likely to want to track 80+ repos in my full implementation).

In such a case, it would be absolutely fine to try again with the next fetch in an hour's time, but I would want to pick up where I left off.

So let's say I was tracking these 3 repos:

And in my import I got

It would be useful to mark "startingrepo = silverstripe/silverstripe-admin" in the bookmark data and, on that basis, use the following rotated list order for my next run:

You can see that the items prior to silverstripe-admin have been cut from the top of the list and appended to the bottom.
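The rotation described above could be sketched roughly as follows. This is only an illustration, not code from tap-github: `rotate_repos` is a hypothetical helper, and the three repo names in the usage lines are picked for the example rather than taken from the lists in this comment.

```python
def rotate_repos(repos, starting_repo):
    """Return a copy of repos reordered so that starting_repo comes first.

    Everything that preceded starting_repo is cut from the top of the
    list and appended to the bottom. If starting_repo is missing (for
    example, it was removed from the config), the order is unchanged.
    """
    repos = list(repos)
    if starting_repo not in repos:
        return repos
    idx = repos.index(starting_repo)
    return repos[idx:] + repos[:idx]


# Illustrative repo names only (assumption, not the original example).
configured = [
    "silverstripe/silverstripe-framework",
    "silverstripe/silverstripe-admin",
    "silverstripe/silverstripe-cms",
]
next_run_order = rotate_repos(configured, "silverstripe/silverstripe-admin")
```

With this ordering, the repo that was interrupted is retried first and the repos that already completed are revisited last.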

What do you think? Would this be a straightforward way of improving reliability for large datasets?

KAllan357 commented 5 years ago

This makes perfect sense. There's a good example in tap-shopify where we call it currently_syncing:

https://github.com/singer-io/tap-shopify/blob/master/tap_shopify/__init__.py#L122-L124

We'd happily accept this change.
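For illustration, a resumable loop along these lines might look like the sketch below. It uses a plain dict for state and does not reproduce the tap-shopify code linked above. The `currently_syncing` key mirrors the name mentioned in this comment, while `RateLimitReached`, `sync_all`, and the `sync_repo` callback are hypothetical names introduced for the example.

```python
class RateLimitReached(Exception):
    """Hypothetical signal that the API request quota is exhausted."""


def sync_all(repos, state, sync_repo):
    """Sync each repo, recording the one in progress in state.

    If state names a repo that was interrupted last time, start there
    and wrap around to the repos that precede it in the config.
    """
    start = state.get("currently_syncing")
    ordered = list(repos)
    if start in ordered:
        idx = ordered.index(start)
        ordered = ordered[idx:] + ordered[:idx]

    for repo in ordered:
        # Record progress before starting, so an interruption resumes here.
        state["currently_syncing"] = repo
        try:
            sync_repo(repo)
        except RateLimitReached:
            # Leave the marker in place; the next run picks up from it.
            return state

    # Every repo completed, so the next run starts from the top again.
    state.pop("currently_syncing", None)
    return state
```

A real tap would also emit the state after each repo so that the marker survives a crash, not just a rate-limit stop.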

sminnee commented 5 years ago

Just going to note that I've had issues importing pull requests, at least, with this mammoth repo list. The tap is unable to get through the import of silverstripe/silverstripe-framework without choking.

bringyourownideas/silverstripe-maintenance bringyourownideas/silverstripe-composer-update-checker bringyourownideas/silverstripe-composer-security-checker silverstripe/cwp-agencyextensions silverstripe/cwp silverstripe/cwp-core silverstripe/cwp-installer silverstripe/cwp-pdfexport silverstripe/cwp-recipe-basic silverstripe/cwp-recipe-basic-dev silverstripe/cwp-recipe-blog silverstripe/cwp-recipe-cms silverstripe/cwp-recipe-core silverstripe/cwp-recipe-search silverstripe/cwp-search silverstripe/cwp-starter-theme silverstripe/cwp-watea-theme silverstripe/cwp-theme-default dnadesign/silverstripe-elemental dnadesign/silverstripe-elemental-subsites dnadesign/silverstripe-elemental-userforms lekoala/silverstripe-debugbar silverstripe/silverstripe-activedirectory silverstripe/silverstripe-admin silverstripe/silverstripe-akismet silverstripe/silverstripe-asset-admin silverstripe/silverstripe-assets silverstripe/silverstripe-auditor silverstripe/silverstripe-behat-extension silverstripe/silverstripe-blog silverstripe/silverstripe-campaign-admin silverstripe/silverstripe-cms silverstripe/comment-notifications silverstripe/silverstripe-comments silverstripe/silverstripe-config silverstripe/silverstripe-content-widget silverstripe/silverstripe-contentreview silverstripe/silverstripe-controllerpolicy silverstripe/silverstripe-crontask silverstripe/silverstripe-dms silverstripe/silverstripe-dms-cart silverstripe/silverstripe-documentconverter silverstripe/silverstripe-elemental-blocks silverstripe/silverstripe-elemental-bannerblock silverstripe/silverstripe-elemental-fileblock silverstripe/silverstripe-environmentcheck silverstripe/silverstripe-errorpage silverstripe/eslint-config silverstripe/silverstripe-externallinks silverstripe/silverstripe-framework silverstripe/silverstripe-fulltextsearch silverstripe/silverstripe-graphql silverstripe/silverstripe-graphql-devtools silverstripe/silverstripe-gridfieldqueuedexport silverstripe/silverstripe-html5 
silverstripe/silverstripe-hybridsessions silverstripe/silverstripe-iframe silverstripe/silverstripe-installer silverstripe/silverstripe-ldap silverstripe/silverstripe-lumberjack silverstripe/silverstripe-mimevalidator silverstripe/silverstripe-postgresql silverstripe/silverstripe-realme silverstripe/recipe-authoring-tools silverstripe/recipe-blog silverstripe/recipe-cms silverstripe/recipe-collaboration silverstripe/recipe-content-blocks silverstripe/recipe-core silverstripe/recipe-form-building silverstripe/recipe-plugin silverstripe/recipe-reporting-tools silverstripe/recipe-services silverstripe/silverstripe-registry silverstripe/silverstripe-reports silverstripe/silverstripe-restfulserver silverstripe/silverstripe-secureassets silverstripe/silverstripe-securityreport silverstripe/silverstripe-segment-field silverstripe/silverstripe-selectupload silverstripe/silverstripe-sharedraftcontent silverstripe/silverstripe-siteconfig silverstripe/silverstripe-sitewidecontent-report silverstripe/silverstripe-spamprotection silverstripe/silverstripe-spellcheck silverstripe/silverstripe-sqlite3 silverstripe/sspak silverstripe/silverstripe-staticpublishqueue silverstripe/silverstripe-subsites silverstripe/silverstripe-tagfield silverstripe/silverstripe-taxonomy silverstripe/silverstripe-textextraction silverstripe/silverstripe-translatable silverstripe/silverstripe-upgrader silverstripe/silverstripe-userforms silverstripe/vendor-plugin silverstripe/silverstripe-versioned silverstripe/silverstripe-versioned-admin silverstripe/silverstripe-versionfeed silverstripe/silverstripe-widgets silverstripe/webpack-config silverstripe-themes/silverstripe-simple symbiote/silverstripe-advancedworkflow symbiote/silverstripe-gridfieldextensions symbiote/silverstripe-multivaluefield symbiote/silverstripe-queuedjobs symbiote/silverstripe-versionedfiles tractorcow/silverstripe-fluent undefinedoffset/sortablegridfield

UPDATE: This ticket would not resolve the underlying issue. I think the underlying issue is a weakness in the pull_request import, which I will raise in a separate ticket. For now I've disabled the pull_request import.