src-d / ghsync

GitHub API v3 > PostgreSQL
https://sourced.tech
GNU General Public License v3.0

Deal with potential pagination problems #42

Open carlosms opened 5 years ago

carlosms commented 5 years ago

Importing paginated resources can go wrong if some of them are deleted while we are paginating. Storing all of them in memory could also be expensive.

https://github.com/src-d/ghsync/pull/32#discussion_r295254312

Imagine you have 101 repos. You GET the first page and start processing their issues and PRs one by one. Meanwhile, a repo gets deleted, and the total number of repos is now 100. After a while we finish processing all the repos in the first page, and we GET page 2. But now GitHub will say that there is no page 2, and we have missed processing 1 repo.

https://github.com/src-d/ghsync/pull/32#discussion_r295309626

[...] we still get an error anyway if something got deleted during import.
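One possible mitigation, sketched below in Go, is to treat a not-found error for a single repository as "deleted during import, skip it" instead of failing the whole run. `errNotFound`, `fetchIssues` and `importRepos` are hypothetical, and how a real client reports a 404 is an assumption here. This only avoids the error; it does not recover repos that the page shift described earlier causes to be skipped.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for however the real
// API client reports an HTTP 404 for a resource that no longer exists.
var errNotFound = errors.New("not found")

// fetchIssues is a hypothetical fetcher; it fails for any repo in deleted.
func fetchIssues(repo string, deleted map[string]bool) ([]string, error) {
	if deleted[repo] {
		return nil, fmt.Errorf("GET /repos/%s/issues: %w", repo, errNotFound)
	}
	return []string{repo + "#1"}, nil
}

// importRepos skips repositories that disappeared mid-import instead of
// aborting; any other error is still returned to the caller.
func importRepos(repos []string, deleted map[string]bool) (imported, skipped []string, err error) {
	for _, r := range repos {
		issues, ferr := fetchIssues(r, deleted)
		if errors.Is(ferr, errNotFound) {
			skipped = append(skipped, r)
			continue
		}
		if ferr != nil {
			return imported, skipped, ferr
		}
		imported = append(imported, issues...)
	}
	return imported, skipped, nil
}

func main() {
	imported, skipped, err := importRepos(
		[]string{"a", "b", "c"}, map[string]bool{"b": true})
	fmt.Println(imported, skipped, err) // [a#1 c#1] [b] <nil>
}
```

Wrapping with `%w` and matching with `errors.Is` keeps the check working even when the not-found error is wrapped with request context.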

https://github.com/src-d/ghsync/pull/32#discussion_r295220762

[...] this may impact memory consumption, [...] it would be better to avoid storing the repos in an array and just call s.doRepo for each repo on each page.
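A minimal Go sketch of that suggestion: each page is fetched, its repos are passed to `doRepo` one at a time, and the next page is requested only afterwards, so the full repo list is never collected into a slice. `Repo`, `pageFetcher`, `syncer` and `syncRepos` are hypothetical; only the `doRepo` name comes from the comment above, and the "next page is 0 when there are no more pages" convention mirrors the `NextPage` field of go-github's `Response`.

```go
package main

import "fmt"

// Repo is a placeholder for the repository type returned by the API client.
type Repo struct{ Name string }

// pageFetcher is a hypothetical "GET one page" function: it returns the repos
// on that page and the next page number, or 0 when there are no more pages.
type pageFetcher func(page int) ([]Repo, int, error)

// syncer stands in for the type that owns doRepo; done records processed
// names for demonstration only.
type syncer struct{ done []string }

func (s *syncer) doRepo(r Repo) error {
	s.done = append(s.done, r.Name)
	return nil
}

// syncRepos processes each repo as soon as its page arrives, so the list of
// repos held at any time never grows beyond one page.
func (s *syncer) syncRepos(fetch pageFetcher) error {
	for page := 1; page != 0; {
		repos, next, err := fetch(page)
		if err != nil {
			return err
		}
		for _, r := range repos {
			if err := s.doRepo(r); err != nil {
				return err
			}
		}
		page = next
	}
	return nil
}

func main() {
	pages := [][]Repo{{{Name: "a"}, {Name: "b"}}, {{Name: "c"}}}
	fetch := func(page int) ([]Repo, int, error) {
		next := page + 1
		if next > len(pages) {
			next = 0
		}
		return pages[page-1], next, nil
	}
	s := &syncer{}
	if err := s.syncRepos(fetch); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.done) // [a b c]
}
```

There is a trade-off: because each page is fully processed before the next one is requested, the window in which a deletion can shift repos across a page boundary (the 101-repo scenario quoted earlier) is longer than if all pages were listed up front.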