src-d / ghsync

GitHub API v3 > PostgreSQL
https://sourced.tech
GNU General Public License v3.0

Use batched transactions #39

Open smacker opened 5 years ago

smacker commented 5 years ago

I'm not sure it's a good idea to wrap ALL the issues in a single transaction. There can be thousands of them, so nothing is committed for quite a long time (on the wip branch, downloading all of them for prettier/prettier took minutes). So when the UI is opened, the db is still empty and the charts look ugly, showing nulls. I would rather commit in batches, of 100 for example. Though maybe I'm missing some other case where batches can cause problems.
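A minimal sketch of what committing in batches could look like, not the actual ghsync code. `Issue`, `Tx`, `SaveInBatches` and the in-memory `memStore` are all hypothetical names introduced for illustration; in a real setup a thin wrapper around `*sql.Tx` from `database/sql` could play the role of `Tx`.

```go
package main

import (
	"errors"
	"fmt"
)

// Issue is a hypothetical, trimmed-down record; the real model is not shown in this thread.
type Issue struct {
	Number int
	Title  string
}

// Tx is an assumed minimal transaction interface. A thin wrapper around
// *sql.Tx from database/sql could satisfy it.
type Tx interface {
	InsertIssue(Issue) error
	Commit() error
	Rollback() error
}

// SaveInBatches writes issues in transactions of at most batchSize rows,
// so earlier batches become visible to readers before the sync finishes.
// It returns the number of issues committed so far.
func SaveInBatches(begin func() (Tx, error), issues []Issue, batchSize int) (int, error) {
	if batchSize <= 0 {
		return 0, errors.New("batchSize must be positive")
	}
	committed := 0
	for start := 0; start < len(issues); start += batchSize {
		end := start + batchSize
		if end > len(issues) {
			end = len(issues)
		}
		tx, err := begin()
		if err != nil {
			return committed, err
		}
		for _, issue := range issues[start:end] {
			if err := tx.InsertIssue(issue); err != nil {
				_ = tx.Rollback() // discard only the current batch
				return committed, err
			}
		}
		if err := tx.Commit(); err != nil {
			return committed, err
		}
		committed = end
	}
	return committed, nil
}

// memStore is an in-memory stand-in for the database, for demonstration only.
type memStore struct {
	rows    []Issue
	commits int
}

// memTx buffers rows until Commit, mimicking transactional visibility.
type memTx struct {
	store   *memStore
	pending []Issue
}

func (s *memStore) Begin() (Tx, error) { return &memTx{store: s}, nil }

func (t *memTx) InsertIssue(i Issue) error {
	t.pending = append(t.pending, i)
	return nil
}

func (t *memTx) Commit() error {
	t.store.rows = append(t.store.rows, t.pending...)
	t.store.commits++
	return nil
}

func (t *memTx) Rollback() error {
	t.pending = nil
	return nil
}

func main() {
	store := &memStore{}
	issues := make([]Issue, 250)
	for i := range issues {
		issues[i] = Issue{Number: i + 1}
	}
	n, err := SaveInBatches(store.Begin, issues, 100)
	fmt.Println(n, store.commits, err)
}
```

With this shape, a failure in the middle of a sync leaves the previously committed batches in place, which is exactly the partial-data state this issue is weighing against an empty db.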

carlosms commented 5 years ago

On the other hand, committing all of them at once gives more "valid" charts. You either have the issues for a repo or you don't. This way you will not see partial results, which may leave a bad impression if they do not match what is expected.

Both approaches are valid in my opinion, I don't have a strong preference.

smacker commented 5 years ago

Agree. I don't see any reason to implement it right now.

Let's test and get some feedback on how bad it is to have a working instance without any data. In my experience so far, it is worse than partial data. But we might be able to improve the charts somehow or do something else to mitigate the problem.