holmari / gerritstats

Tool for creating statistics from a Gerrit repository
MIT License
195 stars, 54 forks

Refactoring of Downloaders to add incremental data file update #60

Open Ziver opened 4 years ago

Ziver commented 4 years ago

This is a bit of a refactoring of the downloader, changing it to an Iterator-like model where the data is written to the output file incrementally. The reason is that on larger repos it can take hours to download all the commits, and if something failed or you needed to abort the downloader, you would previously lose all the data downloaded so far. This should now be better, as the last downloaded chunk should already have been written to the output file.
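The iterator-like model described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code from this PR: the function names, the chunking helper, and the one-JSON-object-per-line output format are all assumptions made for the example.

```python
import io
import json
from typing import Iterable, Iterator, List, TextIO


def fetch_in_chunks(commits: Iterable[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Yield commits in fixed-size chunks.

    Hypothetical stand-in for paged queries against Gerrit; here the
    "server" is just an in-memory iterable.
    """
    chunk: List[dict] = []
    for commit in commits:
        chunk.append(commit)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def download_incrementally(chunks: Iterator[List[dict]], out: TextIO) -> int:
    """Write each chunk to `out` as soon as it arrives.

    Only one chunk is held in memory at a time, and everything written
    before a failure stays in the output. Returns the number of commits
    written.
    """
    written = 0
    for chunk in chunks:
        for commit in chunk:
            # Assumed output format: one JSON object per line.
            out.write(json.dumps(commit) + "\n")
        out.flush()  # push the finished chunk out before fetching the next
        written += len(chunk)
    return written
```

If the chunk iterator raises partway through, the chunks that were already written remain in the output, which is the behaviour the refactoring is aiming for.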

I have not checked, but this should probably also improve memory usage, since the commits no longer all need to be held in memory until the downloader is done downloading.

I also removed the extra layer inside the Ssh downloader and made the Legacy downloader just another independent downloader class. This was only to simplify the structure; it was a bit hard to keep track of the layers when coming into the repo.

My main goal with this is to add a diff-only option to the downloader, so that it only downloads new commits from Gerrit. The existing output file could then be updated incrementally on a periodic basis, and you could continue after a failed download without needing to re-download all the commits.
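One way the diff-only idea could work is sketched below, again in Python and only as an illustration. It assumes the output file holds one JSON object per line and that each commit carries an `id` field; neither is confirmed by this thread, and both helper names are hypothetical. A real implementation would more likely restrict the query on the Gerrit side rather than filter after fetching.

```python
import io
import json
from typing import Iterable, Iterator, Set, TextIO


def load_known_ids(existing: TextIO) -> Set[str]:
    """Collect the ids of commits already present in the output file."""
    known: Set[str] = set()
    for line in existing:
        line = line.strip()
        if not line:
            continue
        try:
            known.add(json.loads(line)["id"])
        except (json.JSONDecodeError, KeyError, TypeError):
            # A truncated or malformed line, e.g. the last line of an
            # aborted run, is skipped so that commit is fetched again.
            continue
    return known


def only_new(commits: Iterable[dict], known_ids: Set[str]) -> Iterator[dict]:
    """Yield only the commits whose id is not already in the output file."""
    for commit in commits:
        if commit["id"] not in known_ids:
            yield commit
```

The commits yielded by `only_new` would then be appended to the existing file, so a periodic update and a resumed download take the same code path.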

holmari commented 3 years ago

@Ziver I'm really sorry for never getting back to you. I love Gerrit but unfortunately I don't use it at work anymore, and so I have not been able to maintain this tool. I really appreciate your changes here but I can't merge them in since I can't actively maintain this tool. I updated the README with a note regarding a rewrite of this tool, which I pushed to GitHub yesterday.

@Ziver, in case you're interested, the Gerrit implementation of data fetching + analysis computation is up for grabs and shouldn't be tons of work - the GitHub data download code is about 400 lines, and the GitHub-specific analysis is about 600. Let me know, I'd be happy to walk you through the code in case you were curious to pick it up :-)