feat: Limit the number of concurrent go routines via SizedWaitGroup

jphuynh commented 3 years ago

@zackproser, @brikis98, I'm not sure if that's the best way to address it but I'm keen on starting the conversation on that.

I run git-xargs in all the repos of sizeable GitHub organisation (800+ total repos including 600+ active ones) and I run it both from my computer and GitHub actions.

The fact that we don't limit the number of go routines, I've encountered a few issues:

I quickly reach the limit of opened file descriptors
git-xargs will randomly failed to clone a subset of repositories with connection reset by peer errors

I've used github.com/remeh/sizedwaitgroup as a convenience to implement buffered channels here and also set a hardcoded limit of 100 which is completely finger in the air so I'm happy to discuss that more in details.

That change obviously slow things down but fixes my issues with the number of repos I run git-xargs on.

FYI running that branch on my organisation repos took: 639.32s

zackproser commented 3 years ago

Very interesting! Thanks again for sharing this feedback and the implementation - I'm curious if we should consider making this configurable? Perhaps for most folks the limit won't be needed, so we could by default not limit the concurrency unless you pass a flag that would be plumbed through the GitXargsConfig to this loop for determining how many goroutines could be used?

We'd also want to add tests, and new docs for any new flag. What are your thoughts on the above and would you be up for adding the tests, docs, and flags as well? If not, we'd also be happy to take it over, but will probably be a bit until we can get to it.

Thanks!

jphuynh commented 3 years ago

Yeah sure I can look into that. I kind of suspected that was the way to go but I was afraid of overloading the number of options.

jphuynh commented 3 years ago

Added the configuration flag and updated README.

The default value is set to 0 and ensure that with the absence of user input, this falls back to the existing behavior.

I'm not sure what's the best way to test that in addition to the existing tests though.

gruntwork-io / git-xargs

feat: Limit the number of concurrent go routines via SizedWaitGroup #38