unitedstates / congress

Public domain data collectors for the work of Congress, including legislation, amendments, and votes.
https://github.com/unitedstates/congress/wiki
Creative Commons Zero v1.0 Universal
912 stars 198 forks

Optimize govinfo Package Downloads with Concurrency #302

Closed by elianaive 5 months ago

elianaive commented 6 months ago

This PR improves the performance of downloading packages from govinfo.gov by introducing the following optimizations:

Currently the maximum number of workers is set to 16. This can likely be increased, but I'm unsure how many concurrent requests the govinfo server can handle or allows.

These changes make the govinfo package downloader more efficient and resilient. Please review and provide any feedback or suggestions.
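A minimal sketch of a bounded worker pool of the kind this PR describes, assuming Python's standard `concurrent.futures` module. The PR's actual code is not reproduced here, so `download_packages` and the injected `fetch` callable are hypothetical names. `fetch` stands in for whatever function downloads a single package from govinfo.gov. Failures are collected per package instead of being raised, which is one way to get the resilience mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple

# Value mentioned in the PR; the right number depends on your network and the server.
MAX_WORKERS = 16


def download_packages(
    package_ids: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = MAX_WORKERS,
) -> Tuple[Dict[str, bytes], Dict[str, Exception]]:
    """Download packages concurrently with a bounded thread pool (hypothetical helper).

    `fetch` is a hypothetical per-package download function, injected so the
    pool logic can be exercised without network access.
    Returns (results, errors) so one failed package does not abort the rest.
    """
    results: Dict[str, bytes] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every package up front; the pool runs at most max_workers at a time.
        futures = {pool.submit(fetch, pid): pid for pid in package_ids}
        for future in as_completed(futures):
            pid = futures[future]
            try:
                results[pid] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[pid] = exc
    return results, errors
```

Threads suit this workload because each task spends most of its time waiting on network I/O. The `max_workers` cap bounds how many requests hit the server at once.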

JoshData commented 6 months ago

This looks great. I just wonder what the right number of workers should be. Is there really a performance gain with that many workers over say, 4?

elianaive commented 6 months ago

My understanding is that it depends almost entirely on your own system and network, and on how much load the server handles or allows. I could do some benchmarking if you'd like, but I don't think it would show anything particularly representative. Of course, I'm no expert on this, so I may be wrong.

JoshData commented 6 months ago

If you don't mind, please try at least a few other smaller values like 2, 4, and 8 and post some times. I really like to avoid merging things that don't have demonstrable impact.
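One way to produce timings like the ones requested is a small harness that runs the same job once for each pool size. This is a sketch under stated assumptions. `time_worker_counts` and `simulated_download` are hypothetical names, and the simulated job sleeps instead of contacting govinfo.gov. A single run per setting is noisy, so repeated runs give a fairer comparison.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def time_worker_counts(
    job: Callable[[str], object],
    items: List[str],
    worker_counts: Iterable[int] = (2, 4, 8, 16),
) -> Dict[int, float]:
    """Run `job` over `items` once per pool size; return wall-clock seconds per size."""
    timings: Dict[int, float] = {}
    for n in worker_counts:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n) as pool:
            # list() forces every task to finish before the clock stops.
            list(pool.map(job, items))
        timings[n] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Hypothetical stand-in for one package download: sleeping releases the GIL like real I/O.
    def simulated_download(item: str) -> str:
        time.sleep(0.05)
        return item

    results = time_worker_counts(simulated_download, [str(i) for i in range(16)])
    for n, secs in results.items():
        print(f"Execution time with {n} workers: {secs:.2f} seconds")
```

With the simulated job, the times shrink as workers are added because nothing else limits throughput. A real download can be capped by bandwidth or by limits on the server side, so the real timings may not scale the same way.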

elianaive commented 5 months ago

Execution time with 2 workers: 463.49 seconds
Execution time with 4 workers: 465.14 seconds
Execution time with 8 workers: 459.83 seconds

Embarrassingly, the number of workers doesn't seem to make a difference either way. I'm not entirely sure how I convinced myself it was making a difference while running it.

JoshData commented 5 months ago

This is my experience with concurrency: it's always counterintuitive!