Closed elianaive closed 5 months ago
This looks great. I just wonder what the right number of workers should be. Is there really a performance gain with that many workers over, say, 4?
My understanding is that it depends almost entirely on your own system and network, and on how much load the server handles or allows. I could do some benchmarking if you'd like, but I don't think it would show anything particularly representative. Of course, I'm no expert on this, so I may be wrong.
If you don't mind, please try at least a few other smaller values like 2, 4, and 8 and post some times. I really like to avoid merging things that don't have demonstrable impact.
Execution time with 2 workers: 463.49 seconds
Execution time with 4 workers: 465.14 seconds
Execution time with 8 workers: 459.83 seconds
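For anyone who wants to repeat this kind of comparison, here is a minimal sketch of a timing harness. It assumes the downloader is thread-based and written in Python, which the thread does not confirm. `fake_download`, `time_run`, and `benchmark` are hypothetical names, and `fake_download` only sleeps to stand in for a real request, so the numbers it prints say nothing about govinfo.gov itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(package_id: str) -> str:
    """Hypothetical stand-in for one package download (sleeps instead of fetching)."""
    time.sleep(0.01)
    return package_id


def time_run(worker_count, package_ids, task=fake_download):
    """Run `task` over all ids with a pool of `worker_count` threads; return seconds elapsed."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        # list() forces every task to finish before the clock stops.
        results = list(pool.map(task, package_ids))
    elapsed = time.perf_counter() - start
    assert len(results) == len(package_ids)
    return elapsed


def benchmark(worker_counts, package_ids, task=fake_download):
    """Return a {worker_count: elapsed_seconds} mapping for each requested pool size."""
    return {n: time_run(n, package_ids, task) for n in worker_counts}


if __name__ == "__main__":
    ids = [f"pkg-{i}" for i in range(32)]
    for n, secs in benchmark([2, 4, 8, 16], ids).items():
        print(f"Execution time with {n} workers: {secs:.2f} seconds")
```

Passing the real download function as `task` would time actual requests. If the times stay flat across worker counts, as they do above, the bottleneck is probably somewhere other than the client-side pool size, for example bandwidth or server-side throttling.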
Embarrassingly, it doesn't actually seem to make a difference either way - I'm not entirely sure how I convinced myself it was making a difference while running it.
This is my experience with concurrency: it's always counterintuitive!
This PR improves the performance of downloading packages from govinfo.gov by introducing the following optimizations:
Currently, I've set the maximum to 16 workers. This can likely be increased, but I'm unsure how much concurrent load the govinfo server allows.
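As an illustration of what a capped worker pool can look like, here is a minimal sketch assuming Python's `concurrent.futures`. It is not the code in this PR: `download_all`, `download_one`, and `MAX_WORKERS` are hypothetical names, and collecting failures per package is just one possible way to keep a single bad download from aborting the rest.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 16  # assumed default, mirroring the cap mentioned above


def download_all(package_ids, download_one, max_workers=MAX_WORKERS):
    """Run `download_one` for each id on a bounded thread pool.

    Returns (succeeded, failed): results keyed by package id, and the
    exception raised for each id whose download failed.
    """
    succeeded, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, pid): pid for pid in package_ids}
        for future in as_completed(futures):
            pid = futures[future]
            try:
                succeeded[pid] = future.result()
            except Exception as exc:
                # Record the failure and keep going with the other packages.
                failed[pid] = exc
    return succeeded, failed
```

Keeping `max_workers` as a parameter makes it easy to try smaller values such as 2, 4, or 8 without editing the function.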
These changes make the govinfo package downloader more efficient and resilient. Please review and provide any feedback or suggestions.