Closed JasonFengGit closed 3 months ago
Hi @jacobbieker, I've fixed the lint errors. However, I'm not sure if we should merge this into production before conducting more tests. I'm concerned about the potential memory overhead that might occur when running multiple downloads simultaneously.
I'll conduct some additional tests to see how it performs. Could you think of any scenarios where this could go wrong?
The Data Tailor only takes 3 jobs at a time, so limiting the concurrency to 3 would be great and would keep memory usage lower. Making it configurable is probably the best way forward.
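As a rough illustration of a configurable cap, here is a minimal sketch using `concurrent.futures.ThreadPoolExecutor`. The names `download_all`, `download_one`, and `FakeDownloader` are hypothetical and are not taken from Satip; the fake downloader only exists to show that no more than `concurrency` jobs run at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def download_all(dataset_ids, download_one, concurrency=3):
    """Run download_one over dataset_ids with at most `concurrency` in flight.

    Hypothetical helper for illustration; not Satip's actual API.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # max_workers bounds how many downloads can run at the same time.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in the same order as dataset_ids.
        return list(pool.map(download_one, dataset_ids))


class FakeDownloader:
    """Stand-in for a real download call that records peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __call__(self, dataset_id):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # simulate network and disk work
        with self._lock:
            self._active -= 1
        return f"{dataset_id}.done"


fake = FakeDownloader()
results = download_all([f"job-{i}" for i in range(10)], fake, concurrency=3)
```

With `concurrency=1` the same helper degrades to sequential downloads, so a single code path can cover both cases.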
I see! Thanks! I'll implement that.
Hi @jacobbieker, I've added a concurrency param to limit the parallel downloads.
One concern I have is that a new test I added for parallel downloads takes about 8 minutes on my machine. This increases the total test time (which is already lengthy) from approximately 30 minutes to about 40 minutes. I am not sure if I should replace the original test_data_tailor, though. Thanks!
Hi @jacobbieker, I updated the code (removed the non-parallel approach and its test; concurrency now defaults to 1) and tested it on my machine. Thx!
Hi @jacobbieker, speaking of testing, there is another reason for the long testing time: https://github.com/openclimatefix/Satip/issues/157#issuecomment-2023223214. Could you provide some suggestions for the implementation here? Thanks!
Hi, yeah, that sleep can probably be removed now. There can still be rate limiting, which that sleep is supposed to fix, but it seems to happen less often now. I would still leave the other sleeps around it, since they help avoid hitting EUMETSAT's servers all at the same time.
I see! Could you review or approve this PR, https://github.com/openclimatefix/Satip/pull/247, so I can make a separate PR? Thanks!
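One way to keep parallel workers from reaching the server at the same instant is to add a small random delay before each request. This is only a sketch of that idea, not how Satip's existing sleeps work; `staggered_delay` and `call_with_stagger` are hypothetical names, and the default delay values are arbitrary assumptions.

```python
import random
import time


def staggered_delay(base_seconds, jitter_seconds, rng=random):
    """Return a delay of base_seconds plus a random extra of up to jitter_seconds."""
    return base_seconds + rng.uniform(0, jitter_seconds)


def call_with_stagger(func, *args, base_seconds=1.0, jitter_seconds=2.0,
                      sleep=time.sleep, rng=random):
    """Sleep for a jittered interval, then call func(*args).

    Hypothetical helper for illustration. `sleep` and `rng` are injectable
    so the behaviour can be checked without actually waiting.
    """
    sleep(staggered_delay(base_seconds, jitter_seconds, rng))
    return func(*args)


# Usage with a recorded sleep instead of a real one.
recorded = []
result = call_with_stagger(
    lambda name: name.upper(),
    "dataset",
    sleep=recorded.append,
    rng=random.Random(0),
)
```

Because each worker draws its own delay, requests that start together end up spread over the jitter window.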
Pull Request
Description
Speeds up `download_tailored_datasets` by adding an option to parallelize downloads using `concurrent.futures`.
Fixes https://github.com/openclimatefix/Satip/issues/144
With parallelization: ![image](https://github.com/openclimatefix/Satip/assets/40121574/53e0c454-3efd-44dd-8b0b-5c5e7102e929)
How Has This Been Tested?
`test_data_tailor_parallel`, a new test I added in `test_eumetsat.py`. Parallelization reduces the test time from 18:10 to 07:41.
Checklist: