[Closed] Lieselotte12 closed this issue 2 weeks ago
We have seen some issues using h5py and fsspec (maybe in combination with h5netcdf). See https://github.com/h5py/h5py/issues/2019 and linked threads. That gets into the weeds, but the summary is that it's challenging to read NetCDF files over the network reliably.
Is it also possible to parallelise the download?
Yes, you can use something like concurrent.futures, dask, or another parallel programming library to do the data access in parallel if needed.
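A minimal sketch of the concurrent.futures approach, assuming a per-day worker function — `process_day` below is a hypothetical placeholder for your own download/clip logic, not part of any real API:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_day(day):
    # Hypothetical placeholder: swap in your own per-day
    # download + clip_dataset() logic here.
    return day, f"processed {day}"

days = ["2021-01-01", "2021-01-02", "2021-01-03"]

results = {}
# Threads work well here because the work is I/O-bound (network downloads);
# keep max_workers modest to avoid hammering the remote service.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(process_day, d): d for d in days}
    for fut in as_completed(futures):
        day, value = fut.result()  # re-raises any exception from the worker
        results[day] = value
```

Results arrive in completion order, not submission order, which is why they are collected into a dict keyed by day.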
Closing; feel free to re-open if you need more info.
Hello everyone,
I'm using the Planetary Computer archive to access Sentinel-5P data for my master's thesis. I want to clip the data to a specific bounding box, filter it, and then calculate some values. The functions I wrote for this work fine (I tried them on several datasets), but I have a long time period to cover and many areas of interest, so I implemented a for-loop that iterates over each day of my time period. At some point, the script locks up while opening the next day's dataset (see the clip_dataset() function below), so it no longer prints "Start" (sometimes it can open that day's dataset, sometimes it locks up). One possibility is that it doesn't download all the data and therefore can't perform the subsequent steps, so I'd like to verify the dataset's checksum. Is there a way to get it from the URL, and how (e.g. with md5)? Is it also possible to parallelise the download, given the huge amount of data I need for my thesis? Or doesn't the API allow it?
Thanks in advance!