solo-spice / sospice

Python data analysis tools for the SPICE extreme-UV spectrometer on Solar Orbiter
BSD 3-Clause "New" or "Revised" License

Download all files from a Catalog at once #48

Open ebuchlin opened 5 months ago

ebuchlin commented 5 months ago

Downloading files from a Catalog currently involves downloading each file separately or adding each file to a Downloader before downloading them.

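from parfive import Downloader
from sospice import FileMetadata

# `result` is a Catalog selection obtained beforehand
downloader = Downloader()
result.iloc[:10].apply(
    lambda row: FileMetadata(row).download_file(
        "/tmp/spice-files",  # base directory
        release="2.0",
        downloader=downloader,  # queue the file instead of downloading it now
    ),
    axis=1,
)
downloader.download()  # download all queued files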

This could be a method of the Catalog object.

safimuhammad commented 3 months ago

Hello, I'd like to work on this issue. Could you explain the requirements in more detail?

ebuchlin commented 3 months ago

Yes, if you would like to.

The logic should be exactly the same as in FileMetadata.download_file(), but it should work on all files of a Catalog.

The function could be named Catalog.download_files() (plural) if that's not too confusing.
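As a rough illustration of that shape (queue one download per row, then run the queue once), here is a minimal sketch. It is not the sospice implementation: the standalone download_files function and its queue_one and run_queue parameters are hypothetical, chosen so the example runs without sospice or parfive installed.

```python
from typing import Any, Callable, Iterable


def download_files(
    rows: Iterable[Any],
    queue_one: Callable[[Any], None],
    run_queue: Callable[[], Any],
) -> Any:
    """Queue one download per catalog row, then run the whole queue once.

    Hypothetical helper, not part of sospice:
    - queue_one(row) stands in for
      FileMetadata(row).download_file(base_dir, release=..., downloader=downloader)
    - run_queue() stands in for downloader.download()
    """
    for row in rows:
        queue_one(row)
    return run_queue()


# Stand-in queue (a plain list) instead of a real parfive Downloader.
queued = []
downloaded = download_files(
    rows=["file_a.fits", "file_b.fits"],  # placeholder names, not real SPICE files
    queue_one=queued.append,
    run_queue=lambda: list(queued),
)
print(downloaded)
```

In a real Catalog method, the rows would presumably come from iterating over the catalog itself, and the two callables would be replaced by the FileMetadata and Downloader calls from the snippet in the first comment.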

safimuhammad commented 2 months ago

Thanks for the details. I will start working on it and let you know if I need more information.

safimuhammad commented 2 months ago

One more thing: should the method download the entire catalog, or should it ask for a range of files to download, as in the earlier examples?

P.S. My current implementation downloads the entire Catalog, which I think is not what you intended. Downloading the entire Catalog also takes a long time because of the number of files.

ebuchlin commented 2 months ago

It is true that downloading the full catalog of a data release (currently 1.5 TB for data release 4.0) would take an enormous amount of time (and resources).

But I think the logic should still be that most of the time all the data corresponding to a full Catalog object should be downloaded, as this is simpler. The expectation is that the user does this only after downselecting the catalog to a much smaller catalog than the full data release.

However, there could be a max_download parameter with a sufficiently large default (I would say at least 1000 files) that the user can override, with a warning if we hit the limit.
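One possible reading of that limit, as a minimal sketch: truncate the list of files to max_download and warn when anything is left out. The select_for_download helper is hypothetical and only illustrates the capping logic; the default of 1000 follows the suggestion above.

```python
import warnings
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_for_download(files: Sequence[T], max_download: int = 1000) -> List[T]:
    """Return at most max_download entries, warning if any were left out.

    Hypothetical helper, not part of sospice.
    """
    if len(files) > max_download:
        warnings.warn(
            f"Catalog has {len(files)} files; only the first {max_download} "
            "will be downloaded. Increase max_download to download more.",
            UserWarning,
            stacklevel=2,
        )
    return list(files[:max_download])
```

A caller who really wants more files can pass a larger max_download explicitly, which keeps the default behaviour safe against accidentally requesting a full data release.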

safimuhammad commented 2 months ago

Makes sense. I have implemented it and am now opening a pull request.