ska-sa / katdal

Data access library for the MeerKAT radio telescope
BSD 3-Clause "New" or "Revised" License

Dump of hdf5 files on disk #357

Open gigjozsa opened 1 year ago

gigjozsa commented 1 year ago

I haven't yet found a way to dump an HDF5 file, as read, onto a local disk. That is: read a file from the archive with katdal.open, dump it to disk as is, and then read it again with katdal.open. Having a local copy makes things much faster if you have to repeat them. If such a method exists, I'd appreciate a hint; if not, it might be good to implement it.

ludwigschwardt commented 1 year ago

Hi Josh, you can have a look at the mvf_copy.py script in the scripts directory. This also allows some rudimentary filtering of the data to avoid copying the data you don't want (I'm still busy expanding the filtering options).

One downside of the script is that it cannot continue with a partial copy after a crash, unlike mvftoms and wget / curl, as illustrated in the diagram below:

[Diagram: archive_downloads]
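To make "continuing a partial copy" concrete, here is a minimal sketch of the idea for a directory of files that is already reachable as a local or mounted path. It is not how mvf_copy.py, mvftoms, wget, curl or rclone work internally; the function name `resumable_copy` is made up for illustration, and treating "same size" as "already copied" is a simplifying assumption (a checksum would be safer).

```python
import shutil
from pathlib import Path


def resumable_copy(src_dir, dest_dir):
    """Copy a directory tree, skipping files already present with the same size.

    Hypothetical helper for illustration only. Returns the relative paths
    of the files that were actually copied during this call.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    copied = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_dir)
        dest = dest_dir / rel
        # Assumption: a destination file of matching size is complete.
        if dest.exists() and dest.stat().st_size == src.stat().st_size:
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Copy under a temporary name, then rename, so that a crash never
        # leaves a truncated file under the final name.
        tmp = dest.with_name(dest.name + ".partial")
        shutil.copyfile(src, tmp)
        tmp.replace(dest)
        copied.append(rel)
    return copied
```

Running the function again after an interruption only transfers the files that are missing or have the wrong size, which matters when the dataset consists of thousands of small files.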

Another option is rclone. I've used it successfully on our own cluster machines, but I still have to figure out a suitable recipe for token authentication for external access.

Also, be aware that you don't get a single HDF5 file like the ones KAT-7 produced, but a directory with hundreds (or thousands) of NPY files, plus an RDB file as the point of entry. This is our chunked "MVF4" format.
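As a rough way to check what ended up on disk after a copy, the sketch below walks a local directory and reports the RDB file(s) and the number and total size of NPY files. The helper name `summarise_local_copy` is hypothetical, and it assumes the files can be recognised by their `.rdb` and `.npy` extensions; the exact directory layout is not assumed.

```python
from pathlib import Path


def summarise_local_copy(directory):
    """Summarise a local copy of a chunked dataset (illustrative helper).

    Assumes the entry point has an .rdb extension and the data chunks
    have an .npy extension, wherever they sit below `directory`.
    """
    directory = Path(directory)
    rdb_files = sorted(directory.rglob("*.rdb"))
    npy_files = list(directory.rglob("*.npy"))
    return {
        "rdb_files": rdb_files,
        "npy_count": len(npy_files),
        "npy_bytes": sum(p.stat().st_size for p in npy_files),
    }
```

If exactly one RDB file turns up, its path is presumably what you would hand to katdal.open in place of the archive URL, since the RDB file is the point of entry.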

I'll see if I can get rclone to work, and improve mvf_copy.py as well in the meantime.