riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License

fixed h5py file saving #148

Closed (kaechb closed 1 year ago)

kaechb commented 1 year ago

As currently implemented, the dump() method for h5 files does not save anything. I added a few lines of code to fix it. The function call needs two kwargs, data and name, where data is the numpy array to save and name is the dataset name inside the .h5 file. E.g.:

LocalFileTarget(path="/bla.hdf5", fs="foo").dump(name="test", data=array)

riga commented 1 year ago

Hi @kaechb , thanks for opening an issue :)

It's not explicitly documented, but the current implementation is intended to encourage use as a context manager. That said, allowing a dataset to be passed directly could indeed be useful. I'll check tomorrow whether passing "name" and "data" is generic enough (I'm not an h5py expert).

In the meantime, could you try the following?

with target.dump(formatter="h5py") as f:
    f["my_dataset"] = list(range(10))
    # or
    f.create_dataset(name="my_dataset", data=list(range(10)))

kaechb commented 1 year ago

Ah, that is a much cleaner and more general solution. I had run into problems that apparently occur when an h5 file is opened and closed too frequently, which is why I had changed the formatter. Many thanks for the quick reply, and cheers, Benno

riga commented 1 year ago

Sure, and thanks again for reporting this. I'm in the process of adding (API) documentation, and this will indeed be added there :+1:
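
For readers comparing the two call styles discussed in this thread, here is a rough sketch of the trade-off. It uses only the standard library, with a JSON file standing in for HDF5, so it says nothing about how law or h5py behave internally. `ToyTarget`, `dump_dataset` and `open_count` are hypothetical names made up for this illustration and are not part of law's API.

```python
import json
import os
import tempfile
from contextlib import contextmanager


class ToyTarget:
    """Hypothetical stand-in for a file target; not part of law's API."""

    def __init__(self, path):
        self.path = path
        self.open_count = 0  # number of open/close cycles used for writing

    @contextmanager
    def dump(self):
        # Yield a dict that plays the role of the open file handle,
        # then write everything back in one go when the block exits.
        self.open_count += 1
        store = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                store = json.load(f)
        yield store
        with open(self.path, "w") as f:
            json.dump(store, f)

    def dump_dataset(self, name, data):
        # Shortcut in the spirit of dump(name=..., data=...):
        # convenient, but each call is a full open/close cycle.
        with self.dump() as store:
            store[name] = data

    def load(self):
        with open(self.path) as f:
            return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "toy.json")

# Context style: several datasets written within a single open/close cycle.
ctx_target = ToyTarget(path)
with ctx_target.dump() as f:
    f["my_dataset"] = list(range(10))
    f["other_dataset"] = [0.5, 1.5]

# Shortcut style: every call opens and closes the file again.
kw_target = ToyTarget(path)
kw_target.dump_dataset(name="test", data=[1, 2, 3])
kw_target.dump_dataset(name="test2", data=[4, 5, 6])
```

In this toy version the context style keeps the file open once for any number of datasets, while the shortcut pays one open/close cycle per dataset, which is the pattern that reportedly caused trouble with h5 files above.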