Support adding datasets to a container that has been loaded from a file

ssiegelx commented 1 month ago

Currently the draco.core.ContainerBase.add_dataset method fails if the container has been instantiated with the from_file method.

The reason for the failure is because the allow_chunked attribute does not exist. The allow_chunked attribute is set in the ContainerBase.__init__ method, which the from_file method explicitly avoids calling (see here).

If allow_chunked = True (default) then the add_dataset method uses the chunk options specified in a containers _dataset_spec attribute. If it's False, then the chunk size is not specified. As far as I can tell, nowhere in the code base is a container instantiated with allow_chunked = False.

If I make allow_chunked a class attribute that defaults to True, then loading the data from a file and then adding a new dataset appears to work fine.

I'm not very familiar with the inner workings of caput.memh5. Are you aware of any subtle issues with adding datasets to a container that has been loaded from disk?

If not, how should the allow_chunked attribute be set?

tristpinsm commented 1 month ago

IIRC allow_chunked only exists so that you can disable compression as it was not supported in some instances. So I think it not being created when the container is instantiated through from_file is just an oversight, and making it a class attribute should be harmless.

I don't know of any subtle issues when adding datasets to a container loaded from disk. My understanding is that once it is in memory it is just like any other container. If it is not in memory (ondisk=True) then things may be different but I'm not sure how it behaves.

ljgray commented 1 month ago

I agree with Tristan. I don't anticipate any issues with making allow_chunked a class attribute.

radiocosmology / draco

Support adding datasets to a container that has been loaded from a file #290