rly opened this issue 1 year ago
So you'd have to completely set up all of the central structure of the file prior to actually writing the data into the datasets? So that means you have to analyze which datasets being written have been wrapped in a DataChunkIterator, write all the other datasets whose data is passed explicitly already in memory, then enable nwbfile.swmr=True just to prevent unintentional corruption during the final step of exhausting the DCI queue?
To be fair, I have seen cases like that before, but I've also seen errors occur during object construction, so I don't think this mode could prevent those types of errors. That said, it seems easy to add, and as long as the line is put in the right place (after all group/dataset instantiation) it doesn't seem like it would hurt. I don't know that we'd actually have to bother with the flush calls, since we're only after the added robustness against errors corrupting the file.
Would the line right after https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/backends/hdf5/h5tools.py#L774 be a good place to flip that flag?
Yes, I believe that would be the right place to set this.
> I've also seen errors occur during object construction so I don't think this mode could prevent those types of errors
I agree; however, object mapping happens before the actual write, during the build process. That is, errors that happen during object mapping mean that the HDF5 file doesn't get modified at all.
> write all the other datasets whose data is passed explicitly already in memory, then enable nwbfile.swmr=True just to prevent unintentional corruption during the final step of exhausting the DCI queue?
Yes, I think that is true. I think we could also consider enabling it for the write of the data that is in memory, but updating the write_dataset function would probably be more involved, and I think SWMR is probably most relevant for iterative writes anyway.
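To make the write order discussed above concrete, here is a minimal sketch using plain h5py (not the actual HDMF implementation; the file path, group, and dataset names are made up for illustration): create all groups and datasets first, write the in-memory data, flip `swmr_mode`, and only then drain the iterative (DataChunkIterator-style) writes, so an error during that final step leaves a readable file.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "example.h5")

# SWMR requires the latest file format version.
with h5py.File(path, "w", libver="latest") as f:
    # 1. Create the full file structure up front.
    grp = f.create_group("acquisition")
    in_mem = grp.create_dataset("in_memory", data=np.arange(10.0))
    iterative = grp.create_dataset(
        "iterative", shape=(0,), maxshape=(None,), dtype="f8"
    )

    # 2. Flip the flag. After this, no new groups or datasets
    #    can be created, but existing ones can still be written.
    f.swmr_mode = True

    # 3. Exhaust the iterator queue, appending chunk by chunk.
    for chunk in (np.full(3, i, dtype="f8") for i in range(4)):
        n = iterative.shape[0]
        iterative.resize((n + chunk.size,))
        iterative[n:] = chunk
        iterative.flush()  # make the new data visible to readers
```

If the loop in step 3 raises partway through, the file metadata written in steps 1–2 stays intact, which is the robustness benefit being discussed.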
I'd be interested in seeing a speed comparison
What would you like to see added to HDMF?
SWMR (single-writer/multiple-reader) mode prevents or limits file corruption. What are the performance and feature costs of using SWMR? Should it be on by default? How do we test it?
https://docs.h5py.org/en/stable/swmr.html
One potential pain point is that "New groups and datasets cannot be created when in SWMR mode."
Is your feature request related to a problem?
No response
What solution would you like?
More research
Do you have any interest in helping implement the feature?
Yes.