Closed: mavaylon1 closed this 2 months ago
Notes and Questions:
- Does scalar_fill imply that the dataset has only 1 value and should only have 1?

Are you referring to scalar_fill? If so, this function is used to write scalar datasets, i.e., datasets with a single value.
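A minimal sketch (assuming h5py, which backs HDF5IO) of what a scalar dataset is: it has an empty shape `()` and holds exactly one value, which is why scalar datasets cannot be chunked or made expandable.

```python
# Scalar dataset sketch: shape is (), one value, no chunking or resizing possible.
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "scalar.h5")
with h5py.File(path, "w") as f:
    dset = f.create_dataset("answer", data=42)  # scalar: shape is ()
    scalar_shape = dset.shape
    scalar_value = int(dset[()])
```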
- There are three cases with references where the shape is defined within require_dataset.

Could you point to the case you are referring to? require_dataset is usually used to create a dataset if it doesn't exist and open the dataset if it does.
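A hedged sketch (assuming h5py) of the require_dataset semantics described above: the first call creates the dataset, and a later call with matching shape and dtype opens the existing one rather than recreating it.

```python
# require_dataset: create-if-missing, open-if-present.
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "require.h5")
with h5py.File(path, "w") as f:
    d = f.require_dataset("data", shape=(3,), dtype="i8")  # created here
    d[:] = [1, 2, 3]
with h5py.File(path, "a") as f:
    d = f.require_dataset("data", shape=(3,), dtype="i8")  # opened, not recreated
    values = list(d[:])  # existing values are preserved
```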
- The quickest solution is to set maxshape at each of the three locations.

This would mean making all datasets expandable by enabling chunking for all datasets. That is a broader approach than making this the default just for VectorData, but it would make it the default behavior for all (non-scalar) datasets. If that is the approach we want to take, then I would suggest adding a parameter enable_chunking=True on HDF5IO.write and HDF5IO.export so that we can configure the default behavior for write. @rly thoughts?
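A sketch (assuming h5py) of the expandable-dataset behavior under discussion: chunks=True lets h5py pick a chunk shape, and maxshape=(None,) makes the first axis unlimited so the dataset can be resized after the initial write.

```python
# Expandable dataset: chunked layout with an unlimited first axis.
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "expand.h5")
with h5py.File(path, "w") as f:
    col = f.create_dataset("col", data=np.arange(3), chunks=True, maxshape=(None,))
    col.resize((5,))      # grow the unlimited axis
    col[3:] = [10, 11]    # fill the newly added rows
    expanded = list(col[:])
```

Without chunking and maxshape, resize() on a contiguous dataset raises an error, which is why enabling expandability implies enabling chunking.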
Is the enable_chunking parameter there to give the user the option to turn off the expandable default? If so, would there be a reason they would want to?
In my experience it is best to make choices explicit and provide useful defaults rather than hiding configurations. A user may not want to use chunking if they want to use numpy memory mapping to read contiguous datasets.
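A sketch (assuming h5py and numpy) of the memory-mapping point above: a contiguous dataset sits at a single byte offset in the file and can be memory-mapped directly, while a chunked dataset's data is scattered and has no single offset.

```python
# Contiguous vs. chunked layout: only contiguous data has one file offset.
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "layout.h5")
with h5py.File(path, "w") as f:
    cont = f.create_dataset("contiguous", data=np.arange(4, dtype="i8"))
    chnk = f.create_dataset("chunked", data=np.arange(4, dtype="i8"), chunks=True)
    offset = cont.id.get_offset()          # byte offset of the contiguous data
    chunked_offset = chnk.id.get_offset()  # None: chunked data has no single offset

# A contiguous dataset can be read with a plain numpy memory map at its offset.
mapped = np.memmap(path, dtype="i8", mode="r", offset=offset, shape=(4,))
```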
@rly I will shoot to have this done by next week (formerly Friday May 3).
Dev Notes: When writing datasets, we have a few options:
From my understanding, we only need to modify the input parameter options for list_fill.

Now, Oliver mentioned being more explicit with a switch enable_chunking=True (default will be True) on HDF5IO.write and HDF5IO.export so that we can configure the default behavior for write. This will need to be passed through the chain of methods from write and export to write_dataset.
Yes, I believe that is correct. I think only logic in list_fill should need to be modified, and then the enable_chunking setting will need to be passed through. Note, list_fill is already being passed the argument options, which contains io_settings, so I think you may just need to set chunks=True in the io_settings (if chunks is set to None) to enable chunking. I'm not sure whether it will be easiest to make this change to io_settings in list_fill or to update the io_settings outside of list_fill so that list_fill would not need to change at all.
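A hypothetical helper (not hdmf's actual code) illustrating the second option described above: adjust io_settings before the dataset is created so that chunking is enabled by default, without touching list_fill itself.

```python
# Hypothetical sketch: enable chunking by default in io_settings,
# unless the caller already chose a chunk layout explicitly.
def apply_chunking_default(io_settings, enable_chunking=True):
    """Set chunks=True unless a chunk layout is already configured."""
    if enable_chunking and io_settings.get("chunks") is None:
        io_settings["chunks"] = True  # let h5py pick a chunk shape
    return io_settings
```

This keeps user-supplied settings (e.g., an explicit chunk shape or maxshape) untouched, which matches the requirement that the new default not interfere with existing maxshape settings.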
Tests:
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 88.70%. Comparing base (126bdb1) to head (58f3bf0).
@oruebel This is mostly done. I need to check/update or write a test that my changes do not interfere with existing maxshape settings. (And do another pass to make sure the logic is efficient.) However, the main point I want to bring up is your idea of having a parameter for turning the expandability on and off. This would mean HDMFIO has a parameter that is not used in ZarrIO. In fact, there are tests failing due to the parameter not being recognized. I see we have two options:
- This would mean HDMFIO has a parameter that is not used in ZarrIO.

I don't think the parameter needs to be in HDMFIO. I think it's ok to just add it as a parameter in HDF5IO.
HDF5IO write needs to call write_builder. It does that by calling super().write(**kwargs). This then gets us to HDMFIO write, which calls write_builder.
Yes, but HDMFIO.write allows extra keyword arguments, and those are being passed through to write_builder. So you can add custom keyword arguments without having to add them in HDMFIO. HDF5IO already has several additional arguments on write and write_builder that are not in HDMFIO, such as the exhaust_dci parameter.
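Simplified stand-ins (not the real hdmf classes) sketching the pass-through pattern described above: the base class forwards **kwargs untouched, so a subclass can add its own keyword arguments, like enable_chunking, without changing the base class.

```python
# Sketch of subclass-specific kwargs riding through a base-class write chain.
class HDMFIO:
    def write(self, container, **kwargs):
        # extra keyword arguments pass straight through to write_builder
        self.write_builder(container, **kwargs)

    def write_builder(self, builder, **kwargs):
        raise NotImplementedError


class HDF5IO(HDMFIO):
    def write(self, container, enable_chunking=True, **kwargs):
        # subclass option travels along without the base class knowing about it
        super().write(container, enable_chunking=enable_chunking, **kwargs)

    def write_builder(self, builder, enable_chunking=True, **kwargs):
        self.saw_enable_chunking = enable_chunking
```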
Well isn't that just right in front of my face.
Notes: My approach to the tests:
I added some minor suggestions, but otherwise this looks good to me.
Thanks for the quick review. I will make the docstring more detailed, but take a look at my comments for the other changes. The pass was deliberate (vs. a leftover from a draft), and I like the warning.
Could you add documentation on how to expand a VectorData?
It looks like creation of a dataset of references is not modified here. Some tables in NWB contain columns that are all references, e.g., the electrode table has a column with references to the ElectrodeGroup. I think such datasets should be expandable as well.
Yeah, the lack of support for datasets of references was just to keep the scope of this idea smaller. I agree this makes a lot of sense to have. I will make this an issue ticket.

As for the documentation on expanding a VectorData, I thought we had that. Maybe I am thinking of the HDF5 documentation, but I will look. If it does not exist, I will loop that into the ticket for datasets of references.
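A sketch (assuming h5py) of what an expandable dataset of object references could look like, in the spirit of an electrode-table column pointing at ElectrodeGroup objects; the group names here are illustrative only, not the real NWB layout.

```python
# Expandable column of HDF5 object references (illustrative names).
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "refs.h5")
with h5py.File(path, "w") as f:
    g1 = f.create_group("electrode_group_1")
    g2 = f.create_group("electrode_group_2")
    # ref_dtype stores object references; chunks + maxshape make the column expandable
    col = f.create_dataset("group_refs", shape=(2,), dtype=h5py.ref_dtype,
                           chunks=True, maxshape=(None,))
    col[0] = g1.ref
    col[1] = g2.ref
    col.resize((3,))   # grow the column
    col[2] = g1.ref
    resolved = f[col[2]].name  # dereference back to the group
```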
Motivation
What was the reasoning behind this change? Please explain the changes briefly.
This change makes writing VectorData data as expandable datasets the new default behavior. We do this by providing maxshape in the dataset settings whenever the user has not already defined a maxshape.
How to test the behavior?
Tests
Checklist
- Did you update CHANGELOG.md with your changes?