rubenvereecken opened this pull request 7 years ago
This is a useful feature. Do you plan to get one batch from the dataset to compute its shape?
No, as far as I know h5py datasets have a `shape` attribute, which should be just fine. I've never used dimension scales, though, nor do I know about variable-length datasets. Either way, I think variable length only applies to the first dimension? And if you got a batch from the dataset, you wouldn't know anything about the first dimension anyway.
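A minimal sketch of what reading shapes from metadata alone looks like with h5py. `Dataset.shape` comes from the file's dataspace information, so no array contents are loaded. The file layout, the dataset names (`features`, `targets`) and the helper `source_shapes` are hypothetical; this is not Fuel's `H5PYDataset.source_shapes` implementation.

```python
import os
import tempfile

import h5py


def source_shapes(path):
    """Return {dataset_name: shape} for every top-level dataset in an HDF5 file.

    Hypothetical helper: only metadata is read, no array data is loaded.
    """
    with h5py.File(path, "r") as f:
        return {
            name: obj.shape
            for name, obj in f.items()
            if isinstance(obj, h5py.Dataset)
        }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.hdf5")
    with h5py.File(path, "w") as f:
        # Hypothetical sources: 100 examples of 3x32x32 images plus labels.
        # No data is written; only shapes and dtypes are declared.
        f.create_dataset("features", shape=(100, 3, 32, 32), dtype="float32")
        f.create_dataset("targets", shape=(100, 1), dtype="uint8")
    shapes = source_shapes(path)

print(shapes)  # {'features': (100, 3, 32, 32), 'targets': (100, 1)}
```

For a resizable dataset, `shape` reports the current extent, while `maxshape` reports `None` for any axis that can grow.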
This pull request is meant to initiate discussion and is by no means finished.
I needed to get the dimensions of my data before reading any data from my HDF5 files. There is the `num_examples` attribute, but of course that's limited to one dimension. I could not find any straightforward way to get all dimensions, except that `H5PYDataset.source_shapes` seemed to represent what I wanted. But it wasn't really implemented. So it might very well be that I missed a way of getting my sources' dimensions, but in the meantime I've implemented the `source_shapes` attribute to accomplish what I need, albeit not in all possible scenarios.

If you agree that this `source_shapes` attribute is useful, I could look into how to complete the feature, because currently it only works if the user has provided no custom slices (which suits my use case just fine for now).

Little edit to explain why I want these dimensions: I want to specify an input layer's shape in a neural network by looking at the spec that is already present in the HDF5 datasets.
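One way the custom-slice case might be handled, sketched under the assumption that a subset only restricts the first (example) axis: the trailing dimensions stay the same and only the example count needs recomputing. The helper `subset_shape` and the example shapes are hypothetical, not part of Fuel's API. The per-example shape (everything after the first axis) is what a network's input layer would typically need.

```python
def subset_shape(full_shape, subset):
    """Shape of a source after selecting `subset` along the first axis.

    Hypothetical helper. `subset` is either a slice or a list of example
    indices; all other axes are assumed to be left untouched.
    """
    num_examples = full_shape[0]
    if isinstance(subset, slice):
        # slice.indices clamps start/stop/step to the axis length.
        count = len(range(*subset.indices(num_examples)))
    else:
        count = len(subset)
    return (count,) + tuple(full_shape[1:])


full = (100, 3, 32, 32)
train = subset_shape(full, slice(0, 80))       # (80, 3, 32, 32)
valid = subset_shape(full, [80, 85, 90])       # (3, 3, 32, 32)
every_other = subset_shape(full, slice(0, None, 2))  # (50, 3, 32, 32)

# Per-example shape, e.g. for declaring a network's input layer.
input_shape = train[1:]                        # (3, 32, 32)
```

Using `slice.indices` rather than `stop - start` keeps negative indices, `None` bounds and step sizes correct without special-casing them.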