Closed axelboc closed 3 months ago
As far as I can tell, a VDS can be built from source datasets with heterogeneous metadata (different or no chunking, different compression, dtype, etc.) - so I'm not sure that a one-to-one mirroring is possible. We could probably add a way to get a list of all the source datasets, and then you could use that to decide on loading plugins?
That seems reasonable. Do you think this could be directly included in the object returned by get_dataset_metadata?
interface Metadata {
...
sourceDatasets?: { fileId: string; path: string }[]
}
Do you think there are ever virtual datasets with enough source datasets that this would become a serious performance bottleneck for reading metadata? In h5py, for instance, retrieving source dataset metadata goes through a separate method, Dataset.virtual_sources(), rather than being part of reading the virtual dataset's own metadata.
Maybe at least a count to hint at whether the dataset has virtual sources? Not sure of the performance implications...
After making a preliminary implementation, it looks like it only reads info from the dcpl (dataset creation property list) of the virtual dataset, so it should be really fast (it doesn't have to resolve the source datasets). I'm going to go with your first suggestion, but use file_name and dset_name to be closer to what h5py puts out.
Note that sources within the same file seem to have file_name: "."
> f.get("data_compressed_via_vds").metadata
{
signed: false,
type: 1,
vlen: false,
littleEndian: true,
size: 8,
total_size: 50,
shape: [ 50 ],
maxshape: [ 50 ],
chunks: null,
virtual_sources: [ { file_name: '.', dset_name: '/data_compressed' } ]
}
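For anyone consuming virtual_sources, here is a minimal sketch of how the "." placeholder could be normalized before looking up the sources. The helper name resolveVirtualSources and the file name example.h5 are hypothetical and not part of h5wasm; only the file_name / dset_name shape comes from the output above.

```typescript
// Shape of one entry in `virtual_sources`, as shown in the output above.
interface VirtualSource {
  file_name: string;
  dset_name: string;
}

// Hypothetical helper (not an h5wasm API): replace the "." placeholder with
// the name of the file that contains the virtual dataset.
function resolveVirtualSources(
  containingFile: string,
  sources: VirtualSource[] | undefined,
): VirtualSource[] {
  return (sources ?? []).map((source) => ({
    ...source,
    file_name: source.file_name === "." ? containingFile : source.file_name,
  }));
}

// Example: one source in the same file, one in another (made-up) file.
const resolved = resolveVirtualSources("example.h5", [
  { file_name: ".", dset_name: "/data_compressed" },
  { file_name: "other.h5", dset_name: "/data" },
]);
console.log(resolved);
```

Datasets without virtual_sources simply yield an empty list, so the helper can be called unconditionally.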
I think this is closed by eb296cb0db9e8cf066f2b2e6ff4429f286da112b (published just now as v0.7.5). Let me know if there are any issues!
Looks good to me! https://github.com/silx-kit/h5web/pull/1662
Thanks for the quick turnaround, as always 😁
When a virtual dataset points to a compressed dataset, the filters information is missing from the virtual dataset's metadata. This prevents H5Web/h5wasm from knowing to load the compression plugins before attempting to read the data. See https://github.com/silx-kit/vscode-h5web/issues/43
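To illustrate what a viewer would need, here is a rough sketch of collecting the filter ids required to read a virtual dataset by walking its source datasets. Everything in it is an assumption for illustration: the DatasetMetadata shape (a filters array of { id } plus a virtual_sources list), the collectFilterIds helper, and the caller-supplied lookup callback are hypothetical, not an existing H5Web or h5wasm API.

```typescript
interface VirtualSource {
  file_name: string;
  dset_name: string;
}

// Assumed subset of a dataset's metadata. `filters` is assumed to hold HDF5
// filter ids and to be missing or empty on the virtual dataset itself.
interface DatasetMetadata {
  filters?: { id: number }[];
  virtual_sources?: VirtualSource[];
}

// Hypothetical caller-supplied lookup for a source dataset's metadata.
type MetadataLookup = (
  fileName: string,
  dsetName: string,
) => DatasetMetadata | undefined;

// Union of the filter ids on the dataset itself and on each source dataset.
function collectFilterIds(
  metadata: DatasetMetadata,
  lookup: MetadataLookup,
): number[] {
  const ids: number[] = [];
  const add = (id: number): void => {
    if (ids.indexOf(id) === -1) ids.push(id);
  };

  (metadata.filters ?? []).forEach((filter) => add(filter.id));
  for (const source of metadata.virtual_sources ?? []) {
    const sourceMetadata = lookup(source.file_name, source.dset_name);
    (sourceMetadata?.filters ?? []).forEach((filter) => add(filter.id));
  }
  return ids.sort((a, b) => a - b);
}

// Example with made-up data: the source uses filter id 32008 (bitshuffle).
const store: Record<string, DatasetMetadata> = {
  ".:/data_compressed": { filters: [{ id: 32008 }] },
};
const vdsMetadata: DatasetMetadata = {
  virtual_sources: [{ file_name: ".", dset_name: "/data_compressed" }],
};
const needed = collectFilterIds(
  vdsMetadata,
  (file, dset) => store[`${file}:${dset}`],
);
console.log(needed);
```

Sources that cannot be resolved are skipped here; a real implementation would probably want to report them instead.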