tensorflow / io

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO
Apache License 2.0

Access to attributes of an h5 file using tfio #1118

Open zaccharieramzi opened 4 years ago

zaccharieramzi commented 4 years ago

Hi,

I am interested in using the from_hdf5 features of the IODataset and IOTensor classes. In addition, I am looking for a way to access the attributes of the h5 file.

With h5py, you would typically have:

import h5py

with h5py.File(filename, 'r') as h5_obj:  # open read-only
    my_attr_value = h5_obj.attrs['my_attr_key']

kvignesh1420 commented 4 years ago

@zaccharieramzi can this be closed based on https://github.com/tensorflow/io/issues/1144 ?

zaccharieramzi commented 4 years ago

Well no, it's not the same thing. The attributes of an h5 file are unrelated to whether it contains boolean data.

kvignesh1420 commented 4 years ago

My bad, I thought the Colab notebook that you referenced there might have helped you address this issue. Wouldn't defining the TypeSpec and capturing the relevant data help here?

zaccharieramzi commented 4 years ago

No worries. Well the attributes of an h5 file are metadata concerning that file. The current API will only allow you to access data from that file.

The difference between data and meta-data (attributes) translates to the following in h5py:

import h5py

with h5py.File(filename, 'r') as h5_obj:  # open read-only
    my_attr_value = h5_obj.attrs['my_attr_key']  # this is meta-data
    my_data_value = h5_obj['my_data_key']  # this is data (an h5py dataset handle)

tfio.IOTensor.from_hdf5 only gives access to the data of that file. My current workaround is simply to process the meta-data (usually very light) in pure Python.
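
For anyone landing here, the pure-Python workaround could look roughly like the sketch below. It assumes the attributes sit on the root group of the file, as in the h5py snippets above. The helper names to_plain and read_h5_attrs are made up for illustration and are not part of h5py or tensorflow-io.

```python
def to_plain(value):
    """Convert an HDF5 attribute value into a plain Python object."""
    if isinstance(value, bytes):
        # Fixed-length HDF5 strings often come back as bytes.
        return value.decode("utf-8")
    if hasattr(value, "tolist"):
        # NumPy scalars and arrays become Python scalars and lists.
        return value.tolist()
    return value


def read_h5_attrs(filename):
    """Return the root-level attributes of an HDF5 file as a plain dict.

    Hypothetical helper: it only reads meta-data, not datasets.
    """
    import h5py  # imported lazily, only needed when a file is actually read

    with h5py.File(filename, "r") as h5_obj:
        return {key: to_plain(val) for key, val in h5_obj.attrs.items()}
```

The returned dict holds only ordinary Python values, so it can be kept next to whatever is loaded through tfio.IOTensor.from_hdf5 without holding the file open.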