Closed davidcurie closed 5 months ago
I can fix this error for myself by adjusting the offending line in sidpy.base.string_utils
to replace the S
type with U
, like so:
def clean_string_att(att_val):
...
335 return att_val
336 elif np.any([type(x) in [str, unicode, bytes, np.str_] for x in att_val]):
--> 337 return np.array(att_val, dtype='U')
338 elif isinstance(att_val, (list, tuple)):
Is this an easy port to implement upstream, or are there legacy concerns for why this needs to be S
?
I was wrong.
While my np.array(att_val, dtype="U")
adjustment fixes my unicode conversion on simple test strings and isolated test cases, it ultimately fails when I try to save it to USID-formatted .hdf5
.
The problem then becomes that the labels
attribute fails to update on either ancillary dataset.
File "/path/to/python3.12/site-packages/pyUSID/io/hdf_utils/model.py", line 936, in write_main_dataset
link_as_main(h5_main, h5_pos_inds, h5_pos_vals, h5_spec_inds, h5_spec_vals)
File "/path/to/python3.12/site-packages/pyUSID/io/hdf_utils/simple.py", line 557, in link_as_main
return USIDataset(h5_main)
^^^^^^^^^^^^^^^^^^^
File "/path/to/python3.12/site-packages/pyUSID/io/usi_data.py", line 91, in __init__
if not check_if_main(h5_ref):
^^^^^^^^^^^^^^^^^^^^^
File "/path/to/python3.12/site-packages/pyUSID/io/hdf_utils/simple.py", line 454, in check_if_main
validate_anc_dset_attrs(h5_pos_inds, h5_pos_vals, is_spec=False)
File "/path/to/python3.12/site-packages/pyUSID/io/hdf_utils/simple.py", line 488, in validate_anc_dset_attrs
v_names = get_attr(h5_vals, 'labels')
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/to/python3.12/site-packages/sidpy/hdf/hdf_utils.py", line 125, in get_attr
raise KeyError("'{}' is not an attribute in '{}'".format(attr_name, h5_object.name))
KeyError: "'labels' is not an attribute in '/Measurement_000/Channel_000/Position_Values'"
For clarity, my construction pipeline is based on the following example, which uses both sidpy and pyUSID. I am uncertain if the true scope of this issue belongs in sidpy or pyUSID.
import pyUSID as usid
import sidpy
import h5py
⋮
pos_dims = [
usid.Dimension("x", "µm", pos_x_vals),
usid.Dimension("y", "µm", pos_y_vals),
]
spec_dims = [usid.Dimension("Wavelength", "nm", spec_vals)]
with h5py.File(path, "w") as h5:
⋮
h5_meas_group = sidpy.prov_utils.create_indexed_group(h5, "Measurement")
h5_channel_group = sidpy.prov_utils.create_indexed_group(h5_meas_group, "Channel")
h5_raw = usid.hdf_utils.write_main_dataset(...
pos_dims=pos_dims, # Position dimensions
spec_dims=spec_dims, # Spectroscopic dimensions
)
pyUSID: 0.0.12 sidpy: 0.12.3 h5py: 3.11.0
I can confirm that sidpy datasets do support unicode attributes and can be saved to hdf5 and recovered without issue. However, you are correct that it seems this is not functioning correctly with USID. I will delve into this a bit further and let you know.
I instituted a fix that works on my end, in this commit
Can you please let me know if it resolves your issue? I tested it with a USID dataset and it seemed to function:
num_rows = 32
num_cols = 32
rows_vals = np.linspace(-0.5, 0.5, 32)
cols_vals = np.linspace(2.5, 3.5, 32)
pos_dims = [usid.Dimension('Cols', 'µm', cols_vals),
usid.Dimension('Rows', 'µm', rows_vals)]
spec_dims = [usid.Dimension('Grays', 'a.u.', np.array([0]))]
new_h5_filename = r'new_exp_file5.hf5'
hf = h5py.File(new_h5_filename, 'a') #Create a new file
hf_meas_grp = hf.create_group('Measurement_000/Channel_000') #Create a group
h5_raw = usid.hdf_utils.write_main_dataset(hf_meas_grp, # parent HDF5 group
(num_rows * num_cols, 1), # shape of Main dataset
'Raw_Data', # Name of main dataset
'Image', # Physical quantity contained in Main dataset
'nA', # Units for the physical quantity
pos_dims, # Position dimensions
spec_dims, # Spectroscopic dimensions
dtype=np.float32, # data type / precision
compression='gzip',
main_dset_attrs={'IO_rate': 4E+6})
print(h5_raw.pos_dim_descriptors)
which yielded
['Rows (µm)', 'Cols (µm)']
This fix is now merged into main, so hopefully if you use sidpy from the main branch you will no longer run into this problem.
Yes, the above fix solves my issue with no further adjustments to my pipeline. Thanks!
I am trying to store a
usid.Dimension("x", "µm", pos_x_vals)
with a unicode label because I want my axes labels on generated graphs to use the proper symbol for microns.I am greeted with the following error:
My pipeline otherwise works if I use
usid.Dimension("x", "um", pos_x_vals)
as demonstrated in some of the example notebooks.