hyperspy / rosettasciio

Python library for reading and writing scientific data format
https://hyperspy.org/rosettasciio
GNU General Public License v3.0

Error when saving markers with navigation dimension to hspy format #212

Closed magnunor closed 10 months ago

magnunor commented 10 months ago

When making a Texts marker with navigation dimensions and adding it to a signal with navigation dimensions, an error is raised when trying to save the signal (plotting it works fine):

Minimal working example

import numpy as np
import hyperspy.api as hs
from hyperspy.drawing._markers.texts import Texts

s = hs.signals.Signal2D(np.zeros((2, 100, 100)))
text_offset = np.empty(2, dtype=object)
text = np.empty(2, dtype=object)

for i in range(2):
    text_offset[i] = [[50, 40], ]
    text[i] = ['(30.0, -15.0)']

marker_texts = Texts(offsets=text_offset, texts=text, color="red")
s.add_marker(marker_texts, permanent=True)
s.save("testsignal.hspy")

Gives the error:

rsciio/hspy/_api.py:121, in HyperspyWriter._get_object_dset(group, data, key, chunks, **kwds)
    119     dtype = data[test_ind].compute().dtype
    120 else:
--> 121     dtype = data[test_ind].dtype
    122 dset = group.require_dataset(
    123     key, data.shape, dtype=h5py.special_dtype(vlen=dtype), chunks=chunks, **kwds
    124 )
    125 return dset

AttributeError: 'list' object has no attribute 'dtype'
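
The failing attribute lookup can be reproduced without HyperSpy or h5py. The following is a minimal sketch, assuming only NumPy; the variable names are illustrative and are not the ones used inside rosettasciio. Each element of the object array is a plain Python list, so asking the first element for its `.dtype` fails in the same way as in the traceback.

```python
import numpy as np

# Ragged object array whose elements are plain Python lists,
# built the same way as `text` in the example above.
text = np.empty(2, dtype=object)
for i in range(2):
    text[i] = ['(30.0, -15.0)']

first = text[0]  # a list, not a numpy array

try:
    first.dtype
    error_message = None
except AttributeError as err:
    error_message = str(err)

print(type(first).__name__)
print(error_message)
```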

Additional context

This does not occur if there are no navigation dimensions.

CSSFrancis commented 10 months ago

@magnunor I can maybe look into this.

Note that this fails as well:

import numpy as np
import hyperspy.api as hs
from hyperspy.drawing._markers.texts import Texts

s = hs.signals.Signal2D(np.zeros((2, 100, 100)))
text_offset = np.empty(2, dtype=object)
text = np.empty(2, dtype=object)

for i in range(2):
    text_offset[i] = np.array([[50, 40], ])
    text[i] = np.array(['(30.0, -15.0)'])

marker_texts = Texts(offsets=text_offset, texts=text, color="red")
s.add_marker(marker_texts, permanent=True)
s.save("testsignal.hspy")

TypeError: No conversion path for dtype: dtype('<U13')

This is perhaps the more proper way to do it.
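
For context, the `<U13` in the message is NumPy's fixed-width unicode dtype, which is what `np.array` infers for a list of Python strings. The sketch below assumes only NumPy and shows that dtype, together with a conversion to an object array of `str`. Whether the hspy writer should do such a conversion is not settled here; this only illustrates what the element looks like before and after.

```python
import numpy as np

# np.array infers a fixed-width unicode dtype from Python strings.
element = np.array(['(30.0, -15.0)'])
print(element.dtype.kind, len(element[0]))

# Casting to object keeps the same text but stores Python str objects.
as_object = element.astype(object)
print(as_object.dtype, type(as_object[0]).__name__)
```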

With zspy it works :).

import numpy as np
import hyperspy.api as hs
from hyperspy.drawing._markers.texts import Texts

s = hs.signals.Signal2D(np.zeros((2, 100, 100)))
text_offset = np.empty(2, dtype=object)
text = np.empty(2, dtype=object)

for i in range(2):
    text_offset[i] = np.array([[50, 40], ])
    text[i] = np.array(['(30.0, -15.0)'])

marker_texts = Texts(offsets=text_offset, texts=text, color="red")
s.add_marker(marker_texts, permanent=True)
s.save("testsignal.zspy")

CSSFrancis commented 10 months ago

The reason that I didn't hard code it so that it makes all lists --> arrays is that some things like Polygons don't play very well with anything but a ragged array of ragged arrays.

I didn't realize that had some unexpected consequences. As for saving string arrays in HDF5, I'm not confident that is possible. We've already had a bit of a time saving ragged arrays, and some of the functionality there still doesn't exist for HDF5 files. Unfortunately, we are really hitting the limitations there.

CSSFrancis commented 10 months ago

If we really want to save this kind of information, we could just save a one-dimensional array of all of the relevant text and then split it on loading. I've actually considered having all of the vectors in a sorted list with extra columns for the navigation axes.

In some ways this would be ideal and quite a bit faster but it would come with its own set of problems.
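
A rough sketch of the flatten-and-split idea, assuming a single navigation dimension and only NumPy. The helper names `flatten_ragged` and `unflatten_ragged` are hypothetical and do not exist in HyperSpy or rosettasciio; the point is only that a flat sequence of strings plus one length per navigation position is enough to rebuild the ragged array.

```python
import numpy as np


def flatten_ragged(ragged):
    """Pack a 1D ragged object array into a flat list plus per-position lengths."""
    lengths = np.array([len(item) for item in ragged], dtype=int)
    flat = [text for item in ragged for text in item]
    return flat, lengths


def unflatten_ragged(flat, lengths):
    """Rebuild the 1D ragged object array from the flat list and the lengths."""
    ragged = np.empty(len(lengths), dtype=object)
    start = 0
    for i, n in enumerate(lengths):
        ragged[i] = list(flat[start:start + n])
        start += n
    return ragged


texts = np.empty(3, dtype=object)
texts[0] = ["a"]
texts[1] = ["bb", "c"]
texts[2] = ["d", "e", "f"]

flat, lengths = flatten_ragged(texts)
restored = unflatten_ragged(flat, lengths)
```

The flat list and the integer lengths are both regular one-dimensional data, which is the property that would make them easier to store than an array of arrays.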

ericpre commented 10 months ago

I had a quick look at this, and the issue is that h5py doesn't support these NumPy dtypes (for example: https://docs.h5py.org/en/stable/strings.html#what-about-numpy-s-u-type). The way around this is to convert the unsupported type to something supported by h5py.

For example, the following works for saving without changing the rosettasciio code:

import h5py
import hyperspy.api as hs
import numpy as np

# Create a Signal2D with 1 navigation dimension
rng = np.random.default_rng(0)
data = np.ones((10, 100, 100))
s = hs.signals.Signal2D(data)

s2 = hs.signals.Signal2D(data)

# Variable-length string dtype that h5py knows how to store
dt = h5py.special_dtype(vlen=str)

offsets = np.empty(s.axes_manager.navigation_shape, dtype=object)
texts = np.empty(s.axes_manager.navigation_shape, dtype=object)

for index in np.ndindex(offsets.shape):
    i = index[0]
    offsets[index] = rng.random((10, 2))[:i+2] * 100
    texts[index] = np.array(['a' * (i+1), 'b', 'c', 'd', 'e', 'f', 'g', 'f', 'h', 'i'][:i+2], dtype=dt)

m2 = hs.plot.markers.Texts(
    offsets=offsets,
    texts=texts,
    sizes=3,
    facecolor="black",
    )

s2.add_marker(m2, permanent=True)
s2.plot()
s2.save("test.hspy", overwrite=True)

s3 = hs.load("test.hspy")

Plotting the marker still does not work after loading, because h5py reads the array back as byte strings instead of strings; hopefully the workaround is trivial. Obviously, we can't use the h5py dtype when creating the marker, but this illustrates the idea, and we should be able to convert to a "suitable" type in the hspy plugin.
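
One possible shape for that workaround, sketched with NumPy only. It assumes the loaded marker texts arrive as an object array whose elements are arrays of `bytes`; the helper name `decode_ragged` is hypothetical and is not part of rosettasciio. The input below is built by hand to stand in for what a load might return.

```python
import numpy as np


def decode_ragged(texts, encoding="utf-8"):
    """Return a copy of a ragged object array with bytes elements decoded to str."""
    decoded = np.empty(texts.shape, dtype=object)
    for index in np.ndindex(texts.shape):
        decoded[index] = np.array(
            [t.decode(encoding) if isinstance(t, bytes) else t for t in texts[index]],
            dtype=object,
        )
    return decoded


# Hand-built stand-in for a loaded array (assumption: elements are bytes).
loaded = np.empty(2, dtype=object)
loaded[0] = np.array([b"a", b"b"], dtype=object)
loaded[1] = np.array([b"aa", b"b", b"c"], dtype=object)

decoded = decode_ragged(loaded)
```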

@CSSFrancis, does this make sense to you?

CSSFrancis commented 10 months ago

@ericpre that seems perfect! I didn't realize h5py had their own string dtype.

There is the warning about fixed-length strings truncating. Is that something we have to worry about when casting back and forth?
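
To make the concern concrete, here is a small sketch of the truncation behaviour of NumPy's own fixed-width strings, assuming only NumPy. It does not show what h5py does; it only illustrates why a fixed-length string type can lose data while an object array of `str` does not.

```python
import numpy as np

# Fixed-width unicode array: each element holds at most 2 characters.
fixed = np.array(["ab"], dtype="U2")
fixed[0] = "abcdef"  # silently truncated to fit the fixed width

# Object array: each element is an ordinary Python str of any length.
flexible = np.array(["ab"], dtype=object)
flexible[0] = "abcdef"

print(fixed[0], flexible[0])
```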

ericpre commented 10 months ago

What warning are you referring to? I don't have any.