hyperspy / rosettasciio

Python library for reading and writing scientific data formats
https://hyperspy.org/rosettasciio
GNU General Public License v3.0

*.zspy was not readable from hs.load function, during EMC2024_hyperspy workshop. #303

Closed junbeom-park-FZJ closed 3 months ago

junbeom-park-FZJ commented 3 months ago

Describe the bug

During lesson 7 of the EMC2024 HyperSpy workshop, I tried to load a *.zspy data file and got the error below.

ERROR | Hyperspy | If this file format is supported, please report this error to the RosettaSciIO developers at https://github.com/hyperspy/rosettasciio/issues (hyperspy.io:599)

The detailed error message is under Additional context.

To Reproduce

Set up the environment in a way similar to the one the organizers suggest for advanced users in the workshop repository:

conda create -n hyperspy_EMC2024 python=3.11 hyperspy pyxem ipympl exspy lumispy ipykernel -c conda-forge
conda activate hyperspy_EMC2024
conda install nb_conda_kernels jupyterlab start_jupyter_cm ipympl -c conda-forge

Then, in Visual Studio with the hyperspy_EMC2024 environment:

import hyperspy.api as hs
hs.__version__  # 2.1.1
s = hs.load("stem_holz_data.zspy", lazy=True)

The load function failed with the error shown above. I did not check the hyperspy-bundle route (for basic users).

Expected behavior

The data loads, as it did for the lecturer.

Python environment:

Additional context

I also tried in JupyterLab, with the same result. The helper recommended testing with HyperSpy version 2.0, but the error was the same.

The full traceback that follows the main error message:

KeyError                                  Traceback (most recent call last)
Cell In[2], line 3
      1 import hyperspy.api as hs
      2 get_ipython().run_line_magic('matplotlib', 'qt5')
----> 3 s = hs.load("stem_holz_data.zspy", lazy=True)

File c:\Users\j.park\AppData\Local\miniconda3\envs\hyperspy_EMC2024\Lib\site-packages\hyperspy\io.py:517, in load(filenames, signal_type, stack, stack_axis, new_axis_name, lazy, convert_units, escape_square_brackets, stack_metadata, load_original_metadata, show_progressbar, **kwds)
    514         objects.append(signal)
    515 else:
    516     # No stack, so simply we load all signals in all files separately
--> 517     objects = [load_single_file(filename, lazy=lazy, **kwds)
    518                for filename in filenames]
    520 if len(objects) == 1:
    521     objects = objects[0]

File c:\Users\j.park\AppData\Local\miniconda3\envs\hyperspy_EMC2024\Lib\site-packages\hyperspy\io.py:517, in <listcomp>(.0)
    514         objects.append(signal)
    515 else:
    516     # No stack, so simply we load all signals in all files separately
--> 517     objects = [load_single_file(filename, lazy=lazy, **kwds)
    518                for filename in filenames]
    520 if len(objects) == 1:
    521     objects = objects[0]

File c:\Users\j.park\AppData\Local\miniconda3\envs\hyperspy_EMC2024\Lib\site-packages\hyperspy\io.py:576, in load_single_file(filename, **kwds)
    569     raise ValueError(
    570         "`reader` should be one of None, str, "
    571         "or a custom file reader object"
    572     )
    574 try:
    575     # Try and load the file
--> 576     return load_with_reader(filename=filename, reader=reader, **kwds)
    578 except BaseException:
    579     _logger.error(
    580         "If this file format is supported, please "
    581         "report this error to the HyperSpy developers."
    582     )

File c:\Users\j.park\AppData\Local\miniconda3\envs\hyperspy_EMC2024\Lib\site-packages\hyperspy\io.py:597, in load_with_reader(filename, reader, signal_type, convert_units, load_original_metadata, **kwds)
    595 lazy = kwds.get('lazy', False)
    596 if isinstance(reader, dict):
--> 597     file_data_list = importlib.import_module(reader["api"]).file_reader(filename,
    598                                                                     **kwds)
    599 else:
    600     # We assume it is a module
    601     file_data_list = reader.file_reader(filename, **kwds)

File c:\Users\j.park\AppData\Local\miniconda3\envs\hyperspy_EMC2024\Lib\site-packages\rsciio\zspy\_api.py:288, in file_reader(filename, lazy, **kwds)
    284     raise
    286 reader = ZspyReader(f)
--> 288 return reader.read(lazy=lazy)

File c:\Users\j.park\AppData\Local\miniconda3\envs\hyperspy_EMC2024\Lib\site-packages\rsciio\_hierarchical.py:259, in HierarchicalReader.read(self, lazy)
    257 for experiment in experiments:
    258     exg = self.file["Experiments"][experiment]
--> 259     exp = self.group2signaldict(exg, lazy)
    260     # assign correct models, if found:
    261     _tmp = {}

File c:\Users\j.park\AppData\Local\miniconda3\envs\hyperspy_EMC2024\Lib\site-packages\rsciio\_hierarchical.py:336, in HierarchicalReader.group2signaldict(self, group, lazy)
    331     metadata = "metadata"
    332     original_metadata = "original_metadata"
    334 exp = {
    335     "metadata": self._group2dict(group[metadata], lazy=lazy),
--> 336     "original_metadata": self._group2dict(group[original_metadata], lazy=lazy),
    337 }
    338 if "attributes" in group:
    339     # RosettaSciIO version is > 0.1
    340     exp["attributes"] = self._group2dict(group["attributes"], lazy=lazy)

File c:\Users\j.park\AppData\Local\miniconda3\envs\hyperspy_EMC2024\Lib\site-packages\zarr\hierarchy.py:511, in Group.__getitem__(self, item)
    509         raise KeyError(item)
    510 else:
--> 511     raise KeyError(item)

KeyError: 'original_metadata'
ericpre commented 3 months ago

It could be that the data is corrupted.

Can you check that the folder stem_holz_data.zspy\Experiments\fpd_data\original_metadata exists for this dataset? Some versions of MS Windows have a maximum path length, and you may have hit this limit when you extracted the file. If so, extract the file somewhere else with a shorter base path, or enable long paths.
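
For anyone who wants to run this check from Python, here is a minimal sketch. It assumes the .zspy dataset is a plain folder on disk with an Experiments/<name>/metadata and Experiments/<name>/original_metadata layout, as the traceback and the path above suggest. `check_zspy_store` is a hypothetical helper, not part of HyperSpy or RosettaSciIO, and 260 is the classic Windows MAX_PATH value that applies when long paths are not enabled.

```python
from pathlib import Path

# Classic Windows MAX_PATH value; only relevant when long paths are not enabled.
WINDOWS_MAX_PATH = 260


def check_zspy_store(store, required=("metadata", "original_metadata")):
    """Report missing groups and the longest absolute path in a .zspy folder.

    Hypothetical helper for illustration; the folder layout is an assumption.
    """
    store = Path(store).resolve()
    experiments = store / "Experiments"
    missing = []
    if experiments.is_dir():
        # Each experiment is expected to hold one sub-folder per required group.
        for exp in sorted(p for p in experiments.iterdir() if p.is_dir()):
            for name in required:
                if not (exp / name).is_dir():
                    missing.append(f"{exp.name}/{name}")
    else:
        missing.append("Experiments")
    # Longest absolute path among the entries that actually exist on disk.
    longest = max((len(str(p)) for p in store.rglob("*")), default=len(str(store)))
    return {
        "missing": missing,
        "longest_path": longest,
        "over_limit": longest >= WINDOWS_MAX_PATH,
    }


if __name__ == "__main__":
    # Path taken from the report; adjust it to where the dataset was extracted.
    print(check_zspy_store("stem_holz_data.zspy"))
```

A non-empty `missing` list points to an incomplete extraction. Files that were never written because of the limit do not count toward `longest_path`, so a value only somewhat below 260 can still be a sign of the problem.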

junbeom-park-FZJ commented 3 months ago

Dear @ericpre, thanks for the comment. Yes, you were right! The relative path was short enough, but the absolute path to the dataset was quite long. Once I moved the dataset to a location with a short absolute path, it worked.