hyperspy / rosettasciio

Python library for reading and writing scientific data format
https://hyperspy.org/rosettasciio
GNU General Public License v3.0
47 stars 28 forks source link

hs.load() producing incorrect results with sum_EDS_detectors=True #286

Closed Jordi-W closed 2 months ago

Jordi-W commented 3 months ago

Describe the bug

According to documentation, using hs.load() with argument sum_EDS_detectors=True on an .edm file with spectral images from multiple detectors combines the spectra from the different detectors. However, a correctly combined spectral image is not produced. Instead, if lazy=False, the resulant image is just the same as that from the first detector, whilst if lazy=True, instead the resultant image is the image from the first detector multiplied by the total number of detectors

To Reproduce

Have a valid .edm file called "Test.edm" with data from 4 EDS detectors

# imports the correct libriaries
%matplotlib qt
import hyperspy.api as hs
import os
import matplotlib.pyplot as plt

#File of example data, must have data from 4 different EDS detectors
filename=r'Test.emd'
be_lazy=True #If changed to False, produces a different, incorrect behaviour
#Load data with sum_EDS_detectors=False, this may take a while
s1 = hs.load(filename, load_SI_image_stack=True,sum_frames=False, sum_EDS_detectors=False, lazy=be_lazy, rebin_energy=2, show_progressbar=True, SI_dtype='uint16')

#Manually summing frames
s5=s1[-1]+s1[-2]+s1[-3]+s1[-4]

#Also loading it with EDS frames sum_EDS_detectors=True
s2 = hs.load(filename, load_SI_image_stack=True,sum_frames=False, sum_EDS_detectors=True, lazy=be_lazy, rebin_energy=2, show_progressbar=True, SI_dtype='uint16')

#Plotting
s2[-1].inav[:4,:4,0].isig[0:0.4].plot() #Plot of detector data summed with sum_EDS_detectors=True
s5.inav[:4,:4,0].isig[0:0.4].plot() #Plot of actual sum of data from detectors
s1[-4].inav[:4,:4,0].isig[0:0.4].plot() #Plot of data from the EDS first detector, to demonstrate that result from sum_EDS_detectors=True is 4 times this

Expected behavior

Instead, hs.load( sum_EDS_detectors=True) should sum data from all detectors.

Python environment:

Additional context

I believe that I have a fix for the problem. In /rsciio/emd_emd_velox.py, change

s0 = _read_stream(subgroup_keys[0])
            streams = [s0]
            # add other stream streams
            if len(subgroup_keys) > 1:
                for key in subgroup_keys[1:]:
                    stream_data = spectrum_stream_group[key]["Data"][:].T[0]
                    if self.lazy:
                        s0.spectrum_image = (
                            s0.spectrum_image
                            + s0.stream_to_sparse_array(stream_data=stream_data)
                        )
                    else:
                        s0.stream_to_array(
                            stream_data=stream_data, spectrum_image=s0.spectrum_image
                        )

to

            # add other stream streams
            if len(subgroup_keys) > 1:
                for key in subgroup_keys[1:]:
                    stream_data = spectrum_stream_group[key]["Data"][:].T[0]
                    sk=_read_stream(key)
                    if self.lazy:
                        s0.spectrum_image = (
                            s0.spectrum_image
                            + sk.stream_to_sparse_array(stream_data=stream_data)
                        )
                    else:
                        s0.spectrum_image = (
                            s0.spectrum_image
                            + sk.stream_to_array(stream_data=stream_data)
                        )

            streams = [s0]

With this alteration, the example code above produces correct results. As I am new to HyperSpy, it would be best if this could be reviewed by a more experienced pair of eyes.

ericpre commented 3 months ago

Thanks @Jordi-W for the bug report. From reading the code, there is indeed a bug but not the one that you describe, there is the lazy version that has the issue! 😉 See #287 for a fix with a test that checks that the sum of each stream are different.