hyperspy / rosettasciio

Python library for reading and writing scientific data format
https://hyperspy.org/rosettasciio
GNU General Public License v3.0
44 stars 26 forks source link

EMD Velox: fix reading EDS stream detector lazily with `sum_EDS_detectors=True` #287

Closed ericpre closed 1 week ago

ericpre commented 3 weeks ago

When summing the stream, the same stream was used several times

Fix #286.

Progress of the PR

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 87.67%. Comparing base (2c33d09) to head (634db2f).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #287 +/- ## ======================================= Coverage 87.66% 87.67% ======================================= Files 83 83 Lines 11150 11149 -1 Branches 2414 2414 ======================================= Hits 9775 9775 Misses 860 860 + Partials 515 514 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

ericpre commented 3 weeks ago

@Jordi-W, can you try this change?

Jordi-W commented 3 weeks ago

@Jordi-W, can you try this change?

Will do so Monday morning.

Jordi-W commented 3 weeks ago

I have tested this change for both lazy and not lazy loading, it only fixes the issue for lazy loading.

ericpre commented 3 weeks ago

Can you make a reproducible example with a file?

The test added in this PR check that the results are the same for the four combinations: lazy, True / False and sum_EDS_detectors, True / False. There must be something else going on.

Jordi-W commented 2 weeks ago

Here is a short example. Adjust filename to refer to location of the file attached (which you must unzip first).

# imports the correct libriaries
%matplotlib qt
import hyperspy.api as hs
import os
import matplotlib.pyplot as plt

#File of example data, must have data from 4 different EDS detectors
filename=r'Test.emd'
be_lazy=False
#Loads data with sum_EDS_detectors=False
s1 = hs.load(filename, load_SI_image_stack=True,sum_frames=False, sum_EDS_detectors=False, lazy=be_lazy, rebin_energy=2, show_progressbar=True, SI_dtype='uint16')

#Manually summing frames
s5=s1[-1]+s1[-2]+s1[-3]+s1[-4]

#Also loading it with EDS frames sum_EDS_detectors=True
s2 = hs.load(filename, load_SI_image_stack=True,sum_frames=False, sum_EDS_detectors=True, lazy=be_lazy, rebin_energy=2, show_progressbar=True, SI_dtype='uint16')

#Plotting
s2[-1].inav[:4,:4,0].isig[0:0.4].plot() #Plot of detector data summed with sum_EDS_detectors=True
s5.inav[:4,:4,0].isig[0:0.4].plot() #Plot of actual sum of data from detectors
s1[-4].inav[:4,:4,0].isig[0:0.4].plot() #Plot of data from the EDS first detector, to demonstrate that result from sum_EDS_detectors=True is equal to this

Test.zip

ericpre commented 2 weeks ago

It seems to be related to the use of sum_frames. The example below shows it works as expected for sum_frames=True (default) but not sum_frames=False (the figure shows the expected behaviour):

import hyperspy.api as hs

s1 = hs.load(filename, sum_EDS_detectors=False, sum_frames=True)
s2 = hs.load(filename, sum_EDS_detectors=True, sum_frames=True)

s1_sum = hs.stack(s1[-4:]).sum("stack_element")

# Take the difference and sum in the navigation space
diff_sum = (s1_sum - s2[-1]).sum()
diff_sum.plot() # values are 0

# take the difference between single detector and all four detectors summed
# and sum over the navigation space to compare spectra
# All spectra are different
diff_list = [(s2[-1] - s1[-i]).sum() for i in range(1, 5)]
hs.plot.plot_spectra(diff_list, style="mosaic", legend=[f"index: -{i}" for i in range(1, 5)])

image

image

ericpre commented 2 weeks ago

@Jordi-W, can you try again? There was another bug when using sum_frames=False.

Jordi-W commented 2 weeks ago

I have checked and I can only reproduce the bug when sum_frames=False, confirming what you said. Good catch, that.

ericpre commented 2 weeks ago

The last few commits should fix the bug that you originally reported, can you confirm that this is correct?

Jordi-W commented 2 weeks ago

Even with all your commits, I still get the error when sum_frames=False. I get the error both with the code that I submitted earlier, and your example if I change it so that sum_frames=False.

ericpre commented 2 weeks ago

I updated your example to use array comparison instead of visually checking on a plot:

import hyperspy.api as hs
import numpy as np

#File of example data, must have data from 4 different EDS detectors
filename='Test.emd'
lazy = False
sum_frames = False
#Loads data with sum_EDS_detectors=False
s1 = hs.load(filename, load_SI_image_stack=True, sum_frames=sum_frames, sum_EDS_detectors=False, lazy=lazy, rebin_energy=2, show_progressbar=True, SI_dtype='uint16')

#Manually summing frames
s_sum = s1[-1] + s1[-2] + s1[-3] + s1[-4]

#Also loading it with EDS frames sum_EDS_detectors=True
s2 = hs.load(filename, load_SI_image_stack=True, sum_frames=sum_frames, sum_EDS_detectors=True, lazy=lazy, rebin_energy=2, show_progressbar=True, SI_dtype='uint16')

# Check if there are equal: if so, no AssertionError is raised
np.testing.assert_allclose(s_sum.data, s2[-1].data)

# AssertionError is raised because these are not equal
np.testing.assert_allclose(s1[-4].data, s2[-1].data)

It raises an AssertionError in the last line because the two arrays are not equal, which is expected behaviour.

Are you sure that the branch your are using is up to date?

Jordi-W commented 1 week ago

Sorry for the late reply. I have checked with the latest version of HyperSpy using your commits and I replicated your findings. The bug is now gone, regardless of whether I load lazily or not.

ericpre commented 1 week ago

Thank you for the confirmation!