usnistgov / PyHyperScattering

Tools for hyperspectral x-ray and neutron scattering data loading, reduction, slicing, and visualization.

Primary csv metadata change #40

Closed pdudenas closed 2 years ago

pdudenas commented 2 years ago

Data from 2022-1 no longer has 'RSoXS Shutter Opening Time (ms)' in the primary csv, resulting in a KeyError being raised in SST1RSoXSLoader.read_primary. @EliotGann Is this a permanent change, and if so, where else is the exposure time recorded?

EliotGann commented 2 years ago

It should still be there… Possibly it was the wrong kind of scan? If it was an RSoXS scan it should be there.

EliotGann commented 2 years ago

Spiral and NEXAFS scans will not have it.

pdudenas commented 2 years ago

I just checked, and there are RSoXS scans from 2022-1 and 2022-2 that don't have it, but others that do. It may be missing for scans taken with the SAXS camera? Unclear.

EliotGann commented 2 years ago

Ok, I believe the issue is that if a value is not changed during the scan, it will not produce a primary column, in which case it should be read from the baseline. This is pretty general behavior.

EliotGann commented 2 years ago

There will be a few places in the baseline which should have it. I can search through and find them sometime. Some are in ms and others in seconds.

pdudenas commented 2 years ago

Exposure time is not present in baseline.csv either. A long-term fix would be to derive the exposure time from the "Shutter Toggle_monitor.csv" file. The temporary solution is to assign a dummy exposure of 1 s.

pdudenas commented 2 years ago

I have a fix that calculates the shutter exposure from the "Shutter Toggle_monitor.csv" file.

try:
    primary_dict['exposure'] = df_primary['RSoXS Shutter Opening Time (ms)'][seq_num]
except KeyError:
    shutter_fname = list(cwd.glob('*Shutter Toggle*'))
    primary_dict['exposure'] = self.read_shutter_toggle(shutter_fname[0])*1000 # keep in ms
    warnings.warn('No exposure time found in primary csv. Calculating from Shutter Toggle csv', stacklevel=2)
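For reference, a minimal sketch of what a `read_shutter_toggle` helper could look like. This is an illustration only, not the actual implementation in the PR: the column names (`time`, `RSoXS Shutter Toggle`) and the assumption that the monitor file records a 0/1 shutter state at sampled timestamps are guesses about the file format.

```python
import pandas as pd

def read_shutter_toggle(fname):
    """Estimate total exposure time (seconds) from a shutter-toggle monitor CSV.

    Assumes a 'time' column (seconds) and a 0/1 shutter-state column named
    'RSoXS Shutter Toggle' -- adjust to match the real monitor file.
    """
    df = pd.read_csv(fname)
    state = df['RSoXS Shutter Toggle']
    # Duration of each monitor sample = gap to the next timestamp
    dt = df['time'].diff().shift(-1).fillna(0)
    # Sum up the intervals during which the shutter was open
    return float((dt * state).sum())
```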

As it turns out, we call self.loadMd in loadSingleImage, which is then called for each image in loadFileSeries, so we end up re-reading the metadata once per image (and, as I found out, if there is no exposure time in the primary csv, this warning gets printed many, many times). @pbeaucage, is the re-reading of metadata a known thing that isn't a concern because it's pretty quick, or should I open a separate issue to fix this and only read it in once per file series?

pbeaucage commented 2 years ago

Yes, known feature/bug that clashes with how suitcased data is stored...

Basically, many instruments write metadata once per image file rather than once per scan. If that's the case, you sort of have to hit loadMd for every image in loadSeries (to decide whether to load the image) and in loadSingleImage (to actually get the metadata).

I think it is pretty quick, conceptually at least, but it may be worth a separate issue. In principle, changing the Loader API to have optional metadata caching would not be hard. You could even patch it in just in the SST loader by adding class variables and some logic in loadMd. The question is how much benefit it really offers.
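A per-path cache like the one suggested above could be sketched as follows. This is a hypothetical illustration, not PyHyperScattering's actual Loader API: the `MetadataCache` class, the `reader` callable, and the `loadMd` signature are all stand-ins for whatever the real SST loader would use.

```python
class MetadataCache:
    """Minimal per-path metadata cache (hypothetical names throughout).

    Wraps an expensive metadata reader so that repeated loadMd calls from
    loadSingleImage/loadFileSeries parse the csv files only once per path.
    """

    def __init__(self, reader):
        self._reader = reader  # callable: path -> metadata dict
        self._cache = {}

    def loadMd(self, path):
        # Only hit the expensive reader on the first request for this path
        if path not in self._cache:
            self._cache[path] = self._reader(path)
        return self._cache[path]
```

The trade-off mentioned above applies: a cache keyed on the whole scan directory (rather than per image file) would save the most work, but stale-cache invalidation then becomes the loader's problem.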