EliotGann opened 2 years ago
```python
import xarray as xr

def stream_summary(run):
    for stream_name in run.keys():
        print(f'{stream_name} - {len(run[stream_name].data.read().time)}')
    print(f'total time = {run.stop["time"] - run.start["time"]}')

def get_monitors(run):
    monitors = None
    i = 0
    for stream_name in run.keys():
        if 'monitor' in stream_name:
            if i == 0:
                monitors = run[stream_name].data.read()
            else:
                monitors = xr.merge((monitors, run[stream_name].data.read()))
            i += 1
    return monitors
```
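For illustration, here's a stdlib-only sketch of what merging asynchronous monitor streams does (the stream names and values below are made up): each stream only has readings at its own timestamps, so the merged table has gaps everywhere else — which is where `xr.merge` puts the NaNs.

```python
def merge_streams(*streams):
    """Merge monitor streams (dicts of timestamp -> value) onto the union of
    their timestamps. A stream gets None wherever it had no reading, the
    stdlib analogue of the NaNs xr.merge produces for async streams."""
    all_times = sorted(set().union(*(s.keys() for s in streams)))
    return {t: [s.get(t) for s in streams] for t in all_times}

mesh    = {0.0: 1.1, 2.0: 1.2}   # like an 'RSoXS Au Mesh Current' monitor
shutter = {1.0: 1, 3.0: 0}       # like an 'RSoXS Shutter Toggle' monitor

merged = merge_streams(mesh, shutter)
print(merged)
# {0.0: [1.1, None], 1.0: [None, 1], 2.0: [1.2, None], 3.0: [None, 0]}
```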
and the NaN filling is done by `monitors_filled = monitors.ffill('time')`
An example of the coordinate munging needed:
```python
data = c[-1]['primary']['data'].read()
data = (data
        .assign_coords(coords={'m3pitch': data['SST 1 Mirror 3 fmb_pitch_setpoint'],
                               'energy': data['en_energy_setpoint']})
        .swap_dims({'time': 'system'})
        .set_index({'system': ['m3pitch', 'energy']})
        .unstack('system'))  # .reset_index('time',drop=True).unstack('system')
```
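As a stdlib-only illustration of what that `set_index`/`unstack` chain accomplishes (the values here are invented): flat per-timepoint records keyed by `(m3pitch, energy)` are reshaped into a two-dimensional grid.

```python
def unstack(records):
    """records: list of (m3pitch, energy, value) tuples
    -> nested grid[m3pitch][energy], one cell per coordinate pair."""
    grid = {}
    for m3pitch, energy, value in records:
        grid.setdefault(m3pitch, {})[energy] = value
    return grid

records = [
    (0.1, 280.0, 5.0),
    (0.1, 285.0, 6.0),
    (0.2, 280.0, 7.0),
    (0.2, 285.0, 8.0),
]

grid = unstack(records)
print(grid[0.2][285.0])  # → 8.0  (value at m3pitch=0.2, energy=285.0)
```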
I think this is a really great idea; probably the right answer is to load the data as a Dataset with variables for the monitor streams. Perhaps hiding this behind a flag like `return_monitors=True` would be a way to make this non-API-breaking (but maybe this warrants breaking the API?).
This is certainly a prerequisite for #18, fwiw
So the implementation of this in 45cdfe9 does a bunch of things worth noting:

- `loadRun` now takes `return_dataset=True`. This was needed to not break the API (`return_dataset` defaults to `False`, so existing code is not affected).
- The monitors are cleaned up with an `ffill` (fill in NaNs by using the next-oldest valid value) followed by a `bfill` (fill any remaining NaNs by using the next-newest valid value), then finally interpolated onto the time points of the RSoXS images. A more appropriate approach would actually be integrating the sparse monitor signal during the exposure time.
- If you want the raw monitor streams, call `load.loadMonitors(run)` and you'll get all 13000 or whatever time points.

@EliotGann, is the time interpolation onto the RSoXS resolution reasonable, or should we add another method that deals with the sparse/async array?
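A minimal stdlib sketch of that ffill/bfill-then-resample sequence (the monitor values here are invented, and the final step is a simplified previous-sample lookup standing in for the actual interpolation onto image times):

```python
from bisect import bisect_right

def ffill(values):
    """Forward-fill: replace each None with the most recent non-None value."""
    out, last = [], None
    for v in values:
        last = v if v is not None else last
        out.append(last)
    return out

def bfill(values):
    """Backward-fill: replace remaining Nones with the next non-None value."""
    return ffill(values[::-1])[::-1]

def sample_at(times, values, image_times):
    """Look up the filled monitor signal at each image timestamp
    (previous-sample lookup; a step function, not true interpolation)."""
    return [values[max(bisect_right(times, t) - 1, 0)] for t in image_times]

# Merged async streams leave NaN-like gaps at each other's timestamps:
times = [0.0, 1.0, 2.0, 3.0, 4.0]
mesh  = [None, 1.5, None, 1.7, None]  # mesh current only read at t=1 and t=3

filled = bfill(ffill(mesh))           # → [1.5, 1.5, 1.5, 1.7, 1.7]
images = sample_at(times, filled, [0.5, 3.5])
print(images)                         # → [1.5, 1.7]
```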
yeah, it's tricky. this is the code I use...
```python
import xarray as xr
from scipy.ndimage import binary_erosion
# assumes a Tiled/databroker catalog `c` and the helpers bin_time / bin_energy
# are defined elsewhere

def get_raw_binned_NEXAFS(scan_id):
    run = c[scan_id]
    namelookup = {'en_monoen_readback': 'energy'}
    shutter_good = True
    try:
        shutter_xr = run['RSoXS Shutter Toggle_monitor'].data.read()
    except KeyError:
        shutter_good = False
    # get the monochromator readback
    df = run['en_monoen_readback_monitor'].data.read()
    # throw out values where the shutter was shut or close to the edge
    if shutter_good:
        df = xr.merge((df, shutter_xr))
        df = df.interpolate_na('time').ffill('time').bfill('time')
        df['RSoXS Shutter Toggle'].values = binary_erosion(
            df['RSoXS Shutter Toggle'].values, iterations=2, border_value=0)
        df = (df.where(df['RSoXS Shutter Toggle'] > 0)
                .dropna('time')
                .drop_vars('RSoXS Shutter Toggle'))
    # try to group in time bins; if that fails, group directly in energy bins
    try:
        df, timeleftedges = bin_time(df, run, None, namelookup)
    except ValueError:
        df, enleftedges = bin_energy(df, run, None, namelookup)
    # throw out the values of each data channel when the shutter was closed
    for channel in ['RSoXS Sample Current', 'WAXS Beamstop',
                    'SAXS Beamstop', 'RSoXS Au Mesh Current']:
        data_xr = run[channel + '_monitor'].data.read()
        if shutter_good:
            data_xr = xr.merge((data_xr, shutter_xr))
            data_xr = data_xr.interpolate_na('time').ffill('time').bfill('time')
            data_xr['RSoXS Shutter Toggle'].values = binary_erosion(
                data_xr['RSoXS Shutter Toggle'].values, iterations=2, border_value=0)
            data_xr = (data_xr.where(data_xr['RSoXS Shutter Toggle'] > 0)
                              .dropna('time')
                              .drop_vars('RSoXS Shutter Toggle'))
        try:
            binned_df, timeleftedges = bin_time(data_xr, run, timeleftedges, namelookup)
        except ValueError:
            binned_df, enleftedges = bin_energy(data_xr, run, enleftedges, namelookup)
        df = xr.merge((df, binned_df))
    df = df.assign_coords({'en': df.energy}).set_index({'en': 'energy'})
    df = df.reset_index('time').reset_index('en')
    df = (df.swap_dims({'time': 'en'})
            .rename_vars({'en_': 'en'})
            .rename_vars({'time_': 'time'})
            .reset_coords('time'))
    df.attrs.update(run.start)
    return df
```
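`bin_time` and `bin_energy` are helpers that aren't shown in this thread. As a rough sketch of what time-binning onto left edges could look like (not the actual implementation — names and numbers are invented, with an empty-bin `ValueError` standing in for the fallback to energy binning):

```python
from bisect import bisect_right

def bin_by_edges(times, values, left_edges):
    """Average (time, value) samples into bins defined by sorted left edges.
    Returns one mean per bin; a bin with no samples raises ValueError,
    which a caller could catch to fall back to energy binning."""
    sums = [0.0] * len(left_edges)
    counts = [0] * len(left_edges)
    for t, v in zip(times, values):
        i = bisect_right(left_edges, t) - 1
        if i >= 0:
            sums[i] += v
            counts[i] += 1
    if 0 in counts:
        raise ValueError("empty bin")
    return [s / c for s, c in zip(sums, counts)]

times  = [0.1, 0.4, 1.2, 1.8, 2.5]
values = [1.0, 3.0, 5.0, 7.0, 9.0]
print(bin_by_edges(times, values, [0.0, 1.0, 2.0]))  # → [2.0, 6.0, 9.0]
```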
Here is the code for normalization. This just normalizes the data channels, not the RSoXS data, but you would do a similar thing, I think.
```python
def get_processed_NEXAFS(scanid, double_norm=True, norm_sample='diode',
                         norm_uuid=None, norm_plan='auto_fly'):
    '''
    retrieve data from scan (as function of time),
    find i0_scan with similar parameters,
    change both to functions of energy,
    sort the points vs energy and interpolate from i0 into the data scan,
    normalize the default NEXAFS data channels via double normalization
    using the SAXS or WAXS diode in the i0_scan

    parameters:
        double_norm - whether to try to find normalization at all; if not,
            just return cleaned-up, singly normalized data
        norm_sample - name of sample to find for normalization (usually Blank or diode)
        norm_uuid - override the search and just use the specified i0 scan for normalization
        norm_plan - the plan name to search for for normalization:
            "same" will use the same name as the input
            "auto_fly" (default) uses the fly NEXAFS version of the same edge,
                if it is in the lookup table
            any other text will search for a plan with that specific name
    '''
    auto_LUT = {'full_ca_scan_nd': 'fly_Calcium_NEXAFS',
                'full_carbon_calcium_scan_nd': 'fly_Carbon_NEXAFS',
                'full_carbon_scan_nd': 'fly_Carbon_NEXAFS',
                'full_carbon_scan_nonaromatic': 'fly_Carbon_NEXAFS',
                'full_fluorine_scan_nd': 'fly_Fluorine_NEXAFS',
                'full_iron_scan_nd': 'fly_Iron_NEXAFS',
                'full_nitrogen_scan_nd': 'fly_Nitrogen_NEXAFS',
                'full_oxygen_scan_nd': 'fly_Oxygen_NEXAFS',
                'short_calcium_scan_nd': 'fly_Calcium_NEXAFS',
                'short_nitrogen_scan_nd': 'fly_Nitrogen_NEXAFS',
                'short_oxygen_scan_nd': 'fly_Oxygen_NEXAFS',
                'short_sulfurl_scan_nd': 'fly_SulfurL_NEXAFS',
                'short_zincl_scan_nd': 'fly_Fluorine_NEXAFS',
                'short_carbon_scan_nd': 'fly_Carbon_NEXAFS',
                'short_carbon_scan_nonaromatic': 'fly_Carbon_NEXAFS',
                'collins_carbon_survey_fixedpol': 'fly_Carbon_NEXAFS',
                'short_fluorine_scan_nd': 'fly_Fluorine_NEXAFS',
                'very_short_carbon_scan_nd': 'fly_Carbon_NEXAFS',
                'very_short_oxygen_scan_nd': 'fly_Oxygen_NEXAFS',
                'veryshort_fluorine_scan_nd': 'fly_Fluorine_NEXAFS',
                'survey_scan_lowenergy': 'fly_Carbon_NEXAFS',
                'sufficient_carbon_scan_nd': 'fly_Carbon_NEXAFS',
                'picky_carbon_scan_nd': 'fly_Carbon_NEXAFS',
                't_carbon_scan_nd': 'fly_Carbon_NEXAFS',
                'cdsaxs_scan': 'fly_Carbon_NEXAFS',
                'custom_rsoxs_scan': 'fly_Carbon_NEXAFS',
                'custom_rotate_rsoxs_scan': 'fly_Carbon_NEXAFS',
                'focused_carbon_scan_nd': 'fly_Carbon_NEXAFS',
                'g_carbon_scan_nd': 'fly_Carbon_NEXAFS',
                'fly_Oxygen_NEXAFS': 'fly_Oxygen_NEXAFS',
                'fly_Iron_NEXAFS': 'fly_Iron_NEXAFS',
                'fly_Nitrogen_NEXAFS': 'fly_Nitrogen_NEXAFS',
                'fly_Fluorine_NEXAFS': 'fly_Fluorine_NEXAFS',
                'fly_Boron_NEXAFS': 'fly_Boron_NEXAFS',
                'fly_Calcium_NEXAFS': 'fly_Calcium_NEXAFS',
                'fly_Carbon_NEXAFS': 'fly_Carbon_NEXAFS',
                'fly_SiliconK_NEXAFS': 'fly_SiliconK_NEXAFS',
                'fly_SiliconL_NEXAFS': 'fly_SiliconL_NEXAFS',
                'fly_SulfurL_NEXAFS': 'fly_SulfurL_NEXAFS',
                'full_Carbon_NEXAFS': 'fly_Carbon_NEXAFS'}
    df = get_raw_binned_NEXAFS(scanid)
    run = c[scanid]
    i0df = None  # stays None if no i0 normalization is applied
    badi0 = True
    trynum = 5
    if double_norm:
        if norm_uuid is not None:
            i0df = get_raw_binned_NEXAFS(norm_uuid)
        else:
            if norm_plan == 'auto_fly':
                plan_search = auto_LUT[df.attrs['plan_name']]
            elif norm_plan == 'same':
                plan_search = df.attrs['plan_name']
            else:
                plan_search = norm_plan
            while badi0 and (trynum < 20):
                try:
                    i0run = c.search(TimeRange(until=df.attrs['time'])).search(
                        RawMongo(start={'plan_name': plan_search,
                                        'sample_name': 'diode'}))[-trynum]
                except ValueError:
                    trynum += 1
                    continue
                try:
                    i0df = get_raw_binned_NEXAFS(i0run.start['scan_id'])
                except KeyError:
                    trynum += 1
                    continue
                if (min(i0df['en']) - min(df['en']) < 1
                        and max(i0df['en']) - max(df['en']) < 1
                        and len(i0df['en']) / len(df['en']) > 0.5):
                    badi0 = False
                    print(f"scan number {i0run.start['scan_id']} was found after "
                          f"trying the previous {trynum} diode scans")
                trynum += 1
            if badi0:
                print('Failed to find a good i0 for this scan')
                return df, None
        if 'SAXS' in i0df.attrs['RSoXS_Main_DET']:
            i0df['i0correction'] = i0df['RSoXS Au Mesh Current'] / i0df['SAXS Beamstop']
        else:
            i0df['i0correction'] = i0df['RSoXS Au Mesh Current'] / i0df['WAXS Beamstop']
        i0df = i0df.reset_coords(['i0correction']).drop_vars(
            ['RSoXS Au Mesh Current', 'WAXS Beamstop', 'SAXS Beamstop',
             'RSoXS Sample Current', 'RSoXS Au Mesh Current_std',
             'WAXS Beamstop_std', 'SAXS Beamstop_std', 'RSoXS Sample Current_std'])
        i0df = i0df.dropna('en').sortby('en')
        df = df.dropna('en').sortby('en')
        df['RSoXS Au Mesh Current'] /= i0df['i0correction'].interp(en=df['en'])
        df['RSoXS Au Mesh Current_std'] /= i0df['i0correction'].interp(en=df['en'])
        df['i0correction'] = i0df['i0correction'].interp(en=df['en'])
    else:
        df = df.dropna('en').sortby('en')
    df['WAXS Beamstop'] /= df['RSoXS Au Mesh Current']
    df['WAXS Beamstop_std'] /= df['RSoXS Au Mesh Current']
    df['SAXS Beamstop'] /= df['RSoXS Au Mesh Current']
    df['SAXS Beamstop_std'] /= df['RSoXS Au Mesh Current']
    df['RSoXS Sample Current'] /= df['RSoXS Au Mesh Current']
    df['RSoXS Sample Current_std'] /= df['RSoXS Au Mesh Current']
    return df, i0df
```
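The core of the double normalization is interpolating the i0 scan's `i0correction` (mesh current / beamstop diode) onto the sample scan's energy grid and dividing. A stdlib-only sketch with invented numbers:

```python
def linear_interp(x, xs, ys):
    """Linearly interpolate y(x) from sorted sample points (xs, ys);
    values outside the range are clamped to the endpoints."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + frac * (ys[i] - ys[i - 1])

# i0 scan: correction factor on its own energy grid
i0_en         = [280.0, 290.0, 300.0]
i0_correction = [2.0, 4.0, 6.0]

# sample scan: mesh current on a different energy grid
sample_en   = [285.0, 295.0]
sample_mesh = [6.0, 10.0]

normalized = [m / linear_interp(e, i0_en, i0_correction)
              for e, m in zip(sample_en, sample_mesh)]
print(normalized)  # → [2.0, 2.0]
```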
the problem is the monitors come way faster than the data, and they are collected while the shutter is opening and closing, so picking one point or averaging are both really messy. That code throws out the data when the shutter is closed.
Minor issue that I came across while looking at I0 correction:
An issue with loadMonitors: when loading any scans using the SST1RSoXSDB loader, you get warnings about errors assigning monitor readings.
E.g., running:
```python
phsDataLoader = phs.load.SST1RSoXSDB(corr_mode='none')
# Load a single run
scan01_2022p2 = phsDataLoader.loadRun(phsDataLoader.c[42373], dims=['sam_x', 'sam_y']).unstack('system')
```
Results in:
```
/nsls2/users/bpatel/BP_LocalCodeBase/PyHyperScattering/src/PyHyperScattering/SST1RSoXSDB.py:433: UserWarning: Error while time-integrating onto images. Check data.
  monitors = self.loadMonitors(run)
/tmp/ipykernel_192451/3722127918.py:4: UserWarning: Error assigning monitor readings to system. Problem with monitors. Please check.
  scan01_2022p2 = phsDataLoader.loadRun(phsDataLoader.c[42373], dims=['sam_x', 'sam_y', ]).unstack('system')
```
The underlying error is:

```
File ~.../SST1RSoXSDB.py
in SST1RSoXSDB.loadMonitors(self, entry, integrate_onto_images, n_thinning_iters)
    monitors = monitors.assign_coords({"time": primary_time}).reset_coords("time_bins", drop=True)

ValueError: cannot remove index coordinates with reset_coords: {'time_bins'}
```
Problem code:

```python
monitors = monitors.assign_coords({"time": primary_time}).reset_coords("time_bins", drop=True)
```

Replaced with:

```python
monitors = (
    monitors.assign_coords({"time": primary_time})
    .drop_indexes("time_bins")
    .reset_coords("time_bins", drop=True)
)
```
Solved in Commits: eff1ea94e234accb36fa193d20807b7b8c8b86c0 00f5d8b4c4bb7c7da25bccc9a72baa241a7b3bf3 b0d2725381e14cae8dbdef165558aad358586582
Making that change^ allowed a different warning to surface:
```
      1 phsDataLoader = phs.load.SST1RSoXSDB(corr_mode='none')
      3 # Load a single run 2022-2
----> 4 scan01_2022p2 = phsDataLoader.loadRun(phsDataLoader.c[42373]).unstack('system')

File ~/BP_LocalCodeBase/PyHyperScattering/src/PyHyperScattering/SST1RSoXSDB.py:633, in SST1RSoXSDB.loadRun(self, run, dims, coords, return_dataset)
    624 try:
    625     print(monitors.dims)
    626     monitors = (
--> 627         monitors.rename({"time": "system"})
    628         .reset_index("system")
    629         .assign_coords(system=index)
    630         .drop("system")
    631     )
    632 except Exception as e:
--> 633     raise (e)
    634 warnings.warn(
    635     "Error assigning monitor readings to system. Problem with monitors. Please check.",
    636     stacklevel=2,
    637 )
    638 retxr.attrs.update(md)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:5171, in Dataset.drop(self, labels, dim, errors, **labels_kwargs)
    (warns that .drop is pending deprecation in favor of .drop_vars / .drop_sel)
-> 5171 return self.drop_vars(labels, errors=errors)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:5018, in Dataset._assert_all_in_dataset(self, names, virtual_okay)
-> 5018 raise ValueError(
    5019     "One or more of the specified variables "
    5020     "cannot be found in this dataset"
    5021 )

ValueError: One or more of the specified variables cannot be found in this dataset
```
Which seems to be because `monitors` doesn't have a `system_` variable after `.assign_coords`.
So for 16c0b50175b925243361fed9e6aeb4e5b0f7c07a I just commented out the .drop step and it loads md without warnings.
This is a subtle one and requires some explanation of how the monitors actually work.
Usually the problem with assigning monitors to the data is that Eliot's approach using binary thinning, for short exposures, thins the monitor data into non-existence for some time points. Try reducing the number of binary thinning iterations (it's a kwarg to the loadMonitors function) with the original production code and see if that fixes it.
Long term a better, more data-aware algorithm than binary thinning would be smarter.
Can provide more detailed explanation later, sorry for brevity.
To debug this, try just running loadMonitors, which has several debugging flags that turn off e.g. the thinning and time-integrating onto images, and see if you actually get useful monitor data.
The complexity of this data is a really unfortunate consequence of using a fast ammeter instead of a more traditional gated scaler to readout/integrate point detectors. It could be that there is simply not an elegant solution to dealing with the data.
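To make the thinning behavior concrete, here is a stdlib re-implementation of 1-D binary erosion with `border_value=0` (matching how `scipy.ndimage.binary_erosion` is used in the code above — this is a sketch, not the library routine). It shows why a short shutter-open pulse can be thinned into non-existence at `iterations=2` but survives at `iterations=1`:

```python
def binary_erosion_1d(mask, iterations=1):
    """1-D binary erosion with a 3-point structuring element and
    border_value=0: a point stays open only if it and both of its
    neighbours were open in the previous iteration."""
    for _ in range(iterations):
        padded = [0] + mask + [0]
        mask = [int(padded[i - 1] and padded[i] and padded[i + 1])
                for i in range(1, len(padded) - 1)]
    return mask

# 5-sample shutter-open pulse (longer exposure) vs 3-sample pulse (short exposure):
long_pulse  = [0, 1, 1, 1, 1, 1, 0]
short_pulse = [0, 1, 1, 1, 0]

print(binary_erosion_1d(long_pulse, iterations=2))   # → [0, 0, 0, 1, 0, 0, 0]
print(binary_erosion_1d(short_pulse, iterations=2))  # → [0, 0, 0, 0, 0]  (gone!)
print(binary_erosion_1d(short_pulse, iterations=1))  # → [0, 0, 1, 0, 0]  (survives)
```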
So this is a long one.
Because I never wrote it here, the main issue I'm currently seeing is that on the 2022-3 data, the mesh current data loaded by loadMonitors is sparse and full of NaNs. Meanwhile, things seem fine with the 2022-2 data (so far).
To summarize: I think the thinning/binning algorithms are not the source of the issue. There may be something wrong with the shutter toggle data that we are reading in for 2022-3 data. Scroll to the bottom to see the raw shutter toggle data plotted along primary timestamps.
For debugging, I tested against two sets of 6 energy/polarization scans (C, F, O edges and 0/90 deg):

- Set 1: Raw mesh current seems to be read in fine (`integrate_onto_images=False`)
- Set 2: Changing thinning doesn't seem to make much difference (`integrate_onto_images=True, n_thinning_iters=5` vs `integrate_onto_images=True, n_thinning_iters=1`)
- Set 3: Thinning and filtering, but NOT binning: commented out the binning step, and you can see we lost most data points already
I decided to plot the primary timepoints against the raw shutter and found something interesting:
Full time-series:
Just the first 25% of timepoints:
Just the 2022-3 data:
So it looks like I am missing a bunch of expected shutter pulses in the 2022-3 data but not the 2022-2 data. I want to stress that I plotted the raw shutter toggle stream in this last set of plots, before any manipulation.
Is it possible the shutter data didn't fully get written out in 2022-3? I do have all of the expected images (so the shutter definitely was opening as its supposed to).
Issue 18 I0 correction_files.zip
*attachment is just an HTML file with the figures embedded: 230209_Issue 18 I0 correction.zip
@EliotGann does this make any sense instrumentation-wise?
Yeah, this is a known issue with exposures <~0.5 seconds: the shutter PV does not register many times. Often you can still look at the raw data and find a square wave (it's only a couple points, so it's really not obvious at these lower exposure times). My conclusion has been to say that transmission NEXAFS is not possible during RSoXS scans with exposure times less than a second or so. If transmission NEXAFS is needed, a separate NEXAFS scan is needed WITH THE DETECTORS OUT OF THE WAY (i.e., in one of the NEXAFS configurations). It is dangerous to take transmission NEXAFS if you are leaving the detector there the whole time, in particular if you are taking short exposures, because there is a lot of beam hitting them.
Basically, you should either have a separate algorithm for short exposures (what I do in Igor), which helps a bit but still becomes impossible below 0.2 seconds, or just not analyze the NEXAFS for these shorter exposure scans.
Now, you CAN measure the Izero mesh which is in front of the shutter, and you can still plot that vs energy and have a good Izero measurement. it's just the transmission that is impossible. For that, I would just integrate between the readout times. It won't be perfect, but it should be alright.
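"Integrate between the readout times" could look something like this stdlib sketch (the numbers are invented): trapezoidal integration of the mesh signal over each exposure window, divided by the window length, gives a per-image average Izero.

```python
def mean_between(times, values, t0, t1):
    """Average a sampled signal over [t0, t1] via trapezoidal integration.
    Assumes sorted times; endpoint values come from linear interpolation."""
    def interp(t):
        for i in range(1, len(times)):
            if t <= times[i]:
                frac = (t - times[i - 1]) / (times[i] - times[i - 1])
                return values[i - 1] + frac * (values[i] - values[i - 1])
        return values[-1]
    # interior samples plus interpolated endpoints
    ts = [t0] + [t for t in times if t0 < t < t1] + [t1]
    vs = [interp(t) for t in ts]
    area = sum((ts[i] - ts[i - 1]) * (vs[i] + vs[i - 1]) / 2
               for i in range(1, len(ts)))
    return area / (t1 - t0)

mesh_t = [0.0, 1.0, 2.0, 3.0, 4.0]
mesh_v = [2.0, 2.0, 4.0, 4.0, 4.0]
# average Izero during an exposure from t=0.5 to t=2.5
print(mean_between(mesh_t, mesh_v, 0.5, 2.5))  # → 3.0
```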
This is rapidly veering into a beamline discussion, but the FMB I400 supports a gated readout mode. Given that we're apparently losing most of the diode data during short exposures, wouldn't gating the diodes with the area detector readout provide more reliable data?
Not having a usable/reliable transmission measurement during the run is going to make a lot of quantitative studies difficult. And with how good the flux is, some samples will saturate the CCD's before we get a useful diode point with the latency/dropping we have.
As long as I'm commenting on years-old issues... There just isn't a way to get reliable measurements of signals that last for less than ~0.2 seconds, certainly not with asynchronous signals over EPICS. The solution is to move to synchronous measurement (getting something that can be gated reliably, and then getting a reliable gate signal), but we just don't have that capability at the beamline. If the scattering signal is saturating the detector in <0.5 seconds, there might just not be a good way to get a decent signal-to-noise transmission signal during an RSoXS scan. In these cases (for the time being) I would suggest removing the detector (waxsnexafs configuration) and doing a fast transmission NEXAFS scan of the sample in fly scanning mode. This is almost always a cleaner and higher resolution option for measuring transmission anyway.
It would be good, for normalization and lots of other reasons, to load the monitor streams into separate DataArrays. They all have a timestamp/value setup, basically, but all of the timestamps are asynchronous.
I have these functions, which summarize the streams (including the monitors) and then dump the monitors into a common xarray (with NaNs all over the place). It might be a starting point? I think filling in the NaNs would make a good next step.