Closed astellhorn closed 1 month ago
Initial thoughts:
I will make some experiments to see if this works as I have in mind.
Current idea:
The idea is that, given a table of one or more files, we:
Below is a working example using esssans.
Notes:

- We could use a `sciline.ParamTable` of `Request` objects with sufficiently small time ranges. Or we could run the workflow in a loop.

```python
from dataclasses import dataclass
from typing import NewType

import numpy as np
import scipp as sc
import scippnexus as snx
import sciline
import esssans as sans
from esssans.loki import data

Filename = NewType('Filename', str)

files = [
    data.get_path('60250-2022-02-28_2215.nxs'),
    data.get_path('60339-2022-02-28_2215.nxs'),
]


@dataclass
class FileInfo:
    filename: Filename
    times: sc.Variable

    @property
    def start_time(self) -> sc.Variable:
        return self.times.min()

    @property
    def end_time(self) -> sc.Variable:
        return self.times.max()

    def index(self, time: sc.Variable) -> int:
        # Index of the pulse time closest to the given time
        return np.argmin(np.abs((self.times - time).values))


def read_file_info(filename: Filename) -> FileInfo:
    with snx.File(filename) as f:
        times = f[
            'entry/instrument/larmor_detector/larmor_detector_events/event_time_zero'
        ][()]
    return FileInfo(filename, times)


def read_file(filename: Filename, start: int, stop: int) -> sc.DataGroup:
    with snx.File(filename) as f:
        dg = f['event_time_zero', start:stop]
    return dg


@dataclass
class Request:
    start_time: sc.Variable
    end_time: sc.Variable


def read(
    request: Request, file_infos: sciline.Series[Filename, FileInfo]
) -> sc.DataGroup:
    result = sc.DataGroup()
    for info in file_infos.values():
        # Skip files whose time range does not overlap the requested interval
        if request.start_time <= info.end_time and request.end_time >= info.start_time:
            start = info.index(request.start_time)
            end = info.index(request.end_time)
            print(info.filename, start, end)
            result[info.filename] = read_file(info.filename, start, end)
    return result


providers = [read_file_info, read]
pipeline = sciline.Pipeline(providers)
pipeline.set_param_series(Filename, files)

start1 = sc.datetime('2022-03-01T17:41:58.744846154', unit='ns')
end1 = sc.datetime('2022-03-01T18:11:58.044793515', unit='ns')
start2 = sc.datetime('2022-03-03T12:46:44.707338042', unit='ns')
end2 = sc.datetime('2022-03-03T13:18:12.507090746', unit='ns')

# Request spanning both files
pipeline[Request] = Request(start1, end2 - sc.scalar(1000, unit='s').to(unit='ns'))
result = pipeline.get(sc.DataGroup)
result.visualize()
dg = result.compute()
dg
```
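The nearest-index lookup used by `FileInfo.index` can be illustrated in plain NumPy; the pulse times below are made-up values, not from the actual files:

```python
import numpy as np

# Hypothetical pulse times in ns since epoch (made-up values)
times = np.array([0, 10, 20, 30, 40])


def nearest_index(times: np.ndarray, t: int) -> int:
    # Same idea as FileInfo.index: position of the pulse time closest to t
    return int(np.argmin(np.abs(times - t)))


print(nearest_index(times, 12))  # -> 1 (10 is closest to 12)
print(nearest_index(times, 39))  # -> 4 (40 is closest to 39)
```

Note that `argmin` returns the *closest* pulse, not necessarily one inside the requested interval, so the resulting slice may be off by one pulse at each boundary; that is acceptable for this sketch.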
Unfortunately I cannot test the above script, as I get the error message "No module named 'esssans'". Also, trying to add "from ess import sans" gives me the error "cannot import name 'sans' from 'ess' (unknown location)". (Also not after cloning the GitHub repository esssans and going to that folder.)
Questions:
I get that this script reads files and their information, but how does it differentiate between the different "parts" of one file, i.e., how does it read out the different information changing in time within one file? I guess that is the goal? Because in the example polarized ZOOM data we have one file containing the information on 4 different spin states (i.e., all four spin states are in one file and need to be extracted to be read in our esspolarization workflow).
One can see that nicely in the example file you are loading in esssans PR 50, where you plot the Spin_flipper value. So we would need to read out the value of this spin flipper and also of the 3He cell (i.e., ['selog']['Spin_flipper']['value_log']['value'] and ['selog']['He_state']['value_log']['value'], respectively). Though I am not sure of the difference between the value_log and ['selog']['Spin_flipper']['value'], do you know that?
Then I would say we need something like the workflow suggested at the top, with info on the "Spin_flipper" state, "He_state", and "time" (and ideally also "sample position" and "He position", but these seem not to be logged in the .nxs files; I just got a table from the ZOOM beamline scientist explaining which file numbers were for which measurements).
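To illustrate reading a nested value log such as the one described above, here is a small sketch using h5py (which scippnexus builds on). The group layout follows the path from the question, but the file and its values are entirely made up, so this is only a demonstration of the access pattern, not of the actual ZOOM file contents:

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file with the nested layout from the question
# (group names follow the question; the values are made up).
path = os.path.join(tempfile.mkdtemp(), 'demo_selog.h5')
with h5py.File(path, 'w') as f:
    f['selog/Spin_flipper/value_log/value'] = np.array([0, 1, 1, 0])

# Read it back via the same nested path
with h5py.File(path, 'r') as f:
    flipper = f['selog']['Spin_flipper']['value_log']['value'][()]

print(flipper)  # -> [0 1 1 0]
```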
Note for the example polarization data from ZOOM in the Long_3He_run folder (long 3He runs for data reduction with glassy carbon as non-magnetic "sample (GC)" for reference):
missing compared to our workflow:
> Unfortunately I cannot test the above script, as I get the error message "No module named 'esssans'". Also, trying to add "from ess import sans" gives me the error "cannot import name 'sans' from 'ess' (unknown location)". (Also not after cloning the GitHub repository esssans and going to that folder.)

```sh
pip install esssans
```

(or use conda).
Questions:
> 1. I get that this script reads files and their information, but how does it differ between the different "parts of one file", i.e., to read out the different information changing in time within one file?
In this example we use time intervals. The example does not show how the correct time intervals are determined; it just shows how we can hide the fact that it reads, e.g., the second half of a first file, the entire second file, and the first half of a third file.
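The file-selection logic behind this can be sketched without scipp, using plain integers as stand-in timestamps (all names and values below are hypothetical): a request overlapping several files is clipped to each file's own time range.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    start: int  # hypothetical timestamps, e.g. ns since epoch
    end: int


def select(spans, req_start, req_end):
    # Return (name, clipped_start, clipped_end) for every file the request overlaps
    out = []
    for s in spans:
        if req_start <= s.end and req_end >= s.start:
            out.append((s.name, max(s.start, req_start), min(s.end, req_end)))
    return out


files = [Span('a.nxs', 0, 100), Span('b.nxs', 100, 200), Span('c.nxs', 200, 300)]
print(select(files, 50, 250))
# -> second half of 'a.nxs', all of 'b.nxs', first half of 'c.nxs'
```

The `read` provider in the example above does the same thing, except that the clipping happens implicitly via `FileInfo.index`, which maps the request boundaries to pulse indices within each file.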
> 2. One can see that nicely in the example file you are loading in esssans PR 50 where you plot the Spin_flipper value. So we would need to read out the value of this spin flipper and also of the 3He cell
Exactly, what you describe would give us the required time intervals. We would read metadata from all files, determine the time intervals, and then run the above.
For the ZOOM files this is actually split into "periods", so one might use those instead, but the issue here is about the general approach.
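As a sketch of "determine time intervals from metadata" (all names and values hypothetical, not the actual ZOOM log layout): given a step-like value log such as the spin-flipper state, the intervals are just the stretches of constant value between change points.

```python
import numpy as np

# Hypothetical value log: times (s) at which a state was recorded, and the state
log_times = np.array([0, 10, 20, 30, 40, 50])
log_values = np.array([0, 0, 1, 1, 0, 0])  # e.g. spin-flipper off/on


def intervals(times, values, end_time):
    # One (start, end, state) triple per run of constant state
    change = np.nonzero(np.diff(values))[0] + 1  # indices where the state changes
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(times)]))
    return [
        (times[i], times[j] if j < len(times) else end_time, values[i])
        for i, j in zip(starts, ends)
    ]


print(intervals(log_times, log_values, end_time=60))
# -> [(0, 20, 0), (20, 40, 1), (40, 60, 0)]
```

Each resulting `(start, end, state)` triple could then become one `Request` for the pipeline above. If the ZOOM files already split this into "periods", those boundaries could be used directly instead of deriving them from the log.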
Do you think it would make sense to go through the ZOOM files together and work out what needs to be read in for which dataset? We will need to adapt the workflow to state that for cell=polarizer the values are known and the time decay is infinite (T1 = constant), since only cell=analyzer was probed in the ZOOM examples; it would still test the workflow as a whole. For this either an online meeting or meeting again at DMSC is possible, whichever would be most efficient.
Either works for me!
Status update: We will not follow the approach I proposed above, but rely on a new mechanism that will be made available in Sciline soon.
https://github.com/scipp/esssans/pull/135, which should be in the next ESSsans release, should address this.
How to read in the Data Arrays from multiple NeXus files into our workflow?