scipp / esspolarization

Polarization data reduction for the European Spallation Source
https://scipp.github.io/esspolarization/
BSD 3-Clause "New" or "Revised" License

Multiple NeXus files readout #19

Closed: astellhorn closed this issue 1 month ago

astellhorn commented 6 months ago

How do we read the data arrays from multiple NeXus files into our workflow?

SimonHeybrock commented 6 months ago

Initial thoughts:

I will run some experiments to see whether this works the way I have in mind.

SimonHeybrock commented 6 months ago

Current idea:

The idea is that, given a table of one or more files, we read the pulse timestamps from each file, determine which files (and which pulse ranges within them) overlap a requested time interval, and load only those slices.

SimonHeybrock commented 5 months ago

Below is a working example using esssans.

Notes:

from dataclasses import dataclass
import numpy as np
from typing import NewType
import scipp as sc
import scippnexus as snx
import sciline
import esssans as sans
from esssans.loki import data

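# Key type used to identify input files in the sciline pipeline.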
Filename = NewType('Filename', str)

files = [
    data.get_path('60250-2022-02-28_2215.nxs'),
    data.get_path('60339-2022-02-28_2215.nxs'),
]

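# Per-file metadata: the pulse timestamps (event_time_zero) of a file,
# used to map a requested time interval to a range of pulse indices.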
@dataclass
class FileInfo:
    filename: Filename
    times: sc.Variable

    @property
    def start_time(self) -> sc.Variable:
        return self.times.min()

    @property
    def end_time(self) -> sc.Variable:
        return self.times.max()

    def index(self, time: sc.Variable) -> int:
        # Index of the pulse whose timestamp is closest to the given time.
        return int(np.argmin(np.abs((self.times - time).values)))

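# Provider: read only the pulse timestamps of a file, without loading the events.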
def read_file_info(filename: Filename) -> FileInfo:
    with snx.File(filename) as f:
        times = f[
            'entry/instrument/larmor_detector/larmor_detector_events/event_time_zero'
        ][()]
    return FileInfo(filename, times)

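# Load a file restricted to the pulse range [start, stop).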
def read_file(filename: Filename, start: int, stop: int) -> sc.DataGroup:
    with snx.File(filename) as f:
        dg = f['event_time_zero', start:stop]
    return dg

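# Requested time interval; it may span parts of several files.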
@dataclass
class Request:
    start_time: sc.Variable
    end_time: sc.Variable

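# Provider: for every file whose time range overlaps the request, load the
# overlapping pulse range and collect the results in a DataGroup keyed by filename.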
def read(
    request: Request, file_infos: sciline.Series[Filename, FileInfo]
) -> sc.DataGroup:
    result = sc.DataGroup()
    for info in file_infos.values():
        if request.start_time <= info.end_time and request.end_time >= info.start_time:
            start = info.index(request.start_time)
            end = info.index(request.end_time)
            print(info.filename, start, end)
            result[info.filename] = read_file(info.filename, start, end)

    return result

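# Build the pipeline: the filenames are given as a parameter series, and the
# Request selects which slices of which files are actually loaded.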
providers = [read_file_info, read]
pipeline = sciline.Pipeline(providers)
pipeline.set_param_series(Filename, files)
start1 = sc.datetime('2022-03-01T17:41:58.744846154', unit='ns')
end1 = sc.datetime('2022-03-01T18:11:58.044793515', unit='ns')
start2 = sc.datetime('2022-03-03T12:46:44.707338042', unit='ns')
end2 = sc.datetime('2022-03-03T13:18:12.507090746', unit='ns')
pipeline[Request] = Request(start1, end2 - sc.scalar(1000, unit='s').to(unit='ns'))
result = pipeline.get(sc.DataGroup)
result.visualize()
dg = result.compute()
dg
astellhorn commented 5 months ago

Unfortunately I cannot test the above script, as I get the error message "No module named 'esssans'". Also, trying to add "from ess import sans" gives me the error "cannot import name 'sans' from 'ess' (unknown location)".

(This also happens after cloning the esssans GitHub repository and running from that folder.)

astellhorn commented 5 months ago

Questions:

  1. I understand that this script reads files and their information, but how does it distinguish between the different "parts" of one file, i.e., how does it read out the different information that changes over time within one file? I guess that is the goal? In the example polarized ZOOM data we have one file containing the information on 4 different spin states (i.e., all four spin states are in one file and need to be extracted so they can be read into our esspolarization workflow).

  2. One can see this nicely in the example file you load in esssans PR 50, where you plot the Spin_flipper value. So we would need to read out the value of this spin flipper and also of the 3He cell, i.e., ['selog']['Spin_flipper']['value_log']['value'] and, respectively, ['selog']['He_state']['value_log']['value'] (though I am not sure about the difference between value_log and ['selog']['Spin_flipper']['value'] - do you know?).

  3. Then I would say we need something like the workflow suggested at the top, with information on the "Spin_flipper" state, "He_state", and "time" (and ideally also the sample position and 3He cell position, but these do not seem to be logged in the .nxs files. I just got a table from the ZOOM beamline scientist explaining which file numbers correspond to which measurements).

astellhorn commented 5 months ago

Note on the example polarization data from ZOOM in the Long_3He_run folder (long 3He runs for data reduction, with glassy carbon (GC) as a non-magnetic reference "sample"):

Runs 691, 693, 695, 697 --> TRANS measurements without GC ("DB-run")

missing compared to our workflow:

SimonHeybrock commented 5 months ago

> I can unfortunately not test the above script, as I get the error message "No module named 'esssans'". [...]

pip install esssans (or use conda).

SimonHeybrock commented 5 months ago

> Questions:
>
> 1. I get that this script reads files and their information, but how does it differ between the different "parts of one file", i.e., to read out the different information changing in time within one file?

In this example it uses time intervals. The example does not show how the correct time intervals are determined; it just shows how we can hide the fact that it may read, e.g., the second half of a first file, the entire second file, and the first half of a third file.

> 2. One can see that nicely in the example file you are loading in the example in esssans PR 50 where you plot the Spin_flipper value. So we would need to read out the value of this spin flipper and also of the 3He cell

Exactly: what you describe would give us the required time intervals. We would read metadata from all files, determine the time intervals, and then run the above.

For the ZOOM files this is actually split into "periods", so one might use those instead, but the issue here is about the general approach.
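
A minimal sketch of reading such a log and deriving the time intervals might look as follows. This assumes the Spin_flipper log path quoted above and that scippnexus loads the NXlog as a time series with a 'time' coordinate; the helper name and exact path are illustrative and have not been tested against a ZOOM file.

import scipp as sc
import scippnexus as snx

def spin_flipper_intervals(filename: str) -> list[tuple[sc.Variable, sc.Variable, float]]:
    # Return (start_time, end_time, state) for each stretch of constant spin-flipper state.
    with snx.File(filename) as f:
        # Path taken from the discussion above; it may differ between files.
        log = f['entry/selog/Spin_flipper/value_log'][()]
    times = log.coords['time']
    values = log.values
    intervals = []
    start = 0
    for i in range(1, len(values)):
        if values[i] != values[start]:
            intervals.append((times['time', start], times['time', i], values[start]))
            start = i
    intervals.append((times['time', start], times['time', -1], values[start]))
    return intervals

# Each interval could then be turned into a Request for the pipeline above, e.g.
# pipeline[Request] = Request(start_time, end_time)

The same could be done for the He_state log; intersecting the two sets of intervals would give the per-state time ranges to feed into the Request above.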

astellhorn commented 5 months ago

Do you think it would make sense to go through the ZOOM files together and work out what needs to be read in for which dataset? We will need to adapt the workflow so that for cell=Polarizer the values are known and the time decay is infinite (T1 = constant), since only cell=analyzer was probed in the ZOOM examples; but this would still test the workflow as a whole. Either an online meeting or meeting again at DMSC is possible, whichever is most efficient.

SimonHeybrock commented 5 months ago

Either works for me!

SimonHeybrock commented 2 months ago

Status update: We will not follow the approach I proposed above, but will instead rely on a new mechanism that will be made available in Sciline soon.

SimonHeybrock commented 1 month ago

https://github.com/scipp/esssans/pull/135, which should be in the next ESSsans release, should address this.