scipp / esspolarization

Polarization data reduction for the European Spallation Source
https://scipp.github.io/esspolarization/
BSD 3-Clause "New" or "Revised" License

Multiple NeXus files readout #19

Closed: astellhorn closed this issue 1 month ago

astellhorn commented 6 months ago

How do we read the data arrays from multiple NeXus files into our workflow?

SimonHeybrock commented 6 months ago

Initial thoughts:

I will run some experiments to see whether this works the way I have in mind.

SimonHeybrock commented 6 months ago

Current idea:

The idea is that, given a table of one or more files, we read the pulse timestamps from each file, determine which files (and which pulse ranges within them) overlap a requested time interval, and load only those slices.

SimonHeybrock commented 5 months ago

Below is a working example using esssans.

Notes:

from dataclasses import dataclass
import numpy as np
from typing import NewType
import scipp as sc
import scippnexus as snx
import sciline
import esssans as sans
from esssans.loki import data

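# Key type used to identify input files in the sciline pipeline.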
Filename = NewType('Filename', str)

files = [
    data.get_path('60250-2022-02-28_2215.nxs'),
    data.get_path('60339-2022-02-28_2215.nxs'),
]

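# Per-file metadata: the pulse timestamps (event_time_zero) of a file,
# used to map a requested time interval to a range of pulse indices.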
@dataclass
class FileInfo:
    filename: Filename
    times: sc.Variable

    @property
    def start_time(self) -> sc.Variable:
        return self.times.min()

    @property
    def end_time(self) -> sc.Variable:
        return self.times.max()

    def index(self, time: sc.Variable) -> int:
        # Index of the pulse whose timestamp is closest to the given time.
        return int(np.argmin(np.abs((self.times - time).values)))

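# Provider: read only the pulse timestamps of a file, without loading the events.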
def read_file_info(filename: Filename) -> FileInfo:
    with snx.File(filename) as f:
        times = f[
            'entry/instrument/larmor_detector/larmor_detector_events/event_time_zero'
        ][()]
    return FileInfo(filename, times)

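# Load a file restricted to the pulse range [start, stop).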
def read_file(filename: Filename, start: int, stop: int) -> sc.DataGroup:
    with snx.File(filename) as f:
        dg = f['event_time_zero', start:stop]
    return dg

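# Requested time interval; it may span parts of several files.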
@dataclass
class Request:
    start_time: sc.Variable
    end_time: sc.Variable

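# Provider: for every file whose time range overlaps the request, load the
# overlapping pulse range and collect the results in a DataGroup keyed by filename.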
def read(
    request: Request, file_infos: sciline.Series[Filename, FileInfo]
) -> sc.DataGroup:
    result = sc.DataGroup()
    for info in file_infos.values():
        if request.start_time <= info.end_time and request.end_time >= info.start_time:
            start = info.index(request.start_time)
            end = info.index(request.end_time)
            print(info.filename, start, end)
            result[info.filename] = read_file(info.filename, start, end)

    return result

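# Build the pipeline: the filenames are given as a parameter series, and the
# Request selects which slices of which files are actually loaded.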
providers = [read_file_info, read]
pipeline = sciline.Pipeline(providers)
pipeline.set_param_series(Filename, files)
start1 = sc.datetime('2022-03-01T17:41:58.744846154', unit='ns')
end1 = sc.datetime('2022-03-01T18:11:58.044793515', unit='ns')
start2 = sc.datetime('2022-03-03T12:46:44.707338042', unit='ns')
end2 = sc.datetime('2022-03-03T13:18:12.507090746', unit='ns')
pipeline[Request] = Request(start1, end2 - sc.scalar(1000, unit='s').to(unit='ns'))
result = pipeline.get(sc.DataGroup)
result.visualize()
dg = result.compute()
dg
astellhorn commented 5 months ago

Unfortunately I cannot test the above script, as I get the error message "No module named 'esssans'". Also, trying to add "from ess import sans" gives me the error "cannot import name 'sans' from 'ess' (unknown location)".

(This also happens after cloning the esssans GitHub repository and running from that folder.)

astellhorn commented 5 months ago

Questions:

  1. I understand that this script reads files and their information, but how does it distinguish between the different "parts" of one file, i.e., how does it read out the different information that changes over time within one file? I guess that is the goal? In the example polarized ZOOM data we have one file containing the information on 4 different spin states (i.e., all four spin states are in one file and need to be extracted so they can be read into our esspolarization workflow).

  2. One can see this nicely in the example file you load in esssans PR 50, where you plot the Spin_flipper value. So we would need to read out the value of this spin flipper and also of the 3He cell, i.e., ['selog']['Spin_flipper']['value_log']['value'] and, respectively, ['selog']['He_state']['value_log']['value'] (though I am not sure about the difference between value_log and ['selog']['Spin_flipper']['value'] - do you know?).

  3. Then I would say we need something like the workflow suggested at the top, with information on the "Spin_flipper" state, "He_state", and "time" (and ideally also the sample position and 3He cell position, but these do not seem to be logged in the .nxs files. I just got a table from the ZOOM beamline scientist explaining which file numbers correspond to which measurements).

astellhorn commented 5 months ago

Note on the example polarization data from ZOOM in the Long_3He_run folder (long 3He runs for data reduction, with glassy carbon (GC) as a non-magnetic reference "sample"):

Runs 691, 693, 695, 697 --> TRANS measurements without GC ("DB-run")

missing compared to our workflow:

SimonHeybrock commented 5 months ago

> I can unfortunately not test the above script, as I get the error message "No module named 'esssans'". [...]

pip install esssans (or use conda).

SimonHeybrock commented 5 months ago

> Questions:
>
> 1. I get that this script reads files and their information, but how does it differ between the different "parts of one file", i.e., to read out the different information changing in time within one file?

In this example it uses time intervals. The example does not show how the correct time intervals are determined; it just shows how we can hide the fact that it may read, e.g., the second half of a first file, the entire second file, and the first half of a third file.

> 2. One can see that nicely in the example file you are loading in the example in esssans PR 50 where you plot the Spin_flipper value. So we would need to read out the value of this spin flipper and also of the 3He cell

Exactly: what you describe would give us the required time intervals. We would read metadata from all files, determine the time intervals, and then run the above.

For the ZOOM files this is actually split into "periods", so one might use those instead, but the issue here is about the general approach.
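
A minimal sketch of reading such a log and deriving the time intervals might look as follows. This assumes the Spin_flipper log path quoted above and that scippnexus loads the NXlog as a time series with a 'time' coordinate; the helper name and exact path are illustrative and have not been tested against a ZOOM file.

import scipp as sc
import scippnexus as snx

def spin_flipper_intervals(filename: str) -> list[tuple[sc.Variable, sc.Variable, float]]:
    # Return (start_time, end_time, state) for each stretch of constant spin-flipper state.
    with snx.File(filename) as f:
        # Path taken from the discussion above; it may differ between files.
        log = f['entry/selog/Spin_flipper/value_log'][()]
    times = log.coords['time']
    values = log.values
    intervals = []
    start = 0
    for i in range(1, len(values)):
        if values[i] != values[start]:
            intervals.append((times['time', start], times['time', i], values[start]))
            start = i
    intervals.append((times['time', start], times['time', -1], values[start]))
    return intervals

# Each interval could then be turned into a Request for the pipeline above, e.g.
# pipeline[Request] = Request(start_time, end_time)

The same could be done for the He_state log; intersecting the two sets of intervals would give the per-state time ranges to feed into the Request above.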

astellhorn commented 5 months ago

Do you think it would make sense to go through the ZOOM files together and work out what needs to be read in for which dataset? We will need to adapt the workflow so that for cell=Polarizer the values are known and the time decay is infinite (T1 = constant), since only cell=analyzer was probed in the ZOOM examples; but this would still test the workflow as a whole. Either an online meeting or meeting again at DMSC is possible, whichever is most efficient.

SimonHeybrock commented 5 months ago

Either works for me!

SimonHeybrock commented 2 months ago

Status update: We will not follow the approach I proposed above, but will instead rely on a new mechanism that will be made available in Sciline soon.

SimonHeybrock commented 1 month ago

https://github.com/scipp/esssans/pull/135, which should be in the next ESSsans release, should address this.