xraypy / xraylarch

Larch: Applications and Python Library for Data Analysis of X-ray Absorption Spectroscopy (XAS, XANES, XAFS, EXAFS), X-ray Fluorescence (XRF) Spectroscopy and Imaging, and more.
https://xraypy.github.io/xraylarch

[xas_viewer] loading/handling multi-element fluorescence signals #311

Closed maurov closed 1 year ago

maurov commented 3 years ago

@newville I got a feature request for loading multi-element fluorescence signals in xas_viewer in a single action: the user opens a Spec/BLISS file, selects a bunch of channels, and each is imported as a separate group.

I currently do this via a script that converts the input file to an Athena project. Do you think it is worth implementing in the GUI? If yes, we should then also handle the possibility for the specfile_importer to remember settings when the user selects many files (but for that I will open a separate issue).

Test data file: test_canb_single.zip

# script to convert an ESRF-BM30 multi-element fluorescence Spec file
# (aka 'canb') to an Athena project
import tempfile
import matplotlib.pyplot as plt
from larch import Group
from larch.io import AthenaProject
from larch.io.specfile_reader import DataSourceSpecH5

tf_prj = tempfile.mktemp(prefix='canb2athena_', suffix=".prj")
print(tf_prj)
apj = AthenaProject(tf_prj)

d = DataSourceSpecH5('test_canb_single.dat', verbose=True)

scans = d.get_scans()
fig, axs = plt.subplots(ncols=1, nrows=len(scans))

for iscan, scan in enumerate(scans):
    scan_no = scan[0]
    scan_label = scan_no.replace(".", "_")  # Larch group names cannot contain '.'
    d.set_scan(scan_no)

    ene = d.get_array("Energy") * 1000  # keV -> eV
    norm = d.get_array("I0")
    bad_channels = []

    # plt.subplots returns a bare Axes (not an array) when nrows == 1
    ax = axs[iscan] if len(scans) > 1 else axs
    ax.set_title(f"scan {scan_no}")

    for cnt in d.get_counters():
        if 't' in cnt or cnt in ('I0', 'T', 'Energy'):
            continue  # skip time, monitor and energy counters
        mu = d.get_array(cnt)
        g = Group(athena_id=f"{scan_label}_{cnt}", datatype='xas',
                  energy=ene, mu=mu, i0=norm)
        try:
            apj.add_group(g)
        except Exception:
            print(f"skipped {cnt} (-> bad_channels)")
            bad_channels.append(cnt)
            continue
        ax.plot(g.energy, g.mu, label=cnt)
    ax.legend()
plt.tight_layout()
apj.save()
newville commented 3 years ago

@maurov I don't see how the multi-element data is encoded in the data file for your example.

We don't attempt "automatically add channels" for plain ASCII files either, but there is a section in the data importer to help the user add columns. I might suggest that we add that same "add columns" button to the Spec file reader. Would that be sufficient?

We also don't "remember" that selection, partly because it is not clear whether that should be by column number or label.

I also think that we should continue to strongly encourage beamlines to actually produce more manageable files for XAFS data and not expect downstream codes to do beamline-specific data processing like adding together multi-element data, and (especially) applying deadtime corrections. There are just too many ways to describe such data and what will seem obvious to one person will not be obvious to everyone else.

maurov commented 3 years ago

> @maurov I don't see how the multi-element data is encoded in the data file for your example.

Sorry, I forgot to mention to unzip the file first. It is a single scan Spec file.

> We don't attempt "automatically add channels" for plain ASCII files either, but there is a section in the data importer to help the user add columns. I might suggest that we add that same "add columns" button to the Spec file reader. Would that be sufficient?

Yes, I was thinking of the "add columns" panel, with the difference that instead of adding columns, the selected columns could be imported separately, letting the user merge the data afterwards.

> We also don't "remember" that selection, partly because it is not clear whether that should be by column number or label.

For the Spec/BLISS files, if one selects more than one file, the "importing behaviour" to remember should be:

> I also think that we should continue to strongly encourage beamlines to actually produce more manageable files for XAFS data and not expect downstream codes to do beamline-specific data processing like adding together multi-element data, and (especially) applying deadtime corrections. There are just too many ways to describe such data and what will seem obvious to one person will not be obvious to everyone else.

Yes, I fully agree with this. Implementing data reduction actions would be a never-ending story. I opened this issue mainly to store an example of a simple script that converts a specific type of data file into an Athena project file, which can then be easily opened in the GUI.

I may put it in the examples or in the documentation, if you think it could be useful.

maurov commented 1 year ago

@newville I realize that the way I initially wrote this issue is misleading. Actually, independently of multi-element fluorescence signals, the request I get from the users is to load multi-column files in a single action in xas_viewer. Imagine a file with a common energy array (col1) and multiple spectra in the other columns (col2, col3, ...): ideally one would read each pair (col1, col2), (col1, col3), (col1, ...) into a separate group with a single action in xas_viewer. I think this feature would be really useful. The groups would then be imported as group_colN. What do you think? Would this require a lot of work?
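The requested behaviour can be sketched outside the GUI (a minimal sketch, not xas_viewer code; the file contents and the group_colN naming below follow the description above):

```python
# Sketch: read a multi-column ASCII file and build one (energy, mu) pair
# per Y column, sharing the common energy column. The data values and
# column layout here are made up for illustration.
from io import StringIO

text = """# energy  xmu1  xmu2
7100.0  0.10  0.12
7110.0  0.55  0.60
7120.0  0.90  0.95
"""

lines = [ln for ln in StringIO(text) if not ln.startswith('#')]
rows = [[float(v) for v in ln.split()] for ln in lines]
cols = list(zip(*rows))                   # transpose: one tuple per column
energy, ychannels = cols[0], cols[1:]

# one "group" per Y column, named group_colN as proposed above
groups = {f"group_col{i + 2}": {"energy": energy, "mu": y}
          for i, y in enumerate(ychannels)}
print(sorted(groups))                     # prints ['group_col2', 'group_col3']
```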

newville commented 1 year ago

@maurov Yeah, I think we should try to do something about this.

Having the option to read individual columns as separate groups could be OK.

It might also be OK to read in as currently but have an optional form in XAS Viewer to re-build the "mu" data from the raw data (which we generally keep -- I don't recall if all the arrays are kept when reading Spec).

I think the real challenge is being able to apply deadtime corrections. A recent question on the Ifeffit mailing list needed to compute

Fluor = Sum(i=0, 12) [ ROI[i] * ICR[i] / (OCR[i] * FastTrig[i]) ]

which, in terms of the raw data columns, is

Fluor = Sum(i=0, 12) [ data[5+i,:] * data[18+i,:] / (data[31+i,:] * data[44+i,:]) ]
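That corrected sum can be sketched in plain Python (the channel count, values, and bad-channel skipping here are illustrative, not taken from the mailing-list thread):

```python
def corrected_sum(roi, icr, ocr, fasttrig, bad_channels=()):
    """Deadtime-corrected fluorescence: sum_i roi_i*icr_i/(ocr_i*fasttrig_i),
    skipping detector elements flagged as bad."""
    total = 0.0
    for i, (r, ic, oc, ft) in enumerate(zip(roi, icr, ocr, fasttrig)):
        if i in bad_channels:
            continue
        total += r * ic / (oc * ft)
    return total

# three channels instead of 13 to keep the example short; channel 1 is
# flagged bad and excluded from the sum
fluor = corrected_sum([100.0, 200.0, 300.0],
                      [1.0e5, 1.0e5, 1.0e5],
                      [9.0e4, 8.0e4, 9.0e4],
                      [1.0, 1.0, 1.0],
                      bad_channels={1})
print(fluor)
```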

That is, we would have to help the user identify which columns are "ROI" and which are ICR, OCR (and sometimes FastTrigger); there just is not a convention.

And, the reason for doing any of this might be that "Channels 4 and 8 are bad for this scan".... so that sort of has to be allowed too. And then of course the user is going to want to read in 50 scans like that.

I think it would be possible, but I also sort of think it might need to generate some sort of dictionary that assigns meaning to columns... it just gets sort of messy.

I'm not opposed, I think it is just "hard".

maurov commented 1 year ago

@newville I think there are two separate tasks here:

1. read multiple Y array columns in a single action;
2. perform a dead-time correction, or any other operation, to build the mu array with some formula.

Implementing task 1 should be rather straightforward: we could put a check box in the panel to load all columns other than energy in one action. What do you think?

Task 2 is more complex, and my point is that such operations could be done directly by the user in the Larch shell; that would be the safest option. We could provide a command that, from the Larch shell, creates another group in the list of groups.
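The "do it in the Larch shell" approach could look roughly like this (a sketch in plain Python, which the Larch shell essentially is; the counter names roi1, roi2 and i0 are hypothetical, and SimpleNamespace stands in for a Larch Group):

```python
# Sketch: build a summed, normalized mu from raw counter arrays and
# store it as a new group-like object. All names and values here are
# illustrative, not a real beamline layout.
from types import SimpleNamespace

raw = {"energy": [7100.0, 7110.0],
       "roi1": [10.0, 12.0],       # hypothetical fluorescence channel 1
       "roi2": [11.0, 13.0],       # hypothetical fluorescence channel 2
       "i0": [1000.0, 1000.0]}     # incident-flux monitor

# mu = (sum of fluorescence channels) / i0, point by point
mu = [(a + b) / n for a, b, n in zip(raw["roi1"], raw["roi2"], raw["i0"])]
newgroup = SimpleNamespace(energy=raw["energy"], mu=mu, datatype="xas")
print(newgroup.mu)   # prints [0.021, 0.025]
```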

newville commented 1 year ago

@maurov Yeah, I think this is doable. I might say it should be an optional window that pops up from the "Column File Browser" or "Spec File Browser" to select multi-element channels.

Do you have example Spec files from ESRF with multi-channel fluorescence data? I think I do not.

newville commented 1 year ago

@maurov So, I think I have a working version of this for the "basic" column-file browser. You can now choose to build the Y / mu data from a single data column or from multiple columns; the latter pops up a new window to guide the process, including selecting deadtime-correction terms and bad channels. I'm sure it is not 100% complete, but it is probably enough to try out.

This does work with the "test_canb_single" data set and with data from my beamline (where I know how the deadtime terms should be applied). I'll look into doing this for the Spec browser too.

maurov commented 1 year ago

@newville I had a look at what you did for reading multi-channel fluorescence detectors, and the idea is nice for applying dead-time corrections, even though it will be difficult to adapt to every format in which the detector's statistics metadata are stored. For example, in our case we measure a linearity curve with another detector (e.g. a diode or ion chamber) and determine a time constant, which is what gets stored. Anyway, I think those corrections should be performed directly at the beamline.

On the other hand, the feature I was initially asking for on behalf of the users is much simpler: simply the possibility to load multiple Y arrays in a single action.

Let me try explaining it with a simple example of multi-columns data independent of a fluorescence detector, here from a FDMNES simulation (please, unzip it first):

FDMNES_2023_CuO6_conv.zip

[screenshot]

I would add to the RadioBox list an option "load all Y arrays" that, when clicking the OK button, creates a group for each Y array whose label does not match the X array's. In the example given, three groups would be created: $groupname_100, $groupname_001 and $groupname_xanes.

This feature is then generically useful for any multi-column file, whether it comes from a fluorescence detector with multiple channels (dead-time corrected or not) or is simply a dataset with many spectra sharing the same energy column. Those use cases are very common among users.

Is that clearer now? What do you think?

newville commented 1 year ago

> @newville I had a look at what you did for reading multi-channel fluorescence detectors, and the idea is nice for applying dead-time corrections, even though it will be difficult to adapt to every format in which the detector's statistics metadata are stored. For example, in our case we measure a linearity curve with another detector (e.g. a diode or ion chamber) and determine a time constant, which is what gets stored. Anyway, I think those corrections should be performed directly at the beamline.

Yes, there are several ways to encode deadtime information. From what I've seen, saving ICR and OCR (or LiveTime and RealTime) for each energy point is very common, as many detector readout systems just provide this info. I've advocated (and implemented for Epics + Xspress3) saving a single multiplicative factor. Sometimes (notably with the very common XIA xMap system, or at least its Epics interface) there is a third term, "FastLiveTime". I think that the current form will handle all of those cases.

Yes, if you store tau (per detector element, maybe), then you need that together with the output counts, and can then work out the correction (for example, with https://github.com/xraypy/xraylarch/blob/master/larch/xrf/deadtime.py#L239).
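For the tau-based case, recovering the input count rate from the measured one is a small numerical inversion. Here is a sketch using the standard paralyzable deadtime model (this is a generic illustration, not the code in larch/xrf/deadtime.py; the tau value is made up):

```python
import math

def true_rate(ocr, tau, niter=50):
    """Invert the paralyzable deadtime model ocr = icr * exp(-icr * tau)
    for icr by fixed-point iteration: icr <- ocr * exp(icr * tau).
    A sketch; converges for moderate deadtime (icr * tau < 1)."""
    icr = ocr                       # measured rate as starting guess
    for _ in range(niter):
        icr = ocr * math.exp(icr * tau)
    return icr

tau = 1.0e-6                        # 1 microsecond deadtime (assumed)
icr_true = 1.0e5                    # "true" input count rate
ocr = icr_true * math.exp(-icr_true * tau)   # simulated measured rate
print(round(true_rate(ocr, tau)))   # recovers approximately 100000
```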

Anyway, I think that beamlines should help out with supplying a single "Deadtime corrected sum", but not all do, and sometimes you do want to check for bad channels -- which might be a permanent condition or a per-scan problem. So allowing the user to check and rebuild the deadtime correction and the sum of channels seems fine to me.

> On the other hand, the feature I was initially asking for on behalf of the users is much simpler: simply the possibility to load multiple Y arrays in a single action.

I don't disagree that this would be useful, and an earlier working version of the "multicolumn fluorescence" window had something like that. I'm not certain it is actually simpler ;).

But I think that I might move this all around a bit, keeping the current as "multi-channel fluorescence and dead-time correction" window and also adding a simpler window to select multiple (non-transmission!) columns to read in as separate groups (column / i0).

maurov commented 1 year ago

@newville thanks for implementing "select multiple columns" for the columnfile reader. Unfortunately, it does not work for me on the FDMNES data example provided: I get only one group created, while I was expecting three. Here is a screenshot:

[screenshot]

Furthermore, I0 does not default to 1.0, but to an empty value.

newville commented 1 year ago

@maurov Oh yes, this is not yet actually working ;) I'm working on that today....

maurov commented 1 year ago

@newville the "select multiple columns" in the columnfile reader works nicely now, thanks! Do you think it is feasible to have such a feature in the Specfile reader panel too, before releasing 0.9.70?

newville commented 1 year ago

@maurov I think so. This actually required a fair amount of rewriting, but I think it should be mostly transferable.

newville commented 1 year ago

@maurov Do you have any spec files with multiple actual spectra with multiple useful columns that you can share? All of the spec files I have contain lots of completely useless scans and lots of unused columns -- mostly junk. Please send a real file with actual usable data in multiple columns.

maurov commented 1 year ago

> @maurov Do you have any spec files with multiple actual spectra with multiple useful columns that you can share? All of the spec files I have contain lots of completely useless scans and lots of unused columns -- mostly junk. Please send a real file with actual usable data in multiple columns.

@newville sorry for that. I will provide an example data file soon; I am ending my working day for now.

newville commented 1 year ago

@maurov I think that multi-columns from Spec files is working, at least in favorable cases ;). More testing would be good.

I should say that for multiple columns from multiple scans, the code does assume that the columns (by index) are consistent. I think there is basically no hope for handling the case of a Spec file in which the meaning or number of columns changes.

maurov commented 1 year ago

> @maurov I think that multi-columns from Spec files is working, at least in favorable cases ;). More testing would be good.

@newville thanks for implementing multi-column reading in the Spec file reading panel too. Here is an example dataset (please, unzip it first):

testlarch_bm30_xmap_20230607_cut.zip

There are two scans with the fluorescence channels called xmapNN.

Unfortunately, it does not work for me. The traceback is below:

  File "/home/mauro/devel/xraylarch/larch/wxlib/specfile_importer.py", line 416, in onMultiColumn
    self.show_subframe('multicol', MultiColumnFrame,
  File "/home/mauro/devel/xraylarch/larch/wxlib/specfile_importer.py", line 450, in show_subframe
    self.subframes[name] = frameclass(self, **opts)
  File "/home/mauro/devel/xraylarch/larch/wxlib/columnframe.py", line 370, in __init__
    nlabels = len(array_labels)
TypeError: object of type 'NoneType' has no len()

> I should say that for multiple columns from multiple scans, the code does assume that the columns (by index) are consistent. I think there is basically no hope for handling the case of a Spec file in which the meaning or number of columns changes.

The order (= index) of the columns will not change across multiple scans within a Spec file, nor, I think, with the mapping done by silx to provide an HDF5-like API. The best approach is to use the counter label to access the data (see the get_counters method) instead of the index.
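The label-based access suggested here can be sketched independently of larch (a minimal sketch; the labels and data values are made up, with `get_array` standing in for the DataSourceSpecH5 method of the same name):

```python
# Sketch: map counter labels to column indices once, then access data
# by label so the code never hard-codes column positions.
labels = ["Energy", "I0", "xmap00", "xmap01"]       # hypothetical counters
data = [[7100.0, 1000.0, 10.0, 11.0],               # one row per energy point
        [7110.0, 1001.0, 12.0, 13.0]]

col = {lab: i for i, lab in enumerate(labels)}      # label -> column index

def get_array(label):
    """Return the column for a counter label, independent of its index."""
    return [row[col[label]] for row in data]

print(get_array("xmap00"))   # prints [10.0, 12.0]
```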

newville commented 1 year ago

@maurov I think this is now working better.

maurov commented 1 year ago

@newville it works now, great!