codedump closed this issue 2 years ago
The first problem is that you have data in related files (and related directories) that are completely undocumented in the SPEC data file. Because the SPEC data file holds no evidence of those files, handling them takes a lot of additional work that is out of scope for spec2nexus.
The APS USAXS has a couple Pilatus detectors (for SAXS & WAXS). It uses EPICS area detector HDF5 plugin to write image files as directed from SPEC. Here's a scan from a typical SAXS collection:
#S 36 SAXS ./04_22_DACSolvent_saxs/PEI_M41_11_30C_2min_0010.hdf 0 0 1.3 20 1
#D Fri Apr 22 15:07:23 2022
#T 0.1 (seconds)
#G0 0
#G1 0
#G3 0
#G4 0
#Q
#P0 -22.29064 20 0 -0.00125 141.01 14.812881 220 3.2
#P1 5.894899e-07 5.00006 -50 11.00057 2.935 -4 0.3005 29.428909
#P2 0 0 70.11 26.355 0.200046 -8e-05 0.199844 0.8000816
#P3 0 55.004166 0.5 -0.5 0.5 -0.5 20.000182 24.4
#P4 31.7608 0 0 0 0 0 0 0
#P5 0 0 0 0 8.7491083 -159.79288 88.550355 0
#P6 0 0 0 0 0 0 13.201141 21
#P7 3.289935 3.86 3.8000125 3.1999625 8.893573 145.19988 -1.8549902 19.406
#P8 -441.63215
#C PEI_M41_11_30C_2min
#V0 102.309 0 1 0 1 1 1 1 993.932
#V1 -0.218301 -0.0235596 55.6508 8.12453 0 0 0 0.00142425 0.00297737
#V2 2.27545 -3.68732 -136.524 197.468
#V3 21 0.590401 21.2118 0 0 0 8.749108
#V4 8.89357 8.74911 24.7627 22.5728 0 0
#V5 100000000 1000000000
#V6 0 4 100000 1.0e+12 1 4.8
#V7 1.0e+04 5 0
#V8 1.0e+06 5 0
#V9 1.0e+08 5 0
#V10 1.0e+10 5 0
#V11 1.0e+12 5 0
#V12 1.3 8.74906 200 890 0 29.08
#V13 69.2275 13.2011 0.9
#V14 1 9.82526 3 354517 100000 1414393 100000000
#N 16
#L pd_rate pd_range pd_curent pd_counts dy ay ar_enc I00_gain I0_gain Epoch seconds I00 TR_diode I000 I0 USAXS_PD
#C Fri Apr 22 15:07:23 2022. Measured transmission, I0 counts: 404729, with gain: 1e+08.
#C Fri Apr 22 15:07:23 2022. PD after sample counts: 1496262, with gain: 1e+07.
#C Fri Apr 22 15:07:46 2022. Finished SAXS/WAXS data collection in 21 seconds..
#C Fri Apr 22 15:07:46 2022. I0 value: 13111696.
#C Fri Apr 22 15:08:04 2022. Ready for WAXS mode.
#C Fri Apr 22 15:08:04 2022. Collected FLY2 jpeg/tiff data.
In this case, there is a SPEC macro named SAXS which takes the image file name to be written as a parameter. This is similar, to a point, to your sideload process. In the case of USAXS, they do not create a combined HDF5. Instead, they read the SPEC data file into IgorPro and then use this image file metadata to instruct IgorPro to locate the HDF5 image file.
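As a rough illustration of how that image file metadata could be pulled out of the scan header, here is a minimal sketch in plain Python. It assumes (based only on the example scan above) that the file name is the first argument after the macro name and contains no whitespace; the function name `parse_scan_line` is made up for this example and is not part of spec2nexus.

```python
def parse_scan_line(line):
    """Split a SPEC '#S' line into (scan number, macro name, macro arguments)."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "#S":
        raise ValueError(f"not a SPEC scan line: {line!r}")
    return int(parts[1]), parts[2], parts[3:]


# The '#S' line from the SAXS scan shown above.
scan_line = "#S 36 SAXS ./04_22_DACSolvent_saxs/PEI_M41_11_30C_2min_0010.hdf 0 0 1.3 20 1"
scan_number, macro, args = parse_scan_line(scan_line)

# Assumption: for the SAXS macro, the first argument is the image file name.
image_file = args[0] if macro == "SAXS" and args else None
```

A reader of the SPEC data file can then resolve `image_file` relative to the directory of the SPEC data file to find the HDF5 image.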
Your SPEC macro code will be different since you take many images. You could record the necessary metadata using custom SPEC control lines. There is a table of the control lines recognized by spec2nexus. Once you have chosen control lines, you could add a custom control line plugin that would process such lines.
BUT, you have SPEC data files without such custom control lines and image file metadata. I propose you pick some custom control line and write a plugin handler for it. Then, for each SPEC data file that is missing this information, you write a new SPEC data file that takes the original data and adds the custom control lines to each scan. Then process with spec2nexus with the custom handler.
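The rewriting step could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the control line key `#U_IMAGE_DIR`, the directory pattern `./pilatus/S{scan:05d}`, and the choice to place the new line directly after each `#S` line. Pick whatever key and placement your own plugin handler expects.

```python
import re

# Matches the start of a scan and captures the scan number.
SCAN_RE = re.compile(r"^#S\s+(\d+)\b")


def add_image_control_lines(spec_text,
                            key="#U_IMAGE_DIR",                 # hypothetical custom key
                            dir_format="./pilatus/S{scan:05d}"):  # assumed directory naming
    """Return a copy of the SPEC text with one custom control line per scan."""
    out = []
    for line in spec_text.splitlines():
        out.append(line)
        match = SCAN_RE.match(line)
        if match:
            scan_number = int(match.group(1))
            # Record where this scan's image files live, right after the '#S' line.
            out.append(f"{key} {dir_format.format(scan=scan_number)}")
    return "\n".join(out) + "\n"
```

Write the returned text to a new file rather than overwriting the original SPEC data file, then run spec2nexus with the custom handler on the new file.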
tldr:
Hello,
my experiment creates TIFF files from a 2D Pilatus detector. They are stored by the SPEC file in a particular subdirectory, but they aren't referenced in any way from the SPEC file itself as far as I can tell. We do have some quick'n'dirty data "digest", i.e. a particular slicing / summing of detector data stored in the SPEC file. But that is just an aid for pre-measurement alignment, by no means the final data set.
I would obviously love to have the final dataset -- i.e. the full TIFF data -- embedded in the HDF5 file.
(Why yes, the SPEC configuration we run is at least being misused, or broken, in more than one way...)
Currently, the directory layout for our experimental data looks something like this:
So, in essence, all the TIFF files are in subdirectories named uniquely after the current scan number, with odd formatting, but manageable.
For the sake of completeness, a specific SPEC file for us looks like this:
So no, there is indeed no reference to any of the TIFF files.
I realize that this has been brought up in issue #16 before with no followup, but I'm guessing my example is specific enough to at least give a solid impression about how a feature like this could be used in the wild.
Now to include the TIFF data in the HDF5 file, I have several options:
1. [...] spec2nexus, and which takes in the spec2nexus 1st-pass HDF5 file and the path to the corresponding pilatus directory, iterates through the scans, parses and integrates the TIFF files into the HDF5 container.
2. [...] spec2nexus and "inject" my own code into it, so that it would specifically parse my SPEC files and the adjacent TIFF data
3. [...] spec2nexus.
The 2nd option sounds like a terrible waste of resources and a burden to keep maintained. That's where software goes to die, so only useful as a last resort.
For option 1 I have working code, fairly minimal and specific, but easily generalizable. I could expand that into a more useful side script, but something strikes me as odd about iterating through scans again after spec2nexus already did iterate through them.
Option 3 is obviously my favorite, but I'm at odds with whether it's a feature that's interesting for spec2nexus (after all having undocumented TIFF files flying along is not exactly a "SPEC" format feature). I'm also unsure about how to go along, but here's a first idea:
- [...] --sideload option for spec2nexus
- [...] --sideload switch, add switches for file naming (--path-format), data format (--format) and data placement inside the HDF5 container (--store)
The full command line would be something like:
This should be expected to collect all files from ./pilatus/Sxxxxx/*.tif (relative to the path of the SPEC file), sort them numerically by the last component (which is a bit tricky but works with regex magic), read each one, create a maps NXdata folder within the HDF5 file at the base of the corresponding scan NXentry, and store the data within that folder by naming it map-zzz, where zzz is obviously the index that was parsed out of the TIFF file using regex.
This would work for all files that have a (common) basename and a numeric extension. As far as the data format goes, maybe more than TIFF could be implemented, but I'm going to go out on a limb here and assume that essentially all detectors will spit out TIFF by default :-)
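To make the collect-and-sort step concrete, here is a minimal sketch using only the Python standard library. It is not the stand-alone script mentioned in this issue. The helper names `collect_frames` and `dataset_name` are hypothetical, the five-digit zero padding in `S{scan:05d}` is a guess at the Sxxxxx naming, and the frame index is assumed to be the last run of digits before the .tif suffix.

```python
import re
from pathlib import Path

# Assumption: the frame index is the last run of digits right before ".tif".
INDEX_RE = re.compile(r"(\d+)\.tif$", re.IGNORECASE)


def collect_frames(spec_file, scan_number, dir_format="S{scan:05d}"):
    """Return [(index, path), ...] for one scan, sorted by numeric frame index."""
    scan_dir = Path(spec_file).parent / "pilatus" / dir_format.format(scan=scan_number)
    frames = []
    for path in scan_dir.glob("*.tif"):
        match = INDEX_RE.search(path.name)
        if match:  # ignore files that carry no numeric index
            frames.append((int(match.group(1)), path))
    # Sort on the integer index so that frame 10 comes after frame 2.
    return sorted(frames, key=lambda item: item[0])


def dataset_name(index):
    """Name of the dataset inside the 'maps' group, following the map-zzz idea."""
    return f"map-{index}"
```

Each `(index, path)` pair could then be read with a TIFF reader and written under the scan's maps group with an HDF5 library of your choice; that part is left out here because it depends on how spec2nexus writes its output file.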
As I stated above, I have working code in a separate, stand-alone script, for most of the "heavy lifting" (i.e. name mangling, reading TIFF files, injecting into HDF). Some of that I can refactor to be more general and work nicely with spec2nexus, but some parts I'd probably have to write from scratch (e.g. I'm assuming that injecting data into HDF5 files is done differently from inside spec2nexus than what I have done).
Ideas? Opinions?
Cheers, F.