TIFF file support, revisited

Hello,

My experiment creates TIFF files from a 2D Pilatus detector. They are stored next to the SPEC file in a dedicated subdirectory, but as far as I can tell they aren't referenced in any way from the SPEC file itself. We do have a quick'n'dirty data "digest", i.e. a particular slicing / summing of the detector data, stored in the SPEC file. But that is just an aid for pre-measurement alignment, by no means the final data set.

I would obviously love to have the final dataset -- i.e. the full TIFF data -- embedded in the HDF5 file.

(Why yes, the SPEC configuration we run is at least being misused, or broken, in more than one way...)

Currently, the directory layout for our experimental data looks something like this:

$ tree ./experiment
./experiment
├── experiment.spec
└── pilatus
        ├── S00001
        │     ├── experiment_1_0.tif
        │     ├── experiment_1_1.tif
        │     ├── experiment_1_2.tif
        │      ...
        │      └── experiment_1_24.tif
        ├── S00002
        │      ├── experiment_2_0.tif
        │      ├── experiment_2_1.tif
        │      ├── experiment_2_2.tif
        │      ...
        │      └── experiment_2_17.tif
        ...
        └── S00473
                ├── experiment_473_0.tif
                ├── experiment_473_1.tif
                ├── experiment_473_2.tif
                ...
                └── experiment_473_31.tif

So, in essence, all the TIFF files are in subdirectories named uniquely after the current scan number, with odd formatting, but manageable. For the sake of completeness, a specific SPEC file for us looks like this:

#F /home/data/experiment/experiment.spec
#E 1646139523
#D Tue Mar 01 13:58:43 2022
#C fourc  User = specuser
#O0 TwoTheta     Theta       Chi       Phi         Z         X         Y     FBeam
#O1    NBeam  XBeamMon1  XBeamMon2      dsvg  lens_hor  lens_ver  lens_foc  laser_lens_x
#O2 laser_lens_y  laser_lens_foc  pinhole_horiz  pinhole_vert  PilatusYOffset  SlitVerticalGap  SlitHorizGap    R2HGap
#O3 R2HOffset  PSVertGap  PSVertOff  SSHorGap  SSHorOff  SSVertGap  SSVertOff    delay1
#O4 duration1    delay2  duration2    delay3  duration3    delay4  duration4    delmot
#O5   delayf    delayn  absorber     monoE    monobr    burstn    burstr       sah
#O6      sav     euro1  mot_mir_1  mot_mir_2      rf_V      rf_F     rf_ON     gratv
#O7       aa    SRSdel    SMrock    SMroll       akk    magnet    SRSdur     grath
#O8 long_delay  

#S 1  ascan  z 2.14125 4.14125  20 1
#D Tue Mar 01 14:04:17 2022
#T 1  (Seconds)
#G0 0 0 0 0 0 1 0 0 0 0 0 0 50 0 0.1 0 68 68 50 -1 1 1 3.13542 3.13542 0 463.6 838.8
#G1 1.54 1.54 1.54 90 90 90 4.079990459 4.079990459 4.079990459 90 90 90 1 0 0 0 1 0 60 30 0 0 0 0 60 30 0 -90 0 0 1.54 1.54 0 0
#G3 4.079990459 -6.561207576e-16 -6.561207576e-16 0 -4.079990459 2.498273628e-16 0 0 -4.079990459
#G4 0 0 0 1.54 0 0 0 90 0 0 0 0 0 0 0 0 -180 -180 -180 -180 -180 -180 -180 -180 -180 0
#Q 0 0 0
#P0 0 0 -91.2 -90.6 3.14125 1.8585937 -0.57664064 2.999995
#P1 -0.0007 15.8375 50 -536870.46 -335541.59 -715828.83 -335500.82 3.42
#P2 0.91375 -3 0.16805821 -5.9644562 81 0.75 1 82.9475
#P3 -12.5475 35 -0.7 1 -0.3675 0.95 0.03 0
#P4 200.19 286 20 0 20 0 0 8831411
#P5 100 82 0 9000.0027 -12.686793 -6343.3995 0.4 0
#P6 0 0 0 1.1999994 50 981 1 -16.25949
#P7 0 0 -4.3834 100.43011 -0.1522 19179.565 0 0.345035
#P8 0 
#N 50
#L Z  H  K  L  Epoch  Seconds  T_euro1  ls_t1  Counter 5  pilatus_max  pilatus_sum  pil_roi  RingCurrent  SBcurrent  SBposition  ls_t1  ls_t2  ls_t3  ls_t4  temp_sample  tau_zero_apd  I0  femto1  femto2  femto3  femto4  femto5  femto6  femtoALL  delay_m  tau_apd  energy  pilatus_max_x  pilatus_max_y  absorber  tau_apd_file  ring_c_file  sb_current_file  I0fast  EURO_CT  TAPD_CT  kei0  orca_0  PH_average  PH_av_std  delay  tau_apd_zero  las_power  Monitor  Detector
2.14125 0 0 0 338.449 1 0 349.747 0 486695 3271856 3256680 297.05692 3.8129874 0 0 291.047 34.3475 0 0 0 3.9673831e-08 0 0 0 0 0 0 0 0 299.46533 9000.001 298 1 0 299.46533 -0.043158316 -0.0008451482 6.1802e-12 0 0 0 0 300.176 0 53662 0 -5 0 0
2.24125 0 0 0 340.546 1 0 349.747 0 487047 3274293 3258807 297.03936 3.8117209 0 0 291.047 34.3475 0 0 0 3.9673831e-08 0 0 0 0 0 0 0 0 299.46533 8999.9976 298 1 0 299.46533 -0.043158316 -0.0008451482 -3.0901e-11 0 0 0 0 300.176 0 53662 0 -5 0 0
...

#S 2  ascan  th 19 21  10 1
#D Tue Mar 01 14:08:40 2022
...

So no, there is indeed no reference to any of the TIFF files.

I realize that this has been brought up in issue #16 before with no followup, but I'm guessing my example is specific enough to at least give a solid impression about how a feature like this could be used in the wild.

Now to include the TIFF data in the HDF5 file, I have several options:

1. Make an additional script, to be run after spec2nexus, which takes the spec2nexus first-pass HDF5 file and the path to the corresponding pilatus directory, iterates through the scans, then parses the TIFF files and integrates them into the HDF5 container.
2. Fork spec2nexus and "inject" my own code into it, so that it would specifically parse my SPEC files and the adjacent TIFF data.
3. Implement some kind of generic data sideload mechanism for spec2nexus.

The 2nd option sounds like a terrible waste of resources and a burden to keep maintained. That's where software goes to die, so only useful as a last resort.

For option 1 I have working code, fairly minimal and specific, but easily generalizable. I could expand that into a more useful side script, but something strikes me as odd about iterating through scans again after spec2nexus already did iterate through them.

Option 3 is obviously my favorite, but I'm at odds with whether it's a feature that's interesting for spec2nexus (after all, having undocumented TIFF files flying along is not exactly a "SPEC" format feature). I'm also unsure about how to go about it, but here's a first idea:

- Implement a --sideload option for spec2nexus.
- Following the --sideload switch, add switches for file naming (--path-format), data format (--format) and data placement inside the HDF5 container (--store).
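To make the proposal a bit more concrete, here is a minimal sketch of how these switches could be parsed. None of this exists in spec2nexus: the parser, the switch names as actual options, and the helper parse_store() are hypothetical, and the 'group:NXclass/name-template' reading of the --store value is only my assumption about how such a value might be split.

```python
import argparse


def build_parser():
    """Hypothetical CLI sketch; these switches are NOT part of spec2nexus."""
    parser = argparse.ArgumentParser(prog="spec2nexus-sideload-sketch")
    parser.add_argument("spec_file", help="SPEC data file to convert")
    parser.add_argument("--sideload", action="store_true",
                        help="enable collection of external image files")
    parser.add_argument("--path-format", default=None,
                        help="glob template relative to the SPEC file, "
                             "e.g. 'pilatus/S{scan:05}/*.tif'")
    parser.add_argument("--format", default="tiff", choices=["tiff"],
                        help="image file format (only TIFF in this sketch)")
    parser.add_argument("--store", default=None,
                        help="placement, e.g. 'maps:NXdata/map-{idx:03}'")
    return parser


def parse_store(spec):
    """Split 'group:NXclass/name-template' into its three parts (assumed syntax)."""
    group_part, _, name_template = spec.partition("/")
    group, _, nx_class = group_part.partition(":")
    if not (group and nx_class and name_template):
        raise ValueError(f"cannot parse --store value: {spec!r}")
    return group, nx_class, name_template


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args([
    "experiment.spec", "--sideload",
    "--path-format", "pilatus/S{scan:05}/*.tif",
    "--format", "tiff",
    "--store", "maps:NXdata/map-{idx:03}",
])
group, nx_class, name_template = parse_store(args.store)
```

The path template is an ordinary Python format string, so args.path_format.format(scan=473) expands to the per-scan glob pattern.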

The full command line would be something like:

$ spec2nexus experiment.spec --sideload --path-format 'pilatus/S{scan:05}/*.tif' --format tiff --store 'maps:NXdata/map-{idx:03}'

This should be expected to collect all files from ./pilatus/Sxxxxx/*.tif (relative to the path of the SPEC file), sort them numerically by the last component (which is a bit tricky but works with regex magic), read each one, create a maps NXdata folder within the HDF5 file at the base of the corresponding scan NXentry, and store the data within that folder by naming it map-zzz, where zzz is obviously the index that was parsed out of the TIFF file using regex.
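The collecting and sorting part could look roughly like the sketch below. It uses only the standard library and stops short of reading the TIFF data or touching the HDF5 file. The function names (frame_index, collect_frames, dataset_names) are mine and purely illustrative, and the regex assumes file names that end in _<number>.tif as in the directory listing above.

```python
import re
from pathlib import Path

# Trailing frame index, e.g. "experiment_473_31.tif" -> 31
# (assumes names ending in "_<digits>.tif" or "_<digits>.tiff").
INDEX_RE = re.compile(r"_(\d+)\.tiff?$", re.IGNORECASE)


def frame_index(path):
    """Return the numeric frame index parsed from the file name."""
    match = INDEX_RE.search(Path(path).name)
    if match is None:
        raise ValueError(f"no trailing frame index in {path!r}")
    return int(match.group(1))


def collect_frames(spec_file, scan, path_format="pilatus/S{scan:05}/*.tif"):
    """Return one scan's image files, sorted by numeric frame index.

    The glob pattern is resolved relative to the directory of the SPEC file.
    """
    base = Path(spec_file).resolve().parent
    pattern = path_format.format(scan=scan)
    return sorted(base.glob(pattern), key=frame_index)


def dataset_names(files, name_template="map-{idx:03}"):
    """Map each file to its target dataset name, e.g. 'map-007'."""
    return [name_template.format(idx=frame_index(f)) for f in files]
```

Sorting on the parsed integer rather than on the file name matters because a plain string sort would put experiment_1_10.tif before experiment_1_2.tif.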

This would work for all files that have a (common) basename and a numeric extension. As far as the data format goes, maybe more than TIFF could be implemented, but I'm going to go out on a limb here and assume that essentially all detectors will spit out TIFF by default :-)

As I stated above, I have working code in a separate, stand-alone script for most of the "heavy lifting" (i.e. name mangling, reading TIFF files, injecting into HDF5). Some of that I can refactor to be more general and work nicely with spec2nexus, but some parts I'd probably have to write from scratch (e.g. I'm assuming that injecting data into HDF5 files is done differently inside spec2nexus than what I have done).

Ideas? Opinions?

Cheers, F.

The first problem is that you have data in related files (and related directories) that are completely undocumented in the SPEC data file. Because the SPEC data file holds no evidence of those files, handling them compels a lot of additional work that is out of scope for spec2nexus.

The APS USAXS has a couple of Pilatus detectors (for SAXS & WAXS). It uses the EPICS area detector HDF5 plugin to write image files as directed from SPEC. Here's a scan from a typical SAXS collection:

#S 36  SAXS  ./04_22_DACSolvent_saxs/PEI_M41_11_30C_2min_0010.hdf    0    0    1.3    20     1 
#D Fri Apr 22 15:07:23 2022
#T 0.1  (seconds)
#G0 0
#G1 0
#G3 0
#G4 0
#Q 
#P0 -22.29064 20 0 -0.00125 141.01 14.812881 220 3.2
#P1 5.894899e-07 5.00006 -50 11.00057 2.935 -4 0.3005 29.428909
#P2 0 0 70.11 26.355 0.200046 -8e-05 0.199844 0.8000816
#P3 0 55.004166 0.5 -0.5 0.5 -0.5 20.000182 24.4
#P4 31.7608 0 0 0 0 0 0 0
#P5 0 0 0 0 8.7491083 -159.79288 88.550355 0
#P6 0 0 0 0 0 0 13.201141 21
#P7 3.289935 3.86 3.8000125 3.1999625 8.893573 145.19988 -1.8549902 19.406
#P8 -441.63215 
#C PEI_M41_11_30C_2min
#V0 102.309 0 1 0 1 1 1 1 993.932
#V1 -0.218301 -0.0235596 55.6508 8.12453 0 0 0 0.00142425 0.00297737
#V2 2.27545 -3.68732 -136.524 197.468
#V3 21 0.590401 21.2118 0 0 0 8.749108
#V4 8.89357 8.74911 24.7627 22.5728 0 0
#V5 100000000 1000000000
#V6 0 4 100000 1.0e+12 1 4.8
#V7 1.0e+04 5 0
#V8 1.0e+06 5 0
#V9 1.0e+08 5 0
#V10 1.0e+10 5 0
#V11 1.0e+12 5 0
#V12 1.3 8.74906 200 890 0 29.08
#V13 69.2275 13.2011 0.9 
#V14 1 9.82526 3 354517 100000 1414393 100000000
#N 16
#L pd_rate  pd_range  pd_curent  pd_counts  dy  ay  ar_enc  I00_gain  I0_gain  Epoch  seconds  I00  TR_diode  I000  I0  USAXS_PD
#C Fri Apr 22 15:07:23 2022.  Measured transmission, I0 counts: 404729, with gain: 1e+08.
#C Fri Apr 22 15:07:23 2022.            PD after sample counts: 1496262, with gain: 1e+07.
#C Fri Apr 22 15:07:46 2022.  Finished SAXS/WAXS data collection in 21 seconds..
#C Fri Apr 22 15:07:46 2022.  I0 value: 13111696.
#C Fri Apr 22 15:08:04 2022.  Ready for WAXS mode.
#C Fri Apr 22 15:08:04 2022.  Collected FLY2 jpeg/tiff data.

In this case, there is a SPEC macro named SAXS which takes the image file name to be written as a parameter. This is similar, to a point, to your sideload process. In the case of USAXS, they do not create a combined HDF5. Instead, they read the SPEC data file into IgorPro and then use this image file metadata to instruct IgorPro to locate the HDF5 image file.
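As an illustration of how that metadata can be recovered, here is a small sketch that pulls the image file name out of a scan line like the one above. It is not what IgorPro or spec2nexus actually do; the function name is made up, and it assumes the file name is the first macro argument and contains no whitespace.

```python
from pathlib import Path


def image_file_from_scan_line(line, spec_dir="."):
    """Extract (scan number, macro name, image path) from '#S n MACRO <file> ...'.

    Assumes the file name is the first macro argument and has no spaces.
    """
    parts = line.split()
    if len(parts) < 4 or parts[0] != "#S":
        raise ValueError("not a SPEC scan line with a file argument")
    scan_number, macro, filename = int(parts[1]), parts[2], parts[3]
    # Image paths are taken relative to the directory of the SPEC data file.
    return scan_number, macro, Path(spec_dir) / filename


line = ("#S 36  SAXS  ./04_22_DACSolvent_saxs/"
        "PEI_M41_11_30C_2min_0010.hdf    0    0    1.3    20     1")
scan_number, macro, image_path = image_file_from_scan_line(line)
```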

Your SPEC macro code will be different since you take many images. You could record the necessary metadata using custom SPEC control lines. There is a table of the control lines recognized by spec2nexus. Once you have chosen control lines, you could add a custom control line plugin that would process such lines.

BUT, you have SPEC data files without such custom control lines and image file metadata. I propose you pick some custom control line and write a plugin handler for it. Then, for each SPEC data file that is missing this information, you write a new SPEC data file that takes the original data and adds the custom control lines to each scan. Then process with spec2nexus with the custom handler.

tl;dr:

1. edit the SPEC data files with a custom SPEC control line (providing image file metadata)
2. write a plugin for the custom control line
3. process with spec2nexus using the custom control line plugin
4. future: modify the SPEC macro so new data files will include this metadata automatically
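The first step (rewriting an existing SPEC data file) could be sketched as below. The control line key #IMG_FILES is a made-up placeholder, not something spec2nexus recognizes; pick a key that does not collide with the ones in the table of recognized control lines. Placing the new line directly after each #S line, and storing a glob pattern as its value, are also just assumptions for the sake of the example. The plugin itself is not shown.

```python
import re

# Matches the start of a scan, e.g. "#S 473  ascan  th 19 21  10 1"
SCAN_RE = re.compile(r"^#S\s+(\d+)\b")


def add_image_control_lines(spec_text, key="#IMG_FILES",
                            path_format="pilatus/S{scan:05}/*.tif"):
    """Return SPEC text with one custom control line added after each #S line.

    `key` is a hypothetical control line name; all other lines are copied as-is.
    """
    out = []
    for line in spec_text.splitlines():
        out.append(line)
        match = SCAN_RE.match(line)
        if match:
            scan = int(match.group(1))
            out.append(f"{key} {path_format.format(scan=scan)}")
    return "\n".join(out) + "\n"


original = (
    "#F /home/data/experiment/experiment.spec\n"
    "#E 1646139523\n"
    "\n"
    "#S 1  ascan  z 2.14125 4.14125  20 1\n"
    "#D Tue Mar 01 14:04:17 2022\n"
    "\n"
    "#S 2  ascan  th 19 21  10 1\n"
    "#D Tue Mar 01 14:08:40 2022\n"
)
annotated = add_image_control_lines(original)
```

Writing the result to a new file, rather than overwriting the original, keeps the raw SPEC data untouched.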

prjemian / spec2nexus

TIFF file support, revisited #277