Import SAXSLAB Ganesha tiff files containing both headers

mmarras commented 5 years ago

I am using pyFAI for the data reduction of SAXSLAB Ganesha / Pilatus TIFF files and, thus, fabio for the image import.

The TIFF file contains two sets of X-ray-relevant metadata tags: one for the Pilatus detector, saved in tag 270, and one for the SAXSLAB machine data, which is provided in an HTML format.

Using PIL I found that the SAXSLAB html meta data is saved in tag 315.

I also managed to parse the html meta data into a dict. However, I am not familiar enough with fabio (but I am willing to learn) to inject this so that I can expose a complemented header consisting of both the Pilatus meta data and the SAXSLAB meta data in fabio.tifimage.TifImage.header.

Here is my code to parse the SAXSLAB meta data using PIL instead of fabio.

import pathlib
from collections import OrderedDict
from html.parser import HTMLParser

from PIL import Image  # a bare "import PIL" does not load the Image submodule

file = pathlib.Path(r"D:\Data\Ganesha\Dummy.tiff")

fig = Image.open(str(file))

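class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        # per-instance dict; a class-level dict would be shared by all parser instances
        self.store = OrderedDict()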

    def handle_starttag(self, tag, attrs):
        for attr in attrs:
            self.store[attr[1]]=None

    def handle_data(self, data):
        self.store[list(self.store.keys())[-1]] = data

    def get_header_ordered(self):
        return self.store

    def get_header(self):
        return dict(self.store)

parser = MyHTMLParser()
parser.feed(fig.tag[315][0])
parser.get_header()

which returns:

{'det_pixel_size': '172e-6 172e-6',
 'det_thickness': '0.000320',
 'det_exposure_time': '900.000000',
 'det_exposure_period': '906.000000',
 'det_tau': '383.8e-09',
 'det_count_cutoff': '1077896',
 'det_threshold_setting': '4024',
 'det_n_excluded_pixels': '14',
 'det_excluded_pixels': 'badpix_mask.tif',
 'det_flat_field': 'FF_p300k0149_E8048_T4024_vrf_m0p15.tif',
 'det_trim_directory': 'p300k0149_E8048_T4024_vrf_m0p15.bin',
 'datatype': 'tiff',
 'detectortype': 'PILATUS 300K',
 'detector_sn': 'dec427',
 'meastype': None,
 'start_timestamp': 'Thu Apr 12 13:07:51 2018',
 'end_timestamp': None,
 'save_timestamp': None,
 'realtime': None,
 'livetime': '900.00',
 'pixelsize': '0.172 0.172',
 'beamcenter_nominal': '350.01    438.50',
 'beamcenter_actual': '350.00    438.46',
 'data_mean': None,
 'data_min': None,
 'data_max': None,
 'data_rms': None,
 'data_p10': None,
 'data_p90': None,
 'calibrationtype': 'geom',
 'kcal': None,
 'pixelcal': None,
 'koffset': None,
 'wavelength': '1.5418',
 'detector_dist': None,
 'saxsconf_r1': '0.4500',
 'saxsconf_r2': '2.0000',
 'saxsconf_r3': '0.3500',
 'saxsconf_l1': '725',
 'saxsconf_l2': '400',
 'saxsconf_l3': '129.9995',
 'saxsconf_wavelength': '1.5418',
 'saxsconf_dwavelength': '0.004',
 'saxsconf_Imon': None,
 'saxsconf_Ieff': '1.12500',
 'saxsconf_Izero': None,
 'saxsconf_det_offx': '0',
 'saxsconf_det_offy': '0',
 'saxsconf_det_rotx': '0',
 'saxsconf_det_roty': '0',
 'saxsconf_det_pixsizez': '0.172',
 'saxsconf_det_pixsizey': '0.172',
 'saxsconf_det_resx_0': None,
 'saxsconf_det_resy_0': None,
 'saxsconf_abs_int_fact': None,
 'sample_transfact': None,
 'sample_thickness': None,
 'sample_ypos': '-12.900',
 'sample_zpos': '10.000',
 'sample_angle1': '-0.000',
 'sample_angle2': None,
 'sample_angle3': None,
 'sample_temp': None,
 'sample_pressure': None,
 'sample_strain': None,
 'sample_stress': None,
 'sample_shear_rate': None,
 'sample_concentration': None,
 'sample_buffer': None,
 'sample_ID': None,
 'hg1': '0.899987',
 'hp1': '-0.028385',
 'vg1': '0.899987',
 'vp1': '-0.074980',
 'hg2': '4.000000',
 'hp2': '0.000010',
 'vg2': '4.000000',
 'vp2': '0.000000',
 'hg3': '0.700000',
 'hp3': '-0.000719',
 'vg3': '0.700000',
 'vp3': '0.041246',
 'ysam': '-12.900000',
 'zsam': '10.000000',
 'thsam': '-0.000008',
 'detx': '4.999530',
 'dety': '-38.688969',
 'detz': '-2.897927',
 'bstop': '-67.456255',
 'pd': '0.700000',
 'source_type': 'GENIX3D',
 'source_runningtime': None,
 'source_kV': '49.93',
 'source_ma': '0.60',
 'xaxis': None,
 'xaxisfull': None,
 'yaxis': None,
 'error_norm_fact': '1',
 'xaxisbintype': 'lin',
 'log': 'log',
 'reduction_type': 's',
 'reduction_state': None,
 'raw_filename': None,
 'bsmask_configuration': '0    350.01    438.50 28.0     19.10 14.0',
 'mask_filename': None,
 'flatfield_filename': None,
 'empty_filename': None,
 'solvent_filename': None,
 'darkcurrent_filename': None,
 'readoutnoise_filename': None,
 'zinger_removal': '0',
 'data_added_constant': '0',
 'data_multiplied_constant': '1',
 'Img.Class': None,
 'Img.MonitorMethod': None,
 'Img.ImgType': '2D',
 'Img.Site': 'TUM',
 'Img.Group': None,
 'Img.Researcher': None,
 'Img.Operator': None,
 'Img.Administrator': None,
 'Meas.Description': '1: 10132-ATSL-S1-E; conf 21, T=0.000000, Time=900'}
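All values in this dict are strings or None. Below is a minimal sketch of how they could be converted to numbers for later use. The helper name convert_value is hypothetical (it is not part of fabio or PIL), and it assumes that whitespace-separated fields such as 'beamcenter_actual' hold lists of numbers.

```python
def convert_value(value):
    """Convert a raw metadata string to a float or a list of floats if possible."""
    if value is None:
        return None
    try:
        numbers = [float(part) for part in value.split()]
    except ValueError:
        # not purely numeric, e.g. 'PILATUS 300K' or a file name
        return value
    if not numbers:
        return value  # empty string: nothing to convert
    return numbers[0] if len(numbers) == 1 else numbers


raw = {
    "det_exposure_time": "900.000000",
    "beamcenter_actual": "350.00    438.46",
    "detectortype": "PILATUS 300K",
    "meastype": None,
}
typed = {key: convert_value(value) for key, value in raw.items()}
print(typed)
```

Fields that mix numbers and text stay untouched, so the conversion never loses information.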

Here is a link to a dummy SAXSLAB Ganesha file: Dummy.tiff

vallsv commented 5 years ago

Hi.

Do you have any question?

I mean, if it is working with PIL, it's fine no? Why would you need to convert it to fabio?

vallsv commented 5 years ago

Here is what we have with fabio.

There is not much, AFAIK.

In [4]: import fabio

In [5]: a = fabio.open("Dummy.tiff")

In [6]: a
Out[6]: <fabio.tifimage.TifImage at 0x7f27f7a540b8>

In [7]: a.nframes
Out[7]: 1

In [8]: a.header
Out[8]:
{'nRows': 619,
 'nColumns': 487,
 'nBits': 32,
 'compression': False,
 'compression_type': 1,
 'imageDescription': '# Pixel_size 172e-6 m x 172e-6 m\r\n# Silicon sensor, thickness 0.000320 m\r\n# Exposure_time 900.000000 s\r\n# Exposure_period 906.000000 s\r\n# Tau = 383.8e-09 s\r\n# Count_cutoff 1077896 counts\r\n# Threshold_setting: 4024 eV\r\n# Gain_setting: high gain (vrf = -0.150)\r\n# N_excluded_pixels = 14\r\n# Excluded_pixels: badpix_mask.tif\r\n# Flat_field: FF_p300k0149_E8048_T4024_vrf_m0p15.tif\r\n# Trim_file: p300k0149_E8048_T4024_vrf_m0p15.bin\r\n# Image_path: /data/datatemp/\r\n# Ratecorr_lut_directory: (nil)\r\n# Retrigger_mode: 0\r\n',
 'stripOffsets': [6060],
 'rowsPerStrip': 619,
 'stripByteCounts': [1205812],
 'software': 'TVX TIFF v 1.3     ',
 'date': '2018:04:12 12:33:59',
 'colormap': None,
 'sampleFormat': 2,
 'photometricInterpretation': 1,
 'model': ('PILATUS 300K-20Hz, S/N 3-0149-20Hz',),
 'info': {}}

vallsv commented 5 years ago

But if you want to hack the code, you could take a look at the way fabio manages the Pilatus TIFF images.

Here is a way to force the use of this reader.

In [22]: image = fabio.pilatusimage.PilatusImage()

In [23]: image.read("Dummy.tiff")
Out[23]: <fabio.pilatusimage.PilatusImage at 0x7f27f709c400>

In [24]: image.header
Out[24]:
{
  "Pixel_size": "172e-6 m x 172e-6 m",
  "Silicon": "sensor, thickness 0.000320 m",
  "Exposure_time": "900.000000 s",
  "Exposure_period": "906.000000 s",
  "Tau": "383.8e-09 s",
  "Count_cutoff": "1077896 counts",
  "Threshold_setting": "4024 eV",
  "Gain_setting": "high gain (vrf = -0.150)",
  "N_excluded_pixels": "14",
  "Excluded_pixels": "badpix_mask.tif",
  "Flat_field": "FF_p300k0149_E8048_T4024_vrf_m0p15.tif",
  "Trim_file": "p300k0149_E8048_T4024_vrf_m0p15.bin",
  "Image_path": "/data/datatemp/",
  "Ratecorr_lut_directory": "(nil)",
  "Retrigger_mode": "0"
}

The code is inside fabio/codecs/pilatusimage.py
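To illustrate the idea, here is a rough sketch of how the "# key value" lines of the imageDescription tag can be turned into a dict like the one above. This is not fabio's actual implementation; the function name parse_pilatus_description is made up for the example, and it assumes that every line starts with "#" and that the key ends at the first run of whitespace, ":" or "=".

```python
import re


def parse_pilatus_description(description):
    """Split a Pilatus imageDescription string into a key/value dict."""
    header = {}
    for line in description.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue
        body = line[1:].strip()
        if not body:
            continue
        # the key ends at the first run of whitespace, ':' or '='
        parts = re.split(r"[\s:=]+", body, maxsplit=1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        header[key] = value
    return header


description = (
    "# Pixel_size 172e-6 m x 172e-6 m\r\n"
    "# Tau = 383.8e-09 s\r\n"
    "# Threshold_setting: 4024 eV\r\n"
    "# Gain_setting: high gain (vrf = -0.150)\r\n"
)
parsed = parse_pilatus_description(description)
print(parsed)
```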

mmarras commented 5 years ago

Hi.

Do you have any question?

I mean, if it is working with PIL, it's fine no? Why would you need to convert it to fabio?

Indeed, I can make it work with PIL. But I already have quite a lot of code (batch processing etc.) written for EDF and other files, which relies on the very convenient img = fabio.open() and the img.header it exposes. So it would be nice if I could tweak fabio to expose the Ganesha metadata in addition to the rest, for example by adding a new key to the info dict next to imageDescription.
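What I have in mind is roughly the following. This is only a sketch of the desired result, not existing fabio behaviour: complement_header and the "saxslab" key are names I made up, and the two input dicts stand in for img.header and the parsed HTML metadata.

```python
def complement_header(tiff_header, saxslab_meta, key="saxslab"):
    """Return a copy of the header with the SAXSLAB metadata nested under `key`."""
    merged = dict(tiff_header)  # copy, so the original header is not modified
    merged[key] = dict(saxslab_meta)
    return merged


# stand-ins for fabio's header and the parsed tag 315 content
tiff_header = {
    "nRows": 619,
    "nColumns": 487,
    "imageDescription": "# Exposure_time 900.000000 s\r\n",
}
saxslab_meta = {"wavelength": "1.5418", "livetime": "900.00"}

header = complement_header(tiff_header, saxslab_meta)
print(header["saxslab"]["wavelength"])
```

Nesting the machine data under its own key avoids name clashes with the keys fabio already puts in the header.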

mmarras commented 5 years ago

But if you want to hack the code, you could take a look at the way fabio manage the Pilatus TIFF images.

The code is inside fabio/codecs/pilatusimage.py

Oh, I have been looking at TiffIO.py for a while now. There I found the TAG_ID dictionary and the method self._readIFDEntry(TAG_name, tagIDList, fieldTypeList, nValuesList, valueOffsetList), which I assume is used to retrieve the header. But I'll have a look at PilatusImage now.

vallsv commented 5 years ago

No, you were right. Everything is inside TiffIO.py, because it only exposes some tag types.

I checked, and you have to update the tag info (maybe not needed).

And update _readInfo with

        TAG_ARTIST = 315
        artist = self._readIFDEntry(
            TAG_ARTIST, tagIDList, fieldTypeList, nValuesList, valueOffsetList
        )

and

        info["artist"] = artist

Yet it is not easy to customize. I think it would be great to create a kind of TAG_INFO that we could patch manually if needed.
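A minimal sketch of what such a patchable table could look like. Nothing here exists in TiffIO.py: EXTRA_TAGS and read_extra_tags are hypothetical names, and read_entry stands in for self._readIFDEntry with its list arguments already bound.

```python
# Hypothetical registry: TIFF tag number -> key used in the info dict.
# Users could add entries to it before opening a file.
EXTRA_TAGS = {315: "artist"}


def read_extra_tags(tag_id_list, read_entry, extra_tags=EXTRA_TAGS):
    """Read every registered tag that is actually present in the IFD."""
    info = {}
    for tag_id, key in extra_tags.items():
        if tag_id in tag_id_list:  # skip tags the file does not contain
            info[key] = read_entry(tag_id)
    return info


# Fake IFD contents, used only to exercise the function
fake_ifd = {270: ["# Exposure_time 900.000000 s"], 315: ["<html></html>"]}
with_artist = read_extra_tags(list(fake_ifd), fake_ifd.__getitem__)
without_artist = read_extra_tags([270], fake_ifd.__getitem__)
print(with_artist, without_artist)
```

Because a key is only written when its tag is present, a TIFF without tag 315 simply gets no "artist" entry.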

mmarras commented 5 years ago

This was helpful. Basically, the main issue was the name 'artist': I had just made up a name there, which apparently didn't work. But now it works great. How did you find out that it had to be TAG_ARTIST?

Added the following to TiffIO.py:

from html.parser import HTMLParser
TAG_ID.update({315: "Artist"})
TAG_ARTIST = 315

class MyHTMLParser(HTMLParser):
    store = {}
    def make_newlist(self):
        self.store.clear()

    def handle_starttag(self, tag, attrs):
        for attr in attrs:
            self.store.update({attr[1]:None})
            #print(attr[1])
    def handle_data(self, data):
        self.store[list(self.store.keys())[-1]]=data
        #print(data)
    def get_header(self):
        return self.store.copy()

parser = MyHTMLParser()

and added this to _readInfo(...) in TiffIO.py:

...
    if TAG_ARTIST in tagIDList:
        artist = self._readIFDEntry(TAG_ARTIST,
                                    tagIDList, fieldTypeList, nValuesList,
                                    valueOffsetList)
        self.parser.reset()
        self.parser.feed(artist[0])
        artist = self.parser.get_header()
        self.parser.make_newlist()
...
    info["artist"] = artist

It's probably cleaner to do the HTML parsing at another location, but now I can use my original import routine and

image.header returns

{'nRows': 619, 'nColumns': 487, 'nBits': 32, 'compression': False, 'compression_type': 1, 'imageDescription': '# Pixel_size 172e-6 m x 172e-6 m\r\n# Silicon sensor, thickness 0.000320 m\r\n# Exposure_time 120.0 s\r\n# Exposure_period 124.0 s\r\n# Tau = 383.8e-09 s\r\n# Count_cutoff 1226757 counts\r\n# Threshold_setting: 4024 eV\r\n# Gain_setting: NA\r\n# N_excluded_pixels = 14\r\n# Excluded_pixels: badpix_mask.tif\r\n# Flat_field: (nil)\r\n# Trim_file: p300k0149_E8048_T4024_vrf_m0p15.bin\r\n# Image_path: NA\r\n', 'stripOffsets': [5890], 'rowsPerStrip': 619, 'stripByteCounts': [1205812], 'software': 'TVX TIFF v 1.3 ', 'date': '2019:05:17 17:52:45', 'colormap': None, 'artist': {'det_pixel_size': '172e-6 172e-6', 'det_thickness': '0.000320', 'det_exposure_time': '120.0', 'det_exposure_period': '124.0', 'det_tau': '383.8e-09', 'det_count_cutoff': '1226757', 'det_threshold_setting': '4024', 'det_n_excluded_pixels': '14', 'det_excluded_pixels': 'badpix_mask.tif', 'det_flat_field': '(nil)', 'det_trim_directory': 'p300k0149_E8048_T4024_vrf_m0p15.bin', 'datatype': 'tiff', 'detectortype': 'Pilatus', 'detector_function': 'saxs', 'detector_sn': 'dec427', 'meastype': None, 'start_timestamp': 'Fri May 17 17:57:43 2019', 'end_timestamp': None, 'save_timestamp': None, 'realtime': None, 'livetime': '120.00', 'pixelsize': '0.172 0.172', 'beamcenter_nominal': '364.80 213.50', 'beamcenter_actual': '364.76 213.78', 'WAXSdet_conf': None, 'data_mean': None, 'data_min': None, 'data_max': None, 'data_rms': None, 'data_p10': None, 'data_p90': None, 'calibrationtype': 'geom', 'kcal': None, 'pixelcal': None, 'koffset': None, 'wavelength': '1.5418', 'detector_dist': '120.4470', 'saxsconf_r1': '0.4500', 'saxsconf_r2': '2.0000', 'saxsconf_r3': '0.3500', 'saxsconf_l1': '725', 'saxsconf_l2': '400', 'saxsconf_l3': '200', 'saxsconf_wavelength': '1.5418', 'saxsconf_dwavelength': '0.004', 'saxsconf_Imon': None, 'saxsconf_Ieff': '1.12500', 'saxsconf_Izero': None, 'saxsconf_det_offx': '0', 
'saxsconf_det_offy': '0', 'saxsconf_det_rotx': '0', 'saxsconf_det_roty': '0', 'saxsconf_det_pixsizez': '0.172', 'saxsconf_det_pixsizey': '0.172', 'saxsconf_det_resx_0': None, 'saxsconf_det_resy_0': None, 'saxsconf_abs_int_fact': None, 'sample_transfact': None, 'sample_thickness': None, 'sample_ypos': '-2.900', 'sample_zpos': '-6.500', 'sample_angle1': '0.000', 'sample_angle2': None, 'sample_angle3': None, 'sample_temp': '25.00', 'sample_pressure': None, 'sample_strain': None, 'sample_stress': None, 'sample_shear_rate': None, 'sample_concentration': None, 'sample_buffer': None, 'sample_ID': None, 'hg1': '0.899987', 'hp1': '0.028067', 'vg1': '0.899987', 'vp1': '0.000000', 'hg2': '4.000000', 'hp2': '0.000006', 'vg2': '4.000000', 'vp2': '0.000000', 'hg3': '0.700000', 'hp3': '0.063373', 'vg3': '0.700000', 'vp3': '-0.053773', 'ysam': '-2.900000', 'zsam': '-6.500000', 'thsam': '0.000005', 'detx': '5.000000', 'dety': '-0.187500', 'detz': '-6.578988', 'bstop': '37.349925', 'pd': '30.000000', 'chi': '32.968594', 'phi': '9.700000', 'trans': '38.625760', 'source_type': 'GENIX3D', 'source_runningtime': None, 'source_kV': '49.93', 'source_ma': '0.60', 'xaxis': None, 'xaxisfull': None, 'yaxis': None, 'error_norm_fact': '1', 'xaxisbintype': 'lin', 'log': 'log', 'reduction_type': 's', 'reduction_state': None, 'raw_filename': None, 'bsmask_configuration': '0 364.80 213.50 28.0 205.00 14.0', 'mask_filename': None, 'flatfield_filename': None, 'empty_filename': None, 'solvent_filename': None, 'darkcurrent_filename': None, 'readoutnoise_filename': None, 'zinger_removal': '0', 'data_added_constant': '0', 'data_multiplied_constant': '1', 'Img.Class': None, 'Img.MonitorMethod': None, 'Img.ImgType': '2D', 'Img.Site': 'TUM', 'Img.Group': None, 'Img.Researcher': None, 'Img.Operator': None, 'Img.Administrator': None, 'Meas.Description': None}, 'sampleFormat': 2, 'photometricInterpretation': 1, 'model': ('PILATUS 300K-20Hz, S/N 3-0149-20Hz',), 'info': {}}

and obviously image.header['artist'] gives me the Ganesha meta-data only.

Just the question remains what happens if one opens a non-SAXSLAB/Ganesha Tiff now. Thanks a lot!

vallsv commented 5 years ago

How did you inspect that it had to be TAG_ARTIST?

Cause 315, from your code, is the artist tag: https://www.awaresystems.be/imaging/tiff/tifftags/artist.html

Just the question remains what happens if one opens a non-SAXSLAB/Ganesha Tiff now.

In your case, it will raise an error: artist is only assigned inside the if block, so info["artist"] = artist must be executed only when the artist tag is available. In the general case, info["artist"] should not be part of the dict.

If you want to provide a PR for the TiffIO patch, I can try to look at it. Obviously the HTML parsing has to stay on your side, as it has nothing to do with Pilatus, if I understand correctly.

mmarras commented 5 years ago

So how to go about it? Although it's mainly tweaking TiffIO, for a PR shouldn't it be rather implemented in a new ganeshaimage.py à la templateimage.py? To clarify, this format is obtained from the SAXSLAB Ganesha system which internally has a Pilatus detector installed. Alternatively we don't care about where the tiff is coming from in this specific case and make it very broad: if for whatever reason the tiff has an "artist" tag it should also be exposed in the header? Maybe the latter was what you were thinking?

mmarras commented 5 years ago

Actually, maybe both would be good.

put everything to expose the artist tag in TiffIO.py and then
the html parsing in Ganeshaimage.py.

The only question is then how fabio decides to use Ganeshaimage.py instead of TiffIO.py as the entry point for parsing the image. How and where does the automatic file-type detection happen? I read that fabio tries to detect the type automatically and only falls back to the file extension if that fails (not that the extension would help in this case, because it's all .tiff).

vallsv commented 5 years ago

I think exposing the artist tag is enough. Creating and maintaining a Ganeshaimage is not really part of our project. But maybe @kif has another opinion.

kif commented 5 years ago

Hi there,

I don't agree with you, Valentin: creating a GaneshaImage class deriving from TiffImage (or PilatusImage) is probably the way to go (even if ESRF has no direct interest in supporting SAXSLAB hardware). FabIO has been built to support all kinds of X-ray detectors, to allow people to do better science by removing the burden of parsing the data files.

So, Matthias, could you please help us by submitting a pull request for this feature? I promise I will take some time to help you along the way.

Then comes the issue of how to distinguish this file from a "basic" TIFF or from one coming from a Pilatus detector ... I will have to think about it.
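One possible starting point, purely as a sketch: look at the content of the artist tag (315) and check for key names that appear in the SAXSLAB dump earlier in this thread. The function name, the marker list and the sample markup are all assumptions; the real HTML layout written by the Ganesha software may differ, and this is not how fabio currently detects formats.

```python
# Key names taken from the metadata dump earlier in this thread
SAXSLAB_MARKERS = ("saxsconf_wavelength", "det_exposure_time")


def looks_like_saxslab(artist_text):
    """Heuristic: True if the artist tag text contains all SAXSLAB marker keys."""
    if not artist_text:
        return False  # tag missing or empty: treat as a plain TIFF
    return all(marker in artist_text for marker in SAXSLAB_MARKERS)


# Assumed markup, only for illustration
sample = (
    '<param name="det_exposure_time">900.000000</param>'
    '<param name="saxsconf_wavelength">1.5418</param>'
)
print(looks_like_saxslab(sample))
print(looks_like_saxslab("John Doe"))
print(looks_like_saxslab(None))
```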

mmarras commented 5 years ago

@kif so to clarify,

you think I should follow my suggested approach:

put everything to expose the artist tag in TiffIO.py and then

the html parsing in Ganeshaimage.py.

vallsv commented 5 years ago

Yes, sounds good. Also can we reuse this dummy.tiff file for our unittests?

Next we could find a way to autodetect the Ganeshaimage from fabio.open. That's something missing. But we can take care of it on our side (cause that's a need for other derivative tiff formats too).

mmarras commented 4 years ago

I recently revisited this, and it turned out that SAXSLAB writes duplicate entries (e.g. 'detector_dist'), which lead to undesired behavior when the HTMLParser collects the metadata into a dict (I don't know whether this is peculiar to our device/setup or standard).

I have now improved the HTMLParser to deal with duplicate entries. If the duplicates are not mere repetitions but actually carry different values, the user gets a warning, and metadata extraction proceeds with the first value encountered.

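import warnings
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self._store = {}  # dict for storing metadata
        self._doublestore = {}  # dict for storing duplicates
        self._doubleentryflag = None  # set for each start tag in handle_starttag

    def cleardict(self):
        self._store.clear()
        self._doublestore.clear()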

    def handle_starttag(self, tag, attrs):
        self._doubleentryflag = None
        for attr in attrs:
            # catch double entries
            if attr[1] not in self._store:
                self._doubleentryflag = False
                self._store.update({attr[1]:None})
            else:
                self._doubleentryflag = True
                self._doublestore.update({attr[1]:None})
            #print(attr[1])
    def handle_data(self, data):
        if self._doubleentryflag is False:
            #store data in dict
            self._store[list(self._store.keys())[-1]]=data
        if self._doubleentryflag is True:
            #store double entry data to raise with user
            self._doublestore[list(self._doublestore.keys())[-1]]=data
    def get_header(self):
        # compare values from _store and _doublestore for keys that are double
        different_entries = {k: self._store[k] for k in self._store if k in self._doublestore and self._store[k] != self._doublestore[k]}
        # raise with user if those double entries contain conflicting data
        if different_entries:
            warnings.warn('Conflicting double entry in meta data {}. Proceeding with the first value given, respectively.'.format([str(key)+' = ' +str(self._store[key])+'; '+str(self._doublestore[key]) for key in different_entries.keys()]))
        # prepare output dict
        output = self._store.copy()
        # cleanup dict to prepare next parsing
        self.cleardict()
        return output

mmarras commented 4 years ago

Hi, I am now facing the pain of modifying every fabio installation of my colleagues to make the Ganesha patch work for them, too. Therefore, I'd like to work a bit on getting this into the main functionality of fabio. Is there a clearer picture now of where I should put that functionality? I am currently trying a monkey patch, which already gives me some insight into the fabio package. But I think I would need some guidance on how to proceed with this, please.

kif commented 4 years ago

Monkey-patching is great, but only for quick fixes. It hardly ever scales (gevent :þ). The best is always to have the code properly written somewhere so that it can be debugged when needed.

About the location of your code: create a new class deriving from TifImage or PilatusImage in a new file. PilatusImage derives from TifImage, so you can do the same.
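The pattern would look roughly like this. To keep the sketch runnable without fabio, StandInTifImage is a placeholder for fabio.tifimage.TifImage, and it is assumed that the TiffIO patch exposes the raw tag 315 content as header["artist"]; the "saxslab" key, the parser class and the sample markup are likewise only illustrative.

```python
from html.parser import HTMLParser


class AttrValueParser(HTMLParser):
    """Use each attribute value as a key and the following text as its value."""

    def __init__(self):
        super().__init__()
        self.store = {}
        self._key = None

    def handle_starttag(self, tag, attrs):
        for _name, value in attrs:
            if value is not None:
                self.store.setdefault(value, None)
                self._key = value

    def handle_data(self, data):
        if self._key is not None and self.store[self._key] is None:
            self.store[self._key] = data

    def handle_endtag(self, tag):
        self._key = None


class StandInTifImage:
    """Placeholder for fabio.tifimage.TifImage (assumption, not the real class)."""

    def __init__(self):
        self.header = {}

    def read(self, fname, frame=None):
        # A real reader would fill the header from the file on disk
        self.header = {
            "nRows": 619,
            "artist": ['<param name="wavelength">1.5418</param>'],
        }
        return self


class GaneshaImage(StandInTifImage):
    """Read as a TIFF first, then parse the SAXSLAB HTML found in the artist tag."""

    def read(self, fname, frame=None):
        super().read(fname, frame)
        artist = self.header.get("artist")
        if artist:  # only present for files that carry tag 315
            text = artist[0] if isinstance(artist, (list, tuple)) else artist
            parser = AttrValueParser()
            parser.feed(text)
            self.header["saxslab"] = parser.store
        return self


image = GaneshaImage().read("Dummy.tiff")
print(image.header["saxslab"])
```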

Then write a test to ensure it works. It sounds obvious, but Python versions keep changing, so guarding against regressions is essential to be future-proof. It does not guarantee compatibility with future versions, but at least we will be warned when something fails.

Once this is done, your file will not be "auto-magically" recognized by fabio.open, but that's another story.
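As an illustration, such a test for the duplicate handling could look like the sketch below. It uses a compact stand-in parser rather than the exact class posted above, the markup is an assumed example, and the pytest-style function names are only suggestions.

```python
import warnings
from html.parser import HTMLParser


class FirstValueParser(HTMLParser):
    """Keep the first value per key and remember conflicting duplicates."""

    def __init__(self):
        super().__init__()
        self.first = {}
        self.conflicts = {}
        self._key = None
        self._duplicate = False

    def handle_starttag(self, tag, attrs):
        for _name, value in attrs:
            if value is None:
                continue
            self._key = value
            self._duplicate = value in self.first
            if not self._duplicate:
                self.first[value] = None

    def handle_data(self, data):
        if self._key is None:
            return
        if not self._duplicate:
            self.first[self._key] = data
        elif self.first[self._key] != data:
            self.conflicts[self._key] = data
        self._key = None

    def handle_endtag(self, tag):
        self._key = None


def parse_metadata(html):
    parser = FirstValueParser()
    parser.feed(html)
    parser.close()
    if parser.conflicts:
        warnings.warn("Conflicting duplicate entries: {}".format(sorted(parser.conflicts)))
    return dict(parser.first)


def test_identical_duplicate_is_silent():
    html = '<param name="detector_dist">120.4470</param>' * 2
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = parse_metadata(html)
    assert result == {"detector_dist": "120.4470"}
    assert len(caught) == 0


def test_conflicting_duplicate_warns_and_keeps_first():
    html = ('<param name="detector_dist">120.4470</param>'
            '<param name="detector_dist">130.0000</param>')
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = parse_metadata(html)
    assert result == {"detector_dist": "120.4470"}
    assert len(caught) == 1
```

The two functions can be collected by pytest or called directly.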

silx-kit / fabio

Import SAXSLAB Ganesha tiff files containing both headers #336