ome / bioformats

Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment. Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.
https://www.openmicroscopy.org/bio-formats
GNU General Public License v2.0

MRC: extended header for tiled acquisitions #3492

Open martinschorb opened 4 years ago

martinschorb commented 4 years ago

Dear all (probably @melissalinkert),

We are acquiring TEM datasets (any type of specimen, cryo or plastic) as tiled XY montages using SerialEM. By default these are stored in an MRC stack file, with each slice representing one tile.

To make TEM data available in BigDataViewer (and eventually for deposition in public repositories), we would like to use BigStitcher to merge these tiles, with Bio-Formats reading the data.

loci/formats/in/MRCReader.java does not yet seem to extract the tile positions from MRC files.

As I am unfortunately not a Java developer, I would at least like to provide details on where and how to find that data:

The key information is stored in the extended header (when header.extType == 'SERI'). http://bio3d.colorado.edu/imod/doc/mrc_format.txt explains the byte layout of the extended header and the flags in the regular header that describe it.
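For orientation, here is a minimal Java sketch of pulling the relevant fields out of the 1024-byte main header. It assumes the zero-based byte offsets as I read them from the linked format description (nz at 8, next at 92, extType at 104, nint at 128, nreal at 130); please check them against that document. The class `MrcExtHeaderInfo` and its methods are hypothetical and not part of Bio-Formats.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not Bio-Formats API): header fields describing the extended header. */
final class MrcExtHeaderInfo {
  static final int HEADER_SIZE = 1024;

  final int nz;              // number of sections (tiles) in the stack
  final int extHeaderBytes;  // "next": size of the extended header in bytes
  final String extType;      // e.g. "SERI" for SerialEM
  final int bytesPerSection; // "nint" in the SerialEM interpretation
  final int flags;           // "nreal" in the SerialEM interpretation (bit flags)

  private MrcExtHeaderInfo(int nz, int extHeaderBytes, String extType,
                           int bytesPerSection, int flags) {
    this.nz = nz;
    this.extHeaderBytes = extHeaderBytes;
    this.extType = extType;
    this.bytesPerSection = bytesPerSection;
    this.flags = flags;
  }

  /** Parses the main header; offsets are assumptions taken from mrc_format.txt. */
  static MrcExtHeaderInfo parse(byte[] header, ByteOrder order) {
    if (header.length < HEADER_SIZE) {
      throw new IllegalArgumentException("MRC header must be at least 1024 bytes");
    }
    ByteBuffer buf = ByteBuffer.wrap(header).order(order);
    int nz = buf.getInt(8);
    int next = buf.getInt(92);
    String extType = new String(header, 104, 4, StandardCharsets.US_ASCII);
    int nint = buf.getShort(128);
    int nreal = buf.getShort(130);
    return new MrcExtHeaderInfo(nz, next, extType, nint, nreal);
  }

  boolean isSerialEm() {
    return "SERI".equals(extType);
  }

  /** Bit value 2 in the flags marks montage piece coordinates. */
  boolean hasPieceCoordinates() {
    return extHeaderBytes > 0 && bytesPerSection > 0 && (flags & 2) != 0;
  }
}
```

The extended header itself would then be the `extHeaderBytes` bytes that directly follow the main header. The byte order is passed in explicitly because the sketch makes no attempt to detect it.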

I have uploaded an example 2x2 tiled montage and the corresponding tile coordinate file here: https://oc.embl.de/index.php/s/Pve8HeKjC1EQk09

This excerpt is from the IMOD command-line program that extracts the tile positions (http://bio3d.colorado.edu/imod/nightlyBuilds/IMOD/flib/image/extractpieces.f90):

```fortran
    ! Get data from the image header
    call iiuRetNumExtended(1, numExtraBytes)
    call iiuRetExtendedType(1, nbytes, iflags)
    maxExtra = numExtraBytes + 1024
    maxPiece = nz + 1024
    allocate(array(maxExtra / 4), ixPiece(maxPiece), iyPiece(maxPiece), &
        izPiece(maxPiece), stat = ierr)
    call memoryError(ierr, 'arrays for extra header or piece data')
    call iiuRetExtendedData(1, numExtraBytes, array)
    call get_extra_header_pieces(array, numExtraBytes, nbytes, iflags, nz, &
        ixPiece, iyPiece, izPiece, numPieces, maxPiece)
```

This is the function that does the actual extraction (http://bio3d.colorado.edu/imod//nightlyBuilds/IMOD/libcfshr/extraheader.c):


```c
/*!
 * Returns piece coordinates from the extra header written by SerialEM
 * ^  [array] = array of extra header data
 * ^  [numExtraBytes] = number of bytes of data there
 * ^  [nbytes] = number of bytes per section
 * ^  [iflags] = flags for type of data present
 * ^  [nz] = number of pieces in the file
 * ^  [ixPiece], [iyPiece], [izPiece] = arrays in which coordinates are returned
 * ^  [numPieces] = number of coordinates returned (should equal [nz])
 * ^  [maxPiece] = size of [piece] arrays  ^
 * Returns 1 for an error and sets error string with @@b3dutil.html#b3dError@.
 */
int getExtraHeaderPieces(char *array, int numExtraBytes, int nbytes, int iflags, int nz,
                          int *ixPiece, int *iyPiece, int *izPiece, int *numPieces,
                          int maxPiece)
{
  unsigned short *sptr;
  int shorts;
  int i, ind;

  *numPieces = 0;
  if (numExtraBytes == 0)
    return 0;
  if (nz > maxPiece) {
    b3dError(stdout, "getExtraHeaderPieces - arrays not large enough for piece lists\n");
    return(1);
  }

  /* if data are packed as shorts, see if the montage flag is set
   * set starting index based on whether there are tilt angles too */
  shorts = extraIsNbytesAndFlags(nbytes, iflags);
  if (!nbytes || !shorts || (iflags / 2) % 2 == 0)
    return 0;
  ind = 0;
  if (iflags % 2 != 0)
    ind = 2;
  for (i = 0; i < nz; i++) {
    if (ind > numExtraBytes)
      return 0;
    sptr = (unsigned short *)(&array[ind]);
    ixPiece[i] = *sptr;
    iyPiece[i] = *(sptr + 1);
    izPiece[i] = *(sptr + 2);
    ind = ind + nbytes;
    *numPieces = i + 1;
  }
  return 0;
}
```
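A rough Java rendering of the loop above might look like the following. It is only a sketch under stated assumptions: the class `SerialEmPieces` is hypothetical (not part of Bio-Formats or IMOD), the byte order must be supplied by the caller, and the `extraIsNbytesAndFlags` consistency check is left out because that function is not shown here. The bounds check is also slightly stricter than in the C code, requiring all six coordinate bytes to be present.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical helper (not Bio-Formats or IMOD API). */
final class SerialEmPieces {
  private SerialEmPieces() {}

  /**
   * Returns one {x, y, z} triple per section, or an empty array when the
   * extended header carries no piece coordinates.
   *
   * @param ext    raw extended header bytes
   * @param nbytes bytes per section (nint in the main header)
   * @param iflags bit flags (nreal in the main header)
   * @param nz     number of sections in the file
   */
  static int[][] read(byte[] ext, int nbytes, int iflags, int nz, ByteOrder order) {
    boolean hasTilt = (iflags & 1) != 0;   // same test as iflags % 2 != 0
    boolean hasPieces = (iflags & 2) != 0; // same test as (iflags / 2) % 2 != 0
    if (ext.length == 0 || nbytes <= 0 || nz <= 0 || !hasPieces) {
      return new int[0][];
    }
    ByteBuffer buf = ByteBuffer.wrap(ext).order(order);
    // Piece coordinates follow the 2-byte tilt angle when that is present.
    int offset = hasTilt ? 2 : 0;
    int[][] pieces = new int[nz][];
    int count = 0;
    for (int i = 0; i < nz; i++) {
      int pos = offset + i * nbytes;
      if (pos + 6 > ext.length) {
        break; // ran out of data: return what was read so far
      }
      // Values are stored as unsigned 16-bit integers.
      pieces[count++] = new int[] {
        buf.getShort(pos) & 0xFFFF,
        buf.getShort(pos + 2) & 0xFFFF,
        buf.getShort(pos + 4) & 0xFFFF
      };
    }
    return Arrays.copyOf(pieces, count);
  }
}
```

Returning fewer triples than `nz` plays the role of `numPieces` in the C version, so a caller can compare the array length with `nz` to detect a truncated header.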

Let me know if I can help in any other way to get this vital information into Bio-Formats!

Thanks a lot, Martin

dgault commented 4 years ago

Hi @martinschorb, thank you for providing a detailed breakdown of the missing extended metadata and a sample file for testing. Currently the reader skips this extended header data, but we may now be able to include it in a future release.