stitching-spark

Reconstructing large microscopy images from overlapping image tiles on a high-performance Spark cluster.

The code is based on the Stitching plugin for Fiji https://github.com/fiji/Stitching

Usage

1. Building the package

Clone the repository with submodules:

git clone --recursive https://github.com/saalfeldlab/stitching-spark.git

If you have already cloned the repository, run this after cloning to fetch the submodules:

git submodule update --init --recursive

The application can be executed on Janelia cluster or locally. Build the package for the desired execution environment:

Compile for running on Janelia cluster

```bash python build.py ```

Compile for running on local machine

```bash python build-spark-local.py ```

The scripts for starting the application are located under startup-scripts/spark-janelia and startup-scripts/spark-local, and their usage is explained in the next steps.

If running locally, you can access the Spark job tracker at http://localhost:4040/ to monitor the progress of the tasks.

If running on public platforms such as AWS or Google Cloud:

Compile with python build.py. This will include embed required dependencies into the final package, except for the Spark which is provided by the respective target platform at runtime.
For running the pipeline, refer to the wiki pages Running on AWS and Running on Google Cloud
The currently used Spark version is 2.3.1 — make sure you're requesting the same version when submitting a job

2. Preparing input tile configuration files

The application uses JSON to store metadata about tile configurations. These metadata include:

list of all tiles in the acquisition grouped by channel
position and size of each tile in pixels
pixel resolution (physical size of the pixel/voxel) in microns
image data type

Example of the JSON metadata file (one JSON file per image channel):

ch0.json:

[
{
  "index" : 0,
  "file" : "FCF_CSMH__54383_20121206_35_C3_zb15_zt01_63X_0-0-0_R1_L086_20130108192758780.lsm.tif",
  "position" : [0.0, 0.0, 0.0],
  "size" : [991, 992, 880],
  "pixelResolution" : [0.097,0.097,0.18],
  "type" : "GRAY16"
},
{
  "index" : 1,
  "file" : "FCF_CSMH__54383_20121206_35_C3_zb15_zt01_63X_0-0-0_R1_L087_20130108192825183.lsm.tif",
  "position" : [700, -694.0887500300357, -77.41783189603937],
  "size" : [991, 992, 880],
  "pixelResolution" : [0.097,0.097,0.18],
  "type" : "GRAY16"
}
]

The application provides automated converters for commonly used formats, but in general case an input metadata file needs to be converted or generated (depending on how you store metadata for your acquisitions).

Zeiss Z1

The parser requires an .mvl metadata file. Image tiles can be stored in separate .czi files (one 4D image file per tile that includes all channels), or in a single .czi file.

Run on Janelia cluster

``` spark-janelia/parse-zeiss-z1-metadata.py \ -i \ -b \ -f \ -r ```

Run on local machine

``` spark-local/parse-zeiss-z1-metadata.py \ -i \ -b \ -f \ -r ```

This will create a single tiles.json file in the same directory as the existing .mvl file. Separate JSON tile configuration files for each channel will be created in the next step when the images are split into channel during conversion.

ImageList.csv objective-scan acquisitions

ImageList.csv metadata file lists image tile filenames in all channels and contains stage and objective coordinates of each tile. Image tiles are expected to be stored as .tif files (separate files for each channel).

Objective coordinates from the metadata file are converted into pixel coordinates based on the provided physical size of a voxel (in microns) and axis mapping which specifies flips and swaps between the two coordinate systems. For example, -y,x,z would mean that the objective X and Y should be swapped, and objective Y should be flipped.

Run on Janelia cluster

``` spark-janelia/parse-imagelist-metadata.py \ -i \ -b \ -r \ -a \ [--skipMissingTiles to exclude non-existing tile images from configuration instead of raising an error] ```

Run on local machine

``` spark-local/parse-imagelist-metadata.py \ -i \ -b \ -r \ -a \ [--skipMissingTiles to exclude non-existing tile images from configuration instead of raising an error] ```

This will create a number of JSON configuration files (one per channel), named as 488nm.json, 560nm.json, etc. if the corresponding laser frequency can be parsed from the image filenames, or simply c0.json, c1.json, etc. otherwise.

3. Conversion of image tiles into N5

The application requires to convert all tiles in the acquisition into N5 in order to make the processing in the next steps faster. This also allows to work with different image file formats more easily.

.czi (Zeiss Z1) -> N5:

Run on Janelia cluster

``` spark-janelia/convert-czi-tiles-n5.py \ \ -i \ [--blockSize to override the default block size 128,128,64] ```

Run on local machine

``` spark-local/convert-czi-tiles-n5.py \ -i \ [--blockSize to override the default block size 128,128,64] ```

This will convert the images into N5 and will create new tile configuration files that correspond to the converted tiles. The new configuration files will be named as c0-n5.json, c1-n5.json, etc. and should be used as inputs in the next steps.

.tif -> N5:

Run on Janelia cluster

``` spark-janelia/convert-tiff-tiles-n5.py \ \ -i 488nm.json -i 560nm.json ... \ [--blockSize to override the default block size 128,128,64] ```

Run on local machine

``` spark-local/convert-tiff-tiles-n5.py \ -i 488nm.json -i 560nm.json ... \ [--blockSize to override the default block size 128,128,64] ```

This will convert the images into N5 and will create new tile configuration files that correspond to the converted tiles. The new configuration files will be named as 488nm-n5.json, 488nm-n5.json, etc. and should be used as inputs in the next steps.

4. Flatfield estimation

Run on Janelia cluster

``` spark-janelia/flatfield.py -i 488nm-n5.json -i 560nm-n5.json ... ```

Run on local machine

``` spark-local/flatfield.py -i 488nm-n5.json -i 560nm-n5.json ... ```

This will create a folder for each channel named such as 488nm-flatfield/ near the provided input files. After the application is finished, it will store two files S.tif and T.tif in each of the created folders (the brightfield and the offset respectively). The next steps will detect the flatfield folder and will automatically use the estimated flatfields to perform the flatfield correction on the fly.

The full list of available parameters for the flatfield script is available here.

5. Stitching

Run on Janelia cluster

``` spark-janelia/stitch.py -i 488nm-n5.json -i 560nm-n5.json ... ```

Run on local machine

``` spark-local/stitch.py -i 488nm-n5.json -i 560nm-n5.json ... ```

This will run the stitching performing a number of iterations until it cannot improve the solution anymore. The images channels will be averaged on-the-fly before computing pairwise shifts in order to get higher correlations because of denser signal.

As a result, it will create files 488nm-n5-final.json, 560nm-n5-final.json, etc. near the input tile configuration files. It will also store a file named optimizer.txt that will contain the statistics on average and max errors, number of retained tiles and edges in the final graph, and cross correlation and variance threshold values that were used to obtain the final solution.

The current stitching method is iterative translation-based (improving the solution by building the prediction model). The pipeline incorporating a higher-order model is currently under development in the split-tiles branch.

The full list of available parameters for the stitch script is available here.

6. Exporting

Run on Janelia cluster

``` spark-janelia/export.py -i 488nm-n5-final.json -i 560nm-n5-final.json ... ```

Run on local machine

``` spark-local/export.py -i 488nm-n5-final.json -i 560nm-n5-final.json ... ```

This will generate an N5 export under export.n5/ folder. The export is fully compatible with N5 Viewer and the N5 Fiji plugin for browsing.

The most common optional parameters are:

--blending: smoothes transitions between the tiles instead of making a hard cut between them
--fill: fills the extra space with the background intensity value instead of 0

The full list of available parameters for the export script is available here.

7. Converting N5 export to slice TIFF

Run on Janelia cluster

``` spark-janelia/n5-slice-tiff.py \ \ -i \ [-s ] \ [--compress to enable LZW compression. Can be much slower than uncompressed] \ [--sameDir to prepend all filenames with channel and store all slices in the same directory] \ [--leadingZeroes to pad slice indices in filenames with leading zeroes] ```

Run on local machine

``` spark-local/n5-slice-tiff.py \ -i \ [-s ] \ [--compress to enable LZW compression. Can be much slower than uncompressed] \ [--sameDir to prepend all filenames with channel and store all slices in the same directory] \ [--leadingZeroes to pad slice indices in filenames with leading zeroes] ```

This will create a directory named such as slice-tiff-s0 based on the requested scale level index, where the generated stitched N5 export will be converted into a series of slice TIFF images. The resulting TIFF images will be XY slices.

saalfeldlab / stitching-spark

readme

stitching-spark

Usage

1. Building the package

If running on public platforms such as AWS or Google Cloud:

2. Preparing input tile configuration files

Zeiss Z1

ImageList.csv objective-scan acquisitions

3. Conversion of image tiles into N5

.czi (Zeiss Z1) -> N5:

.tif -> N5:

4. Flatfield estimation

5. Stitching

6. Exporting

7. Converting N5 export to slice TIFF