ludvb / xfuse

Super-resolved spatial transcriptomics by deep data fusion
MIT License
67 stars 13 forks source link

+TITLE: XFuse: Deep spatial data fusion

[[https://github.com/ludvb/xfuse/actions?query=workflow%3Abuild+branch%3Amaster][https://github.com/ludvb/xfuse/workflows/build/badge.svg?branch=master]]

This repository contains code for the paper "Super-resolved spatial transcriptomics by deep data fusion".

Nature Biotechnology: https://doi.org/10.1038/s41587-021-01075-3

BioRxiv preprint: https://doi.org/10.1101/2020.02.28.963413

XFuse can run on CPU-only hardware, but training new models will take exceedingly long. We recommend running XFuse on a GPU with at least 8 GB of VRAM.

XFuse has been tested on GNU/Linux but should run on all major operating systems. XFuse requires Python 3.8. All other dependencies are pulled in by ~pip~ during the installation.

To install XFuse to your home directory, run

+BEGIN_SRC sh

pip install --user git+https://github.com/ludvb/xfuse@master

+END_SRC

This step should only take a few minutes.

This section will guide you through how to start an analysis with XFuse using data on human breast cancer from fn:1.

** Data

The data is available [[https://www.spatialresearch.org/resources-published-datasets/doi-10-1126science-aaf2403/][here]]. To download all of the required files for the analysis, run

+BEGIN_SRC sh

Image data

curl -Lo section1.jpg https://www.spatialresearch.org/wp-content/uploads/2016/07/HE_layer1_BC.jpg curl -Lo section2.jpg https://www.spatialresearch.org/wp-content/uploads/2016/07/HE_layer2_BC.jpg curl -Lo section3.jpg https://www.spatialresearch.org/wp-content/uploads/2016/07/HE_layer3_BC.jpg curl -Lo section4.jpg https://www.spatialresearch.org/wp-content/uploads/2016/07/HE_layer4_BC.jpg

Gene expression count data

curl -Lo section1.tsv https://www.spatialresearch.org/wp-content/uploads/2016/07/Layer1_BC_count_matrix-1.tsv curl -Lo section2.tsv https://www.spatialresearch.org/wp-content/uploads/2016/07/Layer2_BC_count_matrix-1.tsv curl -Lo section3.tsv https://www.spatialresearch.org/wp-content/uploads/2016/07/Layer3_BC_count_matrix-1.tsv curl -Lo section4.tsv https://www.spatialresearch.org/wp-content/uploads/2016/07/Layer4_BC_count_matrix-1.tsv

Alignment data

curl -Lo section1-alignment.txt https://www.spatialresearch.org/wp-content/uploads/2016/07/Layer1_BC_transformation.txt curl -Lo section2-alignment.txt https://www.spatialresearch.org/wp-content/uploads/2016/07/Layer2_BC_transformation.txt curl -Lo section3-alignment.txt https://www.spatialresearch.org/wp-content/uploads/2016/07/Layer3_BC_transformation.txt curl -Lo section4-alignment.txt https://www.spatialresearch.org/wp-content/uploads/2016/07/Layer4_BC_transformation.txt

+END_SRC

** Preprocessing

XFuse uses a specialized data format to optimize loading speeds and allow for lazy data loading. XFuse has inbuilt support for converting data from [[https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/installation][10X Space Ranger]] (~xfuse convert visium~) and the [[https://github.com/SpatialTranscriptomicsResearch/st_pipeline][Spatial Transcriptomics Pipeline]] (~xfuse convert st~) to its own data format. If your data has been produced by another pipeline, it may need to be wrangled into a supported format before continuing. Feel free to open an issue on our [[https://github.com/ludvb/xfuse/issues][issue tracker]] if you run into any problems or to request support for a new platform.

The data from the [[Data]] section was produced by the Spatial Transcriptomics Pipeline, so we can run the following commands to convert it to the right format:

+BEGIN_SRC sh

xfuse convert st --counts section1.tsv --image section1.jpg --transformation-matrix section1-alignment.txt --scale 0.15 --save-path section1 xfuse convert st --counts section2.tsv --image section2.jpg --transformation-matrix section2-alignment.txt --scale 0.15 --save-path section2 xfuse convert st --counts section3.tsv --image section3.jpg --transformation-matrix section3-alignment.txt --scale 0.15 --save-path section3 xfuse convert st --counts section4.tsv --image section4.jpg --transformation-matrix section4-alignment.txt --scale 0.15 --save-path section4

+END_SRC

It may be worthwhile to try out different values for the ~--scale~ argument, which downsamples the image data by the given factor. Essentially, a higher scale increases the resolution of the model but requires considerably more compute power.

*** Verifying tissue masks

It is usually a good idea to verify that the computed tissue masks look good. This can be done using the script ~./scripts/visualize_tissue_masks.py~ included in this repository:

+BEGIN_SRC sh

curl -LO https://raw.githubusercontent.com/ludvb/xfuse/master/scripts/visualize_tissue_masks.py python visualize_tissue_masks.py */data.h5

+END_SRC

The script will show the tissue images with the detected backgrounds blacked out. If tissue detection fails, a custom mask can be passed to ~xfuse convert~ using the ~--mask-file~ argument (see ~xfuse convert visium --help~ for more information).

** Configuring and starting the run

Settings for the run are specified in a configuration file. Paste the following into a file named ~my-config.toml~:

+BEGIN_SRC toml

[xfuse] network_depth = 6 network_width = 16 min_counts = 50

[expansion_strategy] type = "DropAndSplit" [expansion_strategy.DropAndSplit] max_metagenes = 50

[optimization] batch_size = 3 epochs = 100000 learning_rate = 0.0003 patch_size = 768

[analyses] [analyses.metagenes] type = "metagenes" [analyses.metagenes.options] method = "pca"

[analyses.gene_maps] type = "gene_maps" [analyses.gene_maps.options] gene_regex = ".*"

[slides] [slides.section1] data = "section1/data.h5" [slides.section1.covariates] section = 1

[slides.section2] data = "section2/data.h5" [slides.section2.covariates] section = 2

[slides.section3] data = "section3/data.h5" [slides.section3.covariates] section = 3

[slides.section4] data = "section4/data.h5" [slides.section4.covariates] section = 4

+END_SRC

Here is a non-exhaustive summary of the available configuration options:

We are now ready to start the analysis!

+BEGIN_SRC sh

xfuse run my-config.toml --save-path my-run

+END_SRC

/Tip/: XFuse can generate a template for the configuration file by running

+BEGIN_SRC sh

xfuse init my-config.toml section1.h5 section2.h5 section3.h5 section4.h5

+END_SRC

** Tracking the training progress

XFuse continually writes training data to a [[https://github.com/tensorflow/tensorboard][Tensorboard]] log file. To check how the optimization is progressing, start a Tensorboard web server and direct it to the ~--save-path~ of the run:

+BEGIN_SRC sh

tensorboard --logdir my-run

+END_SRC

** Stopping and resuming a run

To stop the run before it has completed, press ~Ctrl+C~. A snapshot of the model state will be saved to the ~--save-path~. The snapshot can be restored by running

+BEGIN_SRC sh

xfuse run my-config.toml --save-path my-run --session my-run/exception.session

+END_SRC

** Finishing the run

Training the model from scratch will take roughly three days on a normal desktop computer with an Nvidia GeForce 20 series graphics card. After training, XFuse runs the analyses specified in the configuration file. Results will be saved to a directory named ~analyses~ in the ~--save-path~.