jfnavarro / st_analysis

A toolset for analysis and visualisation of Spatial Transcriptomics datasets.

Spatial Transcriptomics Analysis Tools

A set of tools for the visualization, processing and analysis (supervised, unsupervised, image alignment, etc.) of Spatial Transcriptomics datasets.

The package is compatible with the output format of the data generated with the ST Pipeline (https://github.com/jfnavarro/st_pipeline).

License

MIT License, see LICENSE file.

Authors

See AUTHORS file.

Contact

For bugs, feedback or help, you can contact Jose Fernandez Navarro (jc.fernandez.navarro@gmail.com).

Input Format

The input format is a matrix of counts (tab delimited) where spot ids are row names and the genes are column names. Additionally, some scripts may require a spot coordinates file where the spot and pixel coordinates are defined for each spot id (tab delimited).
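As a rough illustration of this layout, the sketch below parses a small tab-delimited matrix of counts using only the Python standard library. The spot ids, the counts and the helper name `read_counts` are made up for the example and are not part of the package. It also assumes that the header line starts with an empty cell above the spot ids column.

```python
import csv
import io


def read_counts(handle):
    """Parse a tab-delimited matrix of counts (spots as rows, genes as columns).

    Returns (genes, counts), where counts maps spot id -> {gene: count}.
    Assumes the first header cell (above the spot ids) is present, possibly empty.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    genes = header[1:]  # skip the cell above the spot ids column
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        spot, values = row[0], row[1:]
        counts[spot] = {gene: float(value) for gene, value in zip(genes, values)}
    return genes, counts


# Hypothetical example data: two spots, three genes.
example = "\tActb\tApoe\tMalat1\n10x12\t5\t0\t12\n11x12\t2\t3\t7\n"
genes, counts = read_counts(io.StringIO(example))
print(genes)
print(counts["11x12"]["Apoe"])
```

For a real file, the same function can be called with an open file handle instead of the `io.StringIO` object.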

Installation

Before you install the ST Analysis package we recommend that you create a Python 3 virtual environment. We recommend Anaconda.

The ST Analysis package is only compatible with Python 3.

The following instructions install the ST Analysis package with Python 3.6 and Anaconda:

git clone https://github.com/jfnavarro/st_analysis.git
cd st_analysis
python setup.py install

A set of scripts (described below) will then be available in your system, or in your virtual environment if you installed the package in one.

Note that if you want to use align_sections.py you will have to install the st_tissue_recognition library.

Note that you can always type script_name.py --help to get more information about how a script works and its parameters.

Analysis tools

Unsupervised clustering

To cluster spots together based on their expression profiles you can run:

unsupervised.py --counts matrix_counts.tsv --normalization REL --num-clusters 5 --clustering KMeans --dimensionality tSNE --use-log-scale 

The script can be given one or several datasets (matrices of counts) and offers multiple normalization and filtering options. It performs dimensionality reduction and then clusters the spots together in the manifold space. Several algorithms are available for both the clustering and the dimensionality reduction. The script generates a scatter plot of the clustered spots in a 2D or 3D manifold and writes the computed clusters/labels per spot to a file (tab delimited).

To know more about the parameters you can type --help
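To give a feel for the preprocessing mentioned above, here is a minimal sketch of a relative normalization followed by a log transform, in plain Python. It assumes that `REL` means dividing each count by the total count of its spot and that the log scale is `log2(x + 1)`. Both are assumptions made for illustration, so check `unsupervised.py --help` for the actual definitions. The function names are hypothetical.

```python
import math


def relative_normalize(counts):
    """Divide every gene count by the total count of its spot (row)."""
    normalized = {}
    for spot, genes in counts.items():
        total = sum(genes.values())
        # Spots without any counts are left as zeros to avoid dividing by zero.
        normalized[spot] = {
            gene: (value / total if total > 0 else 0.0)
            for gene, value in genes.items()
        }
    return normalized


def log_scale(counts):
    """Apply log2(x + 1) to every value (assumed form of the log scale)."""
    return {
        spot: {gene: math.log2(value + 1.0) for gene, value in genes.items()}
        for spot, genes in counts.items()
    }


# Hypothetical counts: one regular spot and one empty spot.
counts = {"10x12": {"Actb": 6, "Apoe": 2}, "11x12": {"Actb": 0, "Apoe": 0}}
rel = relative_normalize(counts)
logged = log_scale(rel)
print(rel["10x12"])
```

The dimensionality reduction and clustering steps would then operate on the normalized values. They are left out here because the choice of algorithm is a script option.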

Supervised classification

You can train a classifier with the expression profiles of a set of spots where you know the class (cluster) and then predict the class of the spots of a new dataset of the same tissue. For that you can use the following script:

supervised.py --train-data data_matrix.tsv --test-data data_matrix.tsv --train-classes train_classes.txt --test-classes test_classes.txt

This will generate some statistics and a file with the predicted classes/clusters for each spot. The script offers several options for normalization and for the classification settings and algorithms. The test/train classes file should look like:

SPOT1 1
SPOT2 1
SPOT3 2

Here 1, 1 and 2 are the classes (clusters) of the spots.
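A classes file in this layout can be read into a dictionary with a few lines of Python. This is only a sketch: it assumes one spot per line, with the spot id and its class separated by whitespace (a tab or a space), and the function name `read_classes` is hypothetical.

```python
import io


def read_classes(handle):
    """Read 'spot class' pairs, one per line, into a dict of spot id -> class."""
    classes = {}
    for line in handle:
        fields = line.split()  # splits on tabs or spaces
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"Expected 'spot class', got: {line!r}")
        spot, label = fields
        classes[spot] = label  # labels are kept as strings
    return classes


# The example from the text, written with tabs as separators.
example = "SPOT1\t1\nSPOT2\t1\nSPOT3\t2\n"
classes = read_classes(io.StringIO(example))
print(classes)
```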

To know more about the parameters you can type --help

NOTE: there is a version that uses GPU and Neural Networks (supervised_torch.py)

To visualize Spatial Transcriptomics (ST) datasets

Use the script data_plotter.py to visualize ST data. You can apply different thresholds for filtering and choose among several normalization and visualization options. The script can plot clusters as well as gene sets. It generates one image for each gene given in the --show-genes option (one sub-image for each input dataset). The script needs one or more matrices of counts where the spots are rows and the genes are columns.

data_plotter.py --cutoff 2 --show-genes Actb Apoe --counts data_matrix.tsv --normalization REL

This will generate one scatter plot per gene (Actb and Apoe), showing the spots where the expression of that gene is higher than 2.

More info if you type --help
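The effect of `--cutoff` can be pictured as a simple filter over the spots. The sketch below is an illustration in plain Python rather than the script's own code. It assumes the cutoff is applied to the expression value of a single gene with a strict "higher than" comparison, and `spots_above_cutoff` is a hypothetical helper.

```python
def spots_above_cutoff(counts, gene, cutoff):
    """Return {spot: value} for spots where `gene` is expressed above `cutoff`."""
    selected = {}
    for spot, genes in counts.items():
        value = genes.get(gene, 0)  # a gene missing from a spot counts as 0
        if value > cutoff:
            selected[spot] = value
    return selected


# Hypothetical counts for three spots.
counts = {
    "10x12": {"Actb": 5, "Apoe": 0},
    "11x12": {"Actb": 2, "Apoe": 3},
    "12x12": {"Actb": 9, "Apoe": 1},
}
selected = spots_above_cutoff(counts, "Actb", 2)
print(selected)
```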

To filter a matrix of counts (keep or remove genes)

filter_genes_matrix.py --counts data_matrix.tsv --filter-genes Malat1 Actb
keep_genes_matrix.py --counts data_matrix.tsv --keep-genes Malat1 Actb

More info if you type --help
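Conceptually, the two scripts complement each other: one drops the listed gene columns and the other keeps only those columns. The sketch below shows this on an in-memory matrix. It matches gene names exactly, which is an assumption (the scripts may accept other forms of input, see `--help`), and both function names are hypothetical.

```python
def filter_genes(counts, genes_to_remove):
    """Return a copy of the matrix without the given gene columns."""
    remove = set(genes_to_remove)
    return {
        spot: {gene: value for gene, value in genes.items() if gene not in remove}
        for spot, genes in counts.items()
    }


def keep_genes(counts, genes_to_keep):
    """Return a copy of the matrix with only the given gene columns."""
    keep = set(genes_to_keep)
    return {
        spot: {gene: value for gene, value in genes.items() if gene in keep}
        for spot, genes in counts.items()
    }


# Hypothetical matrix with one spot and three genes.
counts = {"10x12": {"Actb": 5, "Apoe": 1, "Malat1": 12}}
filtered = filter_genes(counts, ["Malat1", "Actb"])
kept = keep_genes(counts, ["Malat1", "Actb"])
print(filtered)
print(kept)
```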

To merge matrices of counts into one

An index corresponding to each matrix given in the input (same order) will be appended to the spot ids of the merged matrix.

merge_counts.py --counts data_matrix1.tsv data_matrix2.tsv

More info if you type --help
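The renaming of spot ids during a merge might look roughly like the sketch below. The exact form of the index is not specified here, so the `<spot>_<index>` suffix with indices starting at 1 is purely an assumption, as is filling genes that are missing from a dataset with zeros. `merge_counts` is a hypothetical function, not the script's implementation.

```python
def merge_counts(matrices):
    """Merge count matrices, tagging each spot id with its dataset index."""
    # Union of all gene names, so that every merged row has the same columns.
    all_genes = sorted({gene for m in matrices for genes in m.values() for gene in genes})
    merged = {}
    for index, matrix in enumerate(matrices, start=1):
        for spot, genes in matrix.items():
            # Assumed naming scheme: original spot id plus "_<index>".
            merged[f"{spot}_{index}"] = {gene: genes.get(gene, 0) for gene in all_genes}
    return merged


# Two hypothetical datasets that share a spot id but differ in genes.
m1 = {"10x12": {"Actb": 5, "Apoe": 1}}
m2 = {"10x12": {"Actb": 2, "Malat1": 7}}
merged = merge_counts([m1, m2])
print(merged)
```

Tagging the spot ids keeps spots from different datasets distinct even when they share the same id.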

To merge Spatial Transcriptomics datasets into one

This script will merge Spatial Transcriptomics datasets into one (matrices of counts, spot coordinates and HE images). The matrices of counts will be merged as in the previous script. The HE images will be stitched together and the spot coordinates will be merged together. An index corresponding to each dataset will be appended to the spot ids.

merge_datasets.py --counts data_matrix1.tsv data_matrix2.tsv --coordinates spots1.txt spots2.txt --images image1.jpg --images image2.jpg

More info if you type --help

To align Spatial Transcriptomics datasets of the same tissue using the HE images

If you have multiple sections (datasets) of the same tissue you may want to align them so that they all have the same orientation and angle. This enables better visualizations. The script align_sections.py takes as input a list of matrices of counts, spot coordinates and HE images corresponding to the datasets that must be aligned. It will output a list of aligned matrices of counts, aligned spot coordinates and aligned HE images. The script supports different algorithms for the image detection and alignment process. The first dataset is used as a reference.

align_sections.py --counts data_matrix1.tsv data_matrix2.tsv --coordinates spots1.txt spots2.txt --images image1.jpg --images image2.jpg

More info if you type --help

To visualize variables or genes on a manifold (dimensionality reduction) space

This script takes as input a list of matrices of counts, a file with the reduced coordinates of the spots (2D) and a meta-file (spots and variables). The script will generate a list of scatter plots where each variable is plotted onto the 2D manifold of the datasets. The script can also plot the expression of genes if they are given as input. It supports different normalization, filtering and visualization options.

dimredu_plotter.py --counts data_matrix.tsv --dim-redu-file dimred.txt --meta-file meta.tsv --show-genes Actb Apoe

More info if you type --help

To transform Visium datasets to the standard ST (Spatial Transcriptomics) format

This script will transform a dataset in Visium format to the standard ST format (matrix of counts, spot coordinates and HE image).

visiumToST.py --help

To transform old ST spot coordinates to new format (including pixel coordinates)

convert_spot_coordinates.py --help