secrierlab / SpottedPy

GNU General Public License v3.0

SpottedPy

Author: Eloise Withnell, UCL Genetics Institute

The paper is now published in Genome Biology.

SpottedPy is a Python package for analysing signatures in spatial transcriptomic datasets at varying scales using hotspot (spatial cluster) analysis and neighbourhood enrichment.

• Our method offers a flexible approach for analysing continuous gene signatures, allowing users to selectively examine specific areas, such as tumour spots, and identify statistically significant areas with a high score for the signature ('hotspot') and low score for the signature ('coldspot') for further downstream analysis.

• The downstream analysis encompasses techniques for statistical comparison of hotspot distances, investigation of other signature enrichments within these hotspots, and a comparison of these distances with other relevant areas, like the tumour perimeter.

• The tool enables users to understand how varying parameters essential for hotspot detection, including neighbourhood size and p-value, influence the spatial relationships. This understanding aids in assessing the stability of the spatial relationships identified.

• Our study analyses relationships using varied spatial scales, ranging from neighbourhood enrichment to hotspots. This variety allows for a deeper understanding of the scale at which these spatial relationships manifest.

• SpottedPy can be used on any spatial transcriptomic data in anndata format, e.g. Visium or Xenium. For single-cell data, each cell type should be a separate column with 1 for presence and 0 for absence. With single-cell data, hotspots do not have to be calculated for cell types; these cell type columns can be used directly to calculate distances from.
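The one-column-per-cell-type layout for single-cell data can be sketched in plain Python. This is a minimal illustration rather than SpottedPy code: the labels and the helper name one_hot_cell_types are hypothetical, and in practice the resulting columns would presumably be stored alongside the other annotations in the anndata object (pandas.get_dummies produces the same layout from a label column).

```python
# Hypothetical per-cell annotations; real labels would come from your own data.
cell_types = ["tumour", "T cell", "tumour", "fibroblast", "T cell"]


def one_hot_cell_types(labels):
    """Return one 0/1 column per cell type (1 = the cell is of that type)."""
    categories = sorted(set(labels))
    return {
        category: [1 if label == category else 0 for label in labels]
        for category in categories
    }


columns = one_hot_cell_types(cell_types)
for name, values in columns.items():
    print(name, values)
```

Each cell ends up with exactly one column set to 1, so the columns can stand in for hotspot membership when distances are calculated.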

Getting Started

SpottedPy was created using Python 3.9. We recommend using it with Python 3.9 or 3.10.

#install the required packages
pip install scanpy --upgrade
pip install -U libpysal
pip install esda
pip install joypy
pip install spottedpy 

We recommend creating a conda environment before installation:

conda create -n [env_name] python==3.10
conda activate [env_name]

Depending on the system, pip install distutils-pytest may be required before installation.
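After installation, a quick check of the environment can save debugging time. The following is a small sketch using only the standard library; the package list mirrors the install commands above, and the helper name missing_packages is hypothetical.

```python
import importlib.util
import sys

# Import names of the dependencies listed in the install commands.
REQUIRED = ["scanpy", "libpysal", "esda", "joypy"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python", ".".join(map(str, sys.version_info[:3])))
if sys.version_info[:2] not in [(3, 9), (3, 10)]:
    print("Note: SpottedPy is recommended with Python 3.9 or 3.10.")
print("Missing packages:", missing_packages(REQUIRED) or "none")
```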

To use SpottedPy, follow the instructions in spottedPy_multiple_slides.ipynb (this tutorial walks through using SpottedPy with multiple spatial slides, which is highly recommended for downstream statistical analysis). If only one slide is available, follow the spottedpy_tutorial_sample_dataset.ipynb tutorial (not recommended for downstream statistical tests, but it allows for visualisation of hotspots).

Key functions are in main.py, which calls functions from the other Python files:

• sp.create_hotspots creates hotspots from anndata. Use the filter_columns parameter to specify which region within the spatial slide to calculate hotspots from, e.g. tumour cells. The neighbourhood parameter can be altered here (default=10). The relative_to_batch parameter ensures hotspots are calculated for each slide separately; otherwise they are calculated across multiple slides. If multiple slides are used (highly recommended for statistical power), they should be labelled using .obs['batch'] within the anndata object, and the library ID in the .uns data slot should be labelled with the corresponding .obs['batch'] value. The signature should be scaled to lie between 0 and 1 (e.g. using MinMaxScaler, as in the tutorial).

We encourage users to choose the neighbourhood parameter most relevant to their biological question, e.g. whether they are interested in local interactions of the signature or in broader tissue modules. SpottedPy lets the user perform a sensitivity analysis to observe how this choice affects downstream analysis. For Visium, we recommend starting with a neighbourhood parameter between 8 and 10, as this captures all the spots surrounding the central spot. The variables with the most stable relationships across a range of parameters (and therefore scales) are likely to be of most interest for further investigation.
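To make the role of the neighbourhood parameter concrete, here is a minimal pure-Python sketch of a Getis-Ord Gi* z-score computed over each spot and its k nearest neighbours, after scaling a signature to the 0 to 1 range. It is an illustration under stated assumptions, not SpottedPy's implementation (the package lists esda as a dependency, which presumably provides the production statistic). The grid, the helper names and the 1.96 cut-off (a normal-approximation threshold) are all hypothetical choices made for the example.

```python
import math


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_nearest(coords, i, k):
    """Indices of the k spots closest to spot i (excluding i itself)."""
    others = [j for j in range(len(coords)) if j != i]
    others.sort(key=lambda j: math.dist(coords[i], coords[j]))
    return others[:k]


def getis_ord_gi_star(values, coords, k):
    """Gi* z-score per spot, with binary weights over the spot and its k nearest neighbours."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        members = [i] + k_nearest(coords, i, k)
        w_sum = len(members)  # binary weights: sum of weights equals sum of squared weights
        local_sum = sum(values[j] for j in members)
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append((local_sum - mean * w_sum) / denom)
    return scores


# Hypothetical 5x5 grid of spots with a high-scoring corner.
coords = [(x, y) for x in range(5) for y in range(5)]
raw = [10.0 if x <= 1 and y <= 1 else 1.0 for x, y in coords]
signature = min_max_scale(raw)

for k in (4, 8):
    z = getis_ord_gi_star(signature, coords, k)
    hot = [coords[i] for i, score in enumerate(z) if score > 1.96]
    print(f"k={k}: {len(hot)} spots with Gi* z-score above 1.96")
```

Rerunning the loop over several values of k is a small-scale version of the sensitivity analysis described above: relationships that persist as k changes are less likely to be artefacts of one particular scale.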

• sp.plot_hotspots plots hotspots.

• sp.calculateDistances calculates the Euclidean distances from each spot h in a hotspot H to the hotspot of interest. primary_variables are the hotspots we calculate distances from and comparison_variables are the hotspots we calculate distances to.

• sp.plot_custom_scatter compares the spatial relationship of two hotspots, e.g. EMT hotspots compared to EPI hotspots. The distance metric used for comparing hotspot distances is set using compare_distance_metric. Setting it to min, mean or median compares that summary statistic for each hotspot across each slide using Generalised Estimating Equations (GEE), a model that estimates population-average effects from repeated measurements across multiple spatial transcriptomic slides. The model estimates the coefficient for the transition from reference hotspots to comparison hotspot variables. Setting compare_distance_metric to None calculates the statistical significance using all distances from each hotspot. Setting compare_distance_metric to median_across_all_batches calculates the statistical significance of all hotspots together; this is biased towards slides with more hotspots but works better with fewer slides (<10).

• sp.calculate_tumour_perimeter delineates the boundary of the tumour accurately by focusing on the transitional area where tumour and non-tumour spots meet.

• sp.sensitivity_calcs performs the sensitivity analysis to evaluate the impact of varying hotspot sizes on the spatial relationships by incrementally adjusting the neighbourhood parameter or p-value for the Getis-Ord statistic.

• sp.plot_distance_distributions_across_batches plots all the distances from the two comparison hotspots of interest across each slide.

• sp.access_individual_hotspots plots the distances of each hotspot for one slide between two comparison hotspots. This is useful for assessing the heterogeneity of relationships.

• sp.plot_hotspots_by_number plots the unique hotspot numbers across all slides.

• sp.calculate_inner_outer_correlations calculates the inner-outer correlation by correlating signatures between a central spot of interest and the direct neighbourhood of spots surrounding it. Set rings_range to calculate how the correlation changes as the ring surrounding a spot is expanded.

• sp.calculate_neighbourhood_correlation correlates phenotypes with cells within a spot/spatial unit. rings_range sets the number of rings.

• sp.correlation_heatmap_neighbourhood and sp.plot_overall_change plot the neighbourhood results.
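The distance comparison described in the list above can be sketched without the package. The example below assumes that the distance from a spot to a hotspot means the Euclidean distance to its nearest spot, and then reduces each slide's distances to a min, mean or median summary. The coordinates, labels and helper names are hypothetical, and the GEE modelling step is not reproduced here.

```python
import math
import statistics


def min_distances(primary_spots, comparison_spots):
    """For each primary spot, the Euclidean distance to the nearest comparison spot."""
    return [
        min(math.dist(p, c) for c in comparison_spots)
        for p in primary_spots
    ]


def summarise_by_slide(distances_by_slide, metric="median"):
    """Collapse each slide's distances to one summary value (min, mean or median)."""
    summary_functions = {
        "min": min,
        "mean": statistics.mean,
        "median": statistics.median,
    }
    summarise = summary_functions[metric]
    return {slide: summarise(d) for slide, d in distances_by_slide.items()}


# Hypothetical spot coordinates for two slides.
slides = {
    "slide_1": {
        "EMT": [(0, 0), (0, 1), (1, 0)],
        "EPI": [(3, 4), (6, 8)],
    },
    "slide_2": {
        "EMT": [(2, 2), (2, 3)],
        "EPI": [(2, 5), (10, 10)],
    },
}

distances = {
    slide: min_distances(hotspots["EMT"], hotspots["EPI"])
    for slide, hotspots in slides.items()
}
print(summarise_by_slide(distances, metric="median"))
```

Summarising per slide gives one value per slide, which is the shape of data a repeated-measures model needs; pooling every distance instead weights slides by how many hotspot spots they contain.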

Package pre-requisites

Install scanpy, libpysal, esda and joypy.

Data

Download the sample breast cancer spatial transcriptomics data from this Zenodo repository for the spottedpy_multiple_slides tutorial (recommended). The Zenodo repository contains the anndata object for the spottedpy_tutorial_sample_dataset.ipynb tutorial.

Contributing

If you find a bug or want to suggest a new feature for SpottedPy, please open a GitHub issue in this repository. Pull requests are also welcome!

License

SpottedPy is released under the GNU General Public License v3.0. See the LICENSE file for more information.