raphael-group / paste2

Probabilistic Alignment of Spatial Transcriptomics Experiments v.2
BSD 3-Clause "New" or "Revised" License

The demo of PASTE2 #1

leihouyeung opened this issue 1 year ago (status: open)

leihouyeung commented 1 year ago

Hi, it's great to see this work. Is there a tutorial on how I could apply your method to my own dataset?

x-h-liu commented 1 year ago

Hi! We are working on a tutorial, but in the meantime, please see the following:

Currently, the best way to run the algorithm is to download the repository and copy the src/paste2 folder into your project folder. Then, in the script where you want to run PASTE2, add:

```python
import sys

# Point this at the directory that contains the copied paste2 folder
sys.path.insert(1, '/path/to/paste2/folder')

from paste2 import PASTE2, model_selection, projection
```

Then PASTE2.partial_pairwise_align() should be available to you. Its documentation is in the source code, but briefly, there are three mandatory parameters you have to pass in: sliceA, sliceB, and s.

sliceA and sliceB are AnnData objects for the two slices you want to align. Rows are spots, columns are genes, and sliceA.X/sliceB.X are the gene expression matrices. The .obsm['spatial'] field of each slice should store its 2D spot coordinates. You can find the AnnData documentation at https://anndata.readthedocs.io/en/latest/ if you have never worked with it before. PASTE2 accepts AnnData objects with the same structure as PASTE, so you can also learn how sliceA and sliceB should be constructed in the "Read data and create AnnData object" section here: https://paste-bio.readthedocs.io/en/latest/notebooks/getting-started.html.

s is a number between 0 and 1 indicating the fraction of overlap between the two slices. For example, if you think the two slices overlap over 50% of their areas, set s=0.5. All other parameters can be left at their defaults, so a basic workflow is:

```python
sliceA = load_your_anndata_slice
sliceB = load_your_anndata_slice
s = 0.5  # as an example
pi = PASTE2.partial_pairwise_align(sliceA, sliceB, s=s)
```

If you want to estimate s from the data instead:

```python
from paste2 import model_selection

estimated_s = model_selection.select_overlap_fraction(sliceA, sliceB)
```

That said, it is usually better to look at the two slices you want to align and roughly estimate s yourself, because this function is not always reliable.

Similarly, to use the 3D projection, run:

```python
from paste2 import projection

new_sliceA, new_sliceB = projection.partial_stack_slices_pairwise([sliceA, sliceB], [pi])
```

where pi is the alignment returned by PASTE2.partial_pairwise_align(). This function returns two new AnnData objects whose .obsm['spatial'] fields store the 2D coordinates of the spots after the two slices have been projected onto the same coordinate system.
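The thread does not spell out how the projection is computed. As a rough mental model only: to our understanding, PASTE (v1) stacks a pair of slices with a Procrustes-style rigid alignment in which spot pairs are weighted by the transport plan. The numpy sketch below applies that idea to a partial plan by first rescaling pi to unit mass. It is not the code behind projection.partial_stack_slices_pairwise; weighted_procrustes is a hypothetical helper, and the reflection fix is the standard Kabsch step.

```python
import numpy as np

def weighted_procrustes(coords_a, coords_b, pi):
    """Rigidly align slice B's 2D coordinates to slice A's, weighting spot pairs by pi.

    Returns (new_a, new_b): slice A centered on its pi-weighted centroid, and
    slice B centered on its own pi-weighted centroid and rotated into A's frame.
    Hypothetical sketch; not PASTE2's implementation.
    """
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    w = np.asarray(pi, dtype=float)
    w = w / w.sum()  # a partial plan carries mass s < 1; rescale it to 1

    # Row sums weight slice-A spots, column sums weight slice-B spots.
    a = coords_a - w.sum(axis=1) @ coords_a
    b = coords_b - w.sum(axis=0) @ coords_b

    # Weighted cross-covariance H = sum_ij w_ij * b_j a_i^T (a 2 x 2 matrix).
    h = b.T @ w.T @ a
    u, _, vt = np.linalg.svd(h)

    # Kabsch correction: force det(R) = +1 so R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, d]) @ u.T
    return a, b @ r.T


# Synthetic check: B is A rotated by 30 degrees and shifted, and only the
# first 10 of 20 spots are matched (total plan mass 0.5).
rng = np.random.default_rng(1)
A = rng.uniform(0, 10, size=(20, 2))
theta = np.pi / 6
R0 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
B = A @ R0.T + np.array([5.0, -3.0])
pi = np.zeros((20, 20))
pi[np.arange(10), np.arange(10)] = 0.5 / 10
new_a, new_b = weighted_procrustes(A, B, pi)
```

Because only matched spots carry weight, the centroids and rotation are driven entirely by the overlap region, while unmatched spots are simply carried along by the same rigid transform.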

That's basically it! If you get a "package not found" error, you will need to install the missing packages separately, but if your environment already has numpy, scipy, etc., you should be fine. The documentation for all three functions above is in the source code, under each function's definition.

Let me know whether PASTE2 runs on your machine! If you run into any problems, don't hesitate to reach out at xl5434@princeton.edu. I will respond as quickly as possible.

lambdamoses commented 11 months ago

Hi, I tried this package and it worked great on my dataset. I'm wondering, though: when we get the transformation matrix from partial_pairwise_align, what does that matrix mean?

x-h-liu commented 11 months ago

Hi @lambdamoses, the matrix is a transport plan under partial optimal transport: it is the solution to the optimization problem defined in Eq. 1 of the PASTE2 paper, subject to the constraints defined in Eq. 3. You can think of row i of the matrix as describing the probability that spot i on slice 1 is mapped to each spot j on slice 2, except that the row does not sum to 1. Because of the OT constraints, each row sums to at most 1/n_1 (where n_1 is the number of spots on slice 1) and the entries of the whole matrix sum to s, so spots outside the overlap region may receive little or no mass.
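To make the row-wise reading concrete, here is a small numpy sketch that takes a plan pi (rows are slice-1 spots, columns are slice-2 spots) and reports its total mass, each row's mass, a row-normalized version that can be read as a conditional distribution over slice-2 spots, and the most likely match per spot. Since the total mass is s < 1, not every row can carry the full 1/n_1. summarize_plan and its tol threshold are hypothetical, the toy plan is made up, and treating (near-)zero-mass rows as "unmatched" is our assumption rather than something PASTE2 reports.

```python
import numpy as np

def summarize_plan(pi, tol=1e-12):
    """Summarize a (partial) transport plan pi of shape (n_1, n_2).

    Hypothetical helper for illustration; not part of PASTE2.
    """
    pi = np.asarray(pi, dtype=float)
    row_mass = pi.sum(axis=1)        # mass leaving each slice-1 spot
    matched = row_mass > tol         # spots that take part in the alignment
    # Row-normalize matched spots into conditional distributions over slice 2.
    cond = np.zeros_like(pi)
    cond[matched] = pi[matched] / row_mass[matched][:, None]
    # Most likely slice-2 partner per slice-1 spot (-1 marks unmatched spots).
    best = np.where(matched, pi.argmax(axis=1), -1)
    return pi.sum(), row_mass, cond, best


# Toy 3 x 2 plan with total mass s = 0.5 (made-up numbers).
toy_pi = np.array([[0.20, 0.10],
                   [0.00, 0.00],
                   [0.05, 0.15]])
total, row_mass, cond, best = summarize_plan(toy_pi)
```

Here every row mass stays at or below 1/n_1 = 1/3, the middle spot carries no mass at all, and the plan's entries add up to s = 0.5.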