theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License
414 stars 102 forks source link

scVelo from Seurat object #161

Closed RM-SCB closed 4 years ago

RM-SCB commented 4 years ago

Hello,

Apologies if my questions sound trivial but I am completely new to python, and I am a biologist before being an occasional R user (so far... but hopefully will be able to play with python soon as well!). I would be interested in running scVelo on my data but I am not quite sure that I am doing the right things, and I am now feeling a little bit stuck.

My first question is: is this the correct approach, or should velocyto be run on filtered data based on the cell barcodes that I am interested in only (ie filtered cells obtained at the end of my seurat analysis), or perhaps it doesn't matter and this will be filtered within scVelo?

My second question is: how should I import the dimension reductions (PCA and UMAP), clusters information, and additional metadata (like sample name), into my scVelo object (adata)?

Any help would be much appreciated!

VolkerBergen commented 4 years ago

Hi and welcome,

that looks alright so far. In fact, quite profound for a python beginner. You can run the counting pipeline on all data and then subset to the cells of interest. Precomputed clusters and embeddings can be passed as follows.

adata.obs['clusters'] = array_of_clusters
adata.obsm['X_umap'] = np.stack([UMAP1, UMAP2]).T

scvelo does not do any cell filtering per default. Only filtering of genes that pass a minimum number of counts and are highly variable.

Let me know if other questions arise.

RM-SCB commented 4 years ago

Thanks for your prompt response! I think I have managed to make some progress and get started with the actual scvelo analysis. So far here is what I have done:

Now, one question that arise is whether I should be performing the following pre-processing steps on the merged object:

scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

Or should I go straight to this one for example?

scv.tl.velocity(sceMerge, mode='stochastic')

Many thanks for your help!

ziyangliu93 commented 4 years ago

Hi @Mevelo , I somehow figure out the problem you mentioned and I have successfully merged two loom files using the method you proposed on this post. I went straight to running scv.tl.velocity(sceMerge, mode='stochastic') and it turns out that the program will normalize count data automatically. So my suggestion here is that after merging two loom files, you need to run scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000) but the scv.pp.moments(adata, n_pcs=30, n_neighbors=30) is optional. Later you can use the precomputed embedding coordinates to visualize data

VolkerBergen commented 4 years ago

Hi @Mevelo, as @ziyangliu93 said you'd definitely need to normalize your counts. These preprocessing steps are always needed

scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

In fact, if not done by the user already, norm. and moments are computed automatically before velocity estimation. So even if the user forgets about it, it will be processed correctly.

VolkerBergen commented 4 years ago

Just as a side note: umap, clustering etc., all of these can also be computed directly within scvelo, e.g. scv.tl.umap, scv.tl.louvain.

RM-SCB commented 4 years ago

Fantastic, thank you both for your replies!

nbahlis commented 4 years ago

Hi Mevelo and Volker

When you state that you "Import loom file generated using as.loom by seurat (filtered object of ~2000 cells which contains umap, pca, cluster infos, metadata...)", did you import the seurat object a rds? or you converted the seurat object loom first. I am getting an error whenever I attempt to import the Seurat rds object I generated. Any advice will be deeply appreciated. thank you

VolkerBergen commented 4 years ago

scv.read understands the following formats: h5ad, loom, xlsx, csv, tsv, tab, data, txt, mtx, mtx.gz, soft.gz. rds is R-specific and needs to be converted to one of these, e.g. loom.

Also see here if you run into Seurat/Loom issues.