theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License
410 stars 102 forks source link

sincle nuclei data, and polyA-purification #267

Closed simjbaum closed 3 years ago

simjbaum commented 4 years ago

Hi all, I have two basic questions if I can apply scvelo on my data set. We have generated single cell nuclei data (not cell) - extracting the cell nuclei without the cytoplasm. So in principle we should have the spliced/ unspliced mRNA. Would you see there problems in applying scvelo?

Another question is that we purified the poly-adenylated mRNA with the standard 10X protocol. I assume that the 10X experimental protocol is good to use with scvelo?

Best wishes, Simon

m21camby commented 4 years ago

Hi,

I'm not 100% sure for this but it seems single nuclei data has some issue with scVelo. My single nucleus data showed good velocity arrows for whole data. However, it showed opposite direction when I split the data by each experiment. I'm still trying to understand this. Anyway, it is cleared that more unspliced reads in nucleus data is favorable to scVelo, however, this violates some assumptions for the analysis. You can check from other comments in here: https://github.com/theislab/scvelo/issues/118 https://github.com/theislab/scvelo/issues/194

VolkerBergen commented 3 years ago

Generally, you can use scVelo for sn data out of the box. And 10X is good to go.

What you need to bear in mind though, is that the underlying kinetics is slightly different (nuclear export instead of degradation), which needs to be tested whether the assumptions hold, i.e. can we assume constant rates of nuclear export and is the time-scale of export comparable to the differentiation process outlined by the sampled population.

You can check these via phase portraits whether, e.g. top likelihood genes, obey interpretable curvatures.

Feel free to send me some phase portraits, so I can help you with interpreting.

lzj1769 commented 3 years ago

Hi,

I have a similar issue when I try to apply scVelo to 10X data, which is the opposite direction, as shown below. Figure_stochastic

How I ran scvelo:

>>> import scvelo as scv
>>> adata = scv.read('adata.h5ad', cache=True)
>>> adata
AnnData object with n_obs × n_vars = 10032 × 29723
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_ATAC', 'nFeature_ATAC', 'celltype', 'pass_rnaQC', 'pass_accQC', 'broad_celltype', 'nCount_RNA_spliced', 'nFeature_RNA_spliced', 'nCount_RNA_unspliced', 'nFeature_RNA_unspliced'
    var: 'genes'
    obsm: 'X_pca', 'X_umap'
    layers: 'spliced', 'unspliced'
>>> scv.utils.show_proportions(adata)
Abundance of ['spliced', 'unspliced']: [0.49 0.51]
>>> scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
Filtered out 22560 genes that are detected 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Exctracted 2000 highly variable genes.
WARNING: Did not modify X as it looks preprocessed already.
>>> scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
computing neighbors
    finished (0:00:11) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:02) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
>>> scv.tl.velocity(adata)
computing velocities
    finished (0:00:02) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
>>> scv.tl.velocity_graph(adata)
computing velocity graph
    finished (0:00:36) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
>>> scv.pl.velocity_embedding_stream(adata, basis='umap', color="celltype")

spliced vs. unspliced per cell type: Figure_2

phase portraits of some genes: Figure_2

I downloaded the BAM file of RNA-seq from here: https://support.10xgenomics.com/single-cell-multiome-atac-gex/datasets/1.0.0/pbmc_granulocyte_sorted_10k and Seurat object which includes the annotation from: https://raw.githack.com/bioFAM/MOFA2_tutorials/master/R_tutorials/10x_scRNA_scATAC.html#load-data

Do you have any ideas or comments regarding the results? Is it because the data quality is not good (50-50 spliced/unspliced)?

Any help would be great!

Thanks and best, Zhijian Li

VolkerBergen commented 3 years ago

I think you'd need to fit monocytes, b cells and t cells separately (to be specified via groups), as they do no seem to share the same kinetic specifications.

Just sequentially run tl.velocity for each group clusters separately.

scv.tl.velocity(adata, groups=['non-classical monocytes', 'classical monocytes', ...])
scv.tl.velocity(adata, groups='list of b cell clusters')
scv.tl.velocity(adata, groups='list of t cell clusters')

The transition from non-classical to classical monocytes looks like being supported by the kinetics. Everything else is hard to see, because many colors appear multiple times (blue for myeloid and memory t cells).

lzj1769 commented 3 years ago

Thanks for your help. I will try to run scVelo for each major population (B cell, T cell, and monocytes).

arutik commented 3 years ago

Hi,

I have a question about running velocyto on 10X snRNA-seq data (I hope this is the right place to ask)

My 10X snRNA-seq samples have been aligned and quantified with cellranger using the pre-mrna human genome (generated as specified here in section Generating a Cell Ranger compatible "pre-mRNA" Reference Package at the bottom of the page). I ran velocyto on my samples with velocyto run /path/to/cellranger/possorted.bam /path/to/genome/genes.gtf and I see that after reading in the .loom file and doing scv.pp.show_proportions(adata_loom) I get: Abundance of ['spliced', 'unspliced']: [1. 0.]

And further analysis breaks/loses the point.

Considering the conversation above, I assume I should be seeing other proportions of slpiced to unspliced reads. Am I doing anything wrong? I would appreciate your comment.

Thanks, Anna

smorabit commented 3 years ago

Hi Anna,

Not one of the scvelo authors but I have run into the same situation. The pre-mrna reference used for 10X is basically telling cell ranger to consider introns as exons, so that introns will be counted towards gene expression quantification. In newer versions of cell ranger you actually just specify --include-introns rather than using a separate reference. However, when you apply that pre-mrna .gtf to velocyto, it won't be able to identify any introns and thus everything will be labeled as spliced.

You can use the pre-mrna reference for cell ranger, but for velocyto you actually need to use the normal mRNA .gtf in order to be able to tell the difference between spliced and unspliced reads.

I hope this helps! Sam

arutik commented 3 years ago

Thank you very much Sam, I will try that!

Anna