Closed simjbaum closed 3 years ago
Hi,
I'm not 100% sure for this but it seems single nuclei data has some issue with scVelo. My single nucleus data showed good velocity arrows for whole data. However, it showed opposite direction when I split the data by each experiment. I'm still trying to understand this. Anyway, it is cleared that more unspliced reads in nucleus data is favorable to scVelo, however, this violates some assumptions for the analysis. You can check from other comments in here: https://github.com/theislab/scvelo/issues/118 https://github.com/theislab/scvelo/issues/194
Generally, you can use scVelo for sn data out of the box. And 10X is good to go.
What you need to bear in mind though, is that the underlying kinetics is slightly different (nuclear export instead of degradation), which needs to be tested whether the assumptions hold, i.e. can we assume constant rates of nuclear export and is the time-scale of export comparable to the differentiation process outlined by the sampled population.
You can check these via phase portraits whether, e.g. top likelihood genes, obey interpretable curvatures.
Feel free to send me some phase portraits, so I can help you with interpreting.
Hi,
I have a similar issue when I try to apply scVelo to 10X data, which is the opposite direction, as shown below.
How I ran scvelo:
>>> import scvelo as scv
>>> adata = scv.read('adata.h5ad', cache=True)
>>> adata
AnnData object with n_obs × n_vars = 10032 × 29723
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_ATAC', 'nFeature_ATAC', 'celltype', 'pass_rnaQC', 'pass_accQC', 'broad_celltype', 'nCount_RNA_spliced', 'nFeature_RNA_spliced', 'nCount_RNA_unspliced', 'nFeature_RNA_unspliced'
var: 'genes'
obsm: 'X_pca', 'X_umap'
layers: 'spliced', 'unspliced'
>>> scv.utils.show_proportions(adata)
Abundance of ['spliced', 'unspliced']: [0.49 0.51]
>>> scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
Filtered out 22560 genes that are detected 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Exctracted 2000 highly variable genes.
WARNING: Did not modify X as it looks preprocessed already.
>>> scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
computing neighbors
finished (0:00:11) --> added
'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
finished (0:00:02) --> added
'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
>>> scv.tl.velocity(adata)
computing velocities
finished (0:00:02) --> added
'velocity', velocity vectors for each individual cell (adata.layers)
>>> scv.tl.velocity_graph(adata)
computing velocity graph
finished (0:00:36) --> added
'velocity_graph', sparse matrix with cosine correlations (adata.uns)
>>> scv.pl.velocity_embedding_stream(adata, basis='umap', color="celltype")
spliced vs. unspliced per cell type:
phase portraits of some genes:
I downloaded the BAM file of RNA-seq from here: https://support.10xgenomics.com/single-cell-multiome-atac-gex/datasets/1.0.0/pbmc_granulocyte_sorted_10k and Seurat object which includes the annotation from: https://raw.githack.com/bioFAM/MOFA2_tutorials/master/R_tutorials/10x_scRNA_scATAC.html#load-data
Do you have any ideas or comments regarding the results? Is it because the data quality is not good (50-50 spliced/unspliced)?
Any help would be great!
Thanks and best, Zhijian Li
I think you'd need to fit monocytes, b cells and t cells separately (to be specified via groups
), as they do no seem to share the same kinetic specifications.
Just sequentially run tl.velocity
for each group clusters separately.
scv.tl.velocity(adata, groups=['non-classical monocytes', 'classical monocytes', ...])
scv.tl.velocity(adata, groups='list of b cell clusters')
scv.tl.velocity(adata, groups='list of t cell clusters')
The transition from non-classical to classical monocytes looks like being supported by the kinetics. Everything else is hard to see, because many colors appear multiple times (blue for myeloid and memory t cells).
Thanks for your help. I will try to run scVelo for each major population (B cell, T cell, and monocytes).
Hi,
I have a question about running velocyto on 10X snRNA-seq data (I hope this is the right place to ask)
My 10X snRNA-seq samples have been aligned and quantified with cellranger using the pre-mrna human genome (generated as specified here in section Generating a Cell Ranger compatible "pre-mRNA" Reference Package
at the bottom of the page).
I ran velocyto on my samples with velocyto run /path/to/cellranger/possorted.bam /path/to/genome/genes.gtf
and I see that after reading in the .loom
file and doing scv.pp.show_proportions(adata_loom)
I get:
Abundance of ['spliced', 'unspliced']: [1. 0.]
And further analysis breaks/loses the point.
Considering the conversation above, I assume I should be seeing other proportions of slpiced to unspliced reads. Am I doing anything wrong? I would appreciate your comment.
Thanks, Anna
Hi Anna,
Not one of the scvelo authors but I have run into the same situation. The pre-mrna reference used for 10X is basically telling cell ranger to consider introns as exons, so that introns will be counted towards gene expression quantification. In newer versions of cell ranger you actually just specify --include-introns
rather than using a separate reference. However, when you apply that pre-mrna .gtf to velocyto, it won't be able to identify any introns and thus everything will be labeled as spliced.
You can use the pre-mrna reference for cell ranger, but for velocyto you actually need to use the normal mRNA .gtf in order to be able to tell the difference between spliced and unspliced reads.
I hope this helps! Sam
Thank you very much Sam, I will try that!
Anna
Hi all, I have two basic questions if I can apply scvelo on my data set. We have generated single cell nuclei data (not cell) - extracting the cell nuclei without the cytoplasm. So in principle we should have the spliced/ unspliced mRNA. Would you see there problems in applying scvelo?
Another question is that we purified the poly-adenylated mRNA with the standard 10X protocol. I assume that the 10X experimental protocol is good to use with scvelo?
Best wishes, Simon