Closed hyjforesight closed 2 years ago
@hyjforesight, I presume adata.X
and adata.layers['spliced']
are not the same array when you load the 10x matrix first and then the LOOM file? scVelo expects the same entry in adata.X
and adata.layers['spliced']
. If the two are different this flag will be False
.
Hello @WeilerP Thanks for the response. The 10X matrix and loom were generated from the same FASTQ files. I did CellRanger to generate 10X matrix and used velocyto.py to generate loom file from the same CellRanger outputs. Their cell numbers and gene numbers are the same (as the code shown).
ACTWT = sc.read_10x_mtx(path='C:/Users/Park_Lab/Documents/filtered_feature_bc_matrix/', var_names='gene_symbols') # load 10X matrix
ACTWT.var_names_make_unique()
ACTWT
AnnData object with n_obs × n_vars = 18228 × 32285
var: 'gene_ids', 'feature_types'
loom = scv.read(filename='C:/Users/Park_Lab/Documents/Tumor.loom')
loom
AnnData object with n_obs × n_vars = 18228 × 32285
obs: 'Clusters', '_X', '_Y'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'matrix', 'ambiguous', 'spliced', 'unspliced'
Is it because there're some obs (obs: 'Clusters', '_X', '_Y') variables in loom file?
Thanks! Best, YJ
@hyjforesight, unless I misunderstand, adata.X
and adata.layers['spliced']
are not the same since one is the CellRanger output (i.e., unaligned reads) and the other comes from the velocyto pipeline (i.e., a fraction of the former) You can double check this by running
(adata.X != adata.layers['spliced']).nnz == 0
If they are equal this would return True
.
Hello @WeilerP
Thanks for your explanation. Now I have a better understanding of the loom file. I ran (adata.X != adata.layers['spliced']).nnz == 0
and it was False
.
In this case, is there any way to make this pasted adata
work for the log-transformation? Because we want to do scVelo on a subset of the 10X matrix.
I loaded the raw loom file and tried to do a view, expecting to use this view for scVelo, but it failed because some prefixes and suffixes were automatically added onto the index. Loom file cannot be sliced by the index of h5ad. Plese see the codes.
loom = sc.read_loom(filename='C:/Users/Park_Lab/Documents/Tumor.loom')
loom.obs['Clusters']
CellID
Tumor:AAAGAACCAATAGTGAx 13
Tumor:AAACGCTCAGCAGTTTx 10
Tumor:AAACGAACACAAGCAGx 2
Tumor:AAAGGATCATGGCACCx 5
ACTWT = sc.read('C:/Users/Park_Lab/Documents/ACTWT.h5ad')
loom.obs['batch']= ACTWT.obs['batch']
adata=loom[loom.obs['batch'].isin(['ACTWT'])]
adata
View of AnnData object with n_obs × n_vars = 0 × 32285
ACTWT.obs['leiden']
AAACCCAAGAAAGTCT-1-ACTWT 6
AAACCCAAGATACTGA-1-ACTWT 3
How about running the scv.pp.filter_and_normalize()
step by step like this? Does it do log-transformation but I didn't see any notification like Logarithmized X
? Is it possible to enforce the log-transformation?
ACTWT = sc.read_10x_mtx(path='C:/Users/Park_Lab/Documents/Tumor/filtered_feature_bc_matrix/', var_names='gene_symbols') # load 10X matrix
ACTWT.var_names_make_unique()
ACTWT
ldata = scv.read(filename='C:/Users/Park_Lab/Documents/Tumor.loom')
adata = scv.utils.merge(ACTWT, ldata) # paste unspliced and spliced information into it
scv.pp.filter_genes(adata, min_shared_counts=20)
scv.pp.normalize_per_cell(adata)
scv.pp.filter_genes_dispersion(adata, n_top_genes=2000)
scv.pp.log1p(adata)
Filtered out 22229 genes that are detected 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Extracted 2000 highly variable genes.
Thanks! Best, YJ
Yes, you can run the steps individually. You don't get a messagt because the logging is happening in the filter_and_normalize
function. Alternatively, you could also run scv.pp.filter_and_normalize
and then call scv.pp.log1p
afterwards.
@WeilerP Hello, Plz allow me ask a question: why do we need to extract the count matrix (umi count) from the cellranger output folder and combine the spliced/unspliced information of loom files. In my understanding, the loom file generated by the velocyto output provides all the information needed for scvelo. Is the master matrix X of h5ad produced by velocyto the original umi count? I believe that the raw UMI count matrix filtered_feature_bc_matrix produced by Cell Ranger counts should be identical to the matrix layer in the loom output by velocyto or the .X of the h5ad file. A side note: The master matrix X of anndata that scvelo finally outputs seems to be the count matrix (normalized and logarithmic), while the RNA velocity related information is kept in layers
If my conceptual understanding is incorrect, please point it out for me. Thanks! Best, Weir
import scanpy as sc
import scvelo as scv
adata = scv.read("/cluster/home/velocyto/result.h5ad")
adata = adata.transpose()
scv.pp.filter_and_normalize(adata)
scv.pp.moments(adata)
scv.tl.velocity(adata)
scv.tl.velocity_graph(adata,n_jobs=64)
In my processing pipeline, velocyto acts as an upstream tool for scvelo, and the resulting h5ad file is utilized by scvelo to calculate the RNA velocity of each gene for each cell.I don't know if it's reasonable to do that
Hello scVelo and @WeilerP, When I load the 10X raw matrix and paste unspliced and spliced information into it,
scv.pp.filter_and_normalize()
cannot log-transform theX
. Please see the below codes. Appreciate it if you could help me with this issue. Thanks! Best, YJIf paste unspliced and spliced information into 10X matrix, cannot log-transform the
X
.This BUG is not caused by loom file itself, because if I load loom file, everything works fine.
Error output
```pytb WARNING: Did not modify X as it looks preprocessed already. ```Versions
```pytb scanpy==1.8.2 anndata==0.7.8 umap==0.5.2 numpy==1.21.5 scipy==1.7.3 pandas==1.3.5 scikit-learn==1.0.2 statsmodels==0.13.1 python-igraph==0.9.9 pynndescent==0.5.5 scvelo==0.2.4 scanpy==1.8.2 anndata==0.7.8 loompy==3.0.6 numpy==1.21.5 scipy==1.7.3 matplotlib==3.5.1 sklearn==1.0.2 pandas==1.3.5 cellrank==1.5.1 scanpy==1.8.2 anndata==0.7.8 numpy==1.21.5 numba==0.55.0 scipy==1.7.3 pandas==1.3.5 pygpcca==1.0.3 scikit-learn==1.0.2 statsmodels==0.13.1 python-igraph==0.9.9 scvelo==0.2.4 pygam==0.8.0 matplotlib==3.5.1 seaborn==0.11.2 ```