Closed shank117 closed 3 years ago
@shank117, please provide the output of `scv.logging.print_versions()` as asked for in the issue template.
hi @shank117, jumping in on this as I noticed you are trying to run scVelo on FFPE tissue. It seems the proportion of spliced/unspliced reads is off. Out of curiosity, did you try to run this analysis on a standard (non-FFPE) Visium dataset? Thank you!
@shank117, also some input on your pipeline: run `adata.var_names_make_unique()` as suggested by the output, and consider adjusting `min_shared_counts` (in `scv.pp.filter_and_normalize`) and the arguments of `scv.pp.moments`.
@WeilerP Here is the output: Running scvelo 0.2.3 (python 3.8.8) on 2021-08-17 11:25. Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process. ERROR: XMLRPC request failed [code: -32500] RuntimeError: PyPI's XMLRPC API is currently disabled due to unmanageable load and will be deprecated in the near future. See https://status.python.org/ for more information.
@giovp
> hi @shank117, jumping in on this as I noticed you are trying to run scVelo on FFPE tissue. It seems the proportion of spliced/unspliced reads is off. Out of curiosity, did you try to run this analysis on a standard (non-FFPE) Visium dataset? Thank you!
I have not used non-FFPE datasets. I am sorry, I am new to all of this; could you tell me what the difference between FFPE and non-FFPE datasets is? I am using Visium datasets for all my analysis, and it has worked for non-spatial datasets like: https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_1k_v3?
@WeilerP
I have run this now:
```python
scv.pp.filter_and_normalize(adata, min_shared_counts=5, n_top_genes=1000)
scv.pp.moments(adata, n_pcs=20, n_neighbors=20)
scv.pp.neighbors(adata, n_pcs=20, n_neighbors=20)
```
The rest is the same as above.
The output is:

```
Running scvelo 0.2.3 (python 3.8.8) on 2021-08-17 11:33.
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
ERROR: XMLRPC request failed [code: -32500]
RuntimeError: PyPI's XMLRPC API is currently disabled due to unmanageable load and will be deprecated in the near future. See https://status.python.org/ for more information.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Abundance of ['spliced', 'unspliced']: [0.33 0.67]
Filtered out 33513 genes that are detected 5 counts (shared).
Normalized count data: X, spliced, unspliced.
Skip filtering by dispersion since number of variables are less than `n_top_genes`.
Logarithmized X.
computing neighbors
    finished (0:00:01) --> added
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
WARNING: The neighbor graph has an unexpected format (e.g. computed outside scvelo)
or is corrupted (e.g. due to subsetting). Consider recomputing with `pp.neighbors`.
computing moments based on connectivities
    finished (0:00:00) --> added
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing neighbors
    finished (0:00:01) --> added
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing velocities
WARNING: Too few genes are selected as velocity genes. Consider setting a lower threshold for min_r2 or min_likelihood.
    finished (0:00:00) --> added
    'velocity', velocity vectors for each individual cell (adata.layers)
WARNING: The neighbor graph has an unexpected format (e.g. computed outside scvelo)
or is corrupted (e.g. due to subsetting). Consider recomputing with `pp.neighbors`.
Traceback (most recent call last):
  File "/home/gddaslab/rssxm007/yard/run_spaceranger_count/velocyto/spatialdata.py", line 29, in <module>
    scv.tl.velocity_graph(adata)
  File "/home/gddaslab/rssxm007/anaconda3/envs/velocyto/lib/python3.8/site-packages/scvelo/tools/velocity_graph.py", line 294, in velocity_graph
    vgraph = VelocityGraph(
  File "/home/gddaslab/rssxm007/anaconda3/envs/velocyto/lib/python3.8/site-packages/scvelo/tools/velocity_graph.py", line 102, in __init__
    raise ValueError(
ValueError: Your neighbor graph seems to be corrupted.Consider recomputing via pp.neighbors.
```
Question: how would I check that observations are not duplicated after the filtering step? Again, I am very new to RNA analysis and scVelo. Thank you for your help!
> @WeilerP Here is the output: Running scvelo 0.2.3 (python 3.8.8) on 2021-08-17 11:25. Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process. ERROR: XMLRPC request failed [code: -32500] RuntimeError: PyPI's XMLRPC API is currently disabled due to unmanageable load and will be deprecated in the near future. See https://status.python.org/ for more information.
This doesn't seem to be the output of `scv.logging.print_versions()`. It should print a number of package versions (e.g. `scanpy` and `anndata`).
@shank117, one quick remark: you should run the neighbor calculation prior to `scv.pp.moments`. You can check the number of duplicate rows, e.g., via

`adata.to_df().duplicated().sum()`

I'd first check that there are no duplicate rows both immediately after reading the data and then after filtering (the latter should be a rare event). Either way, you can drop the duplicate rows using

`adata = adata[~adata.to_df().duplicated(), :]`

You should also run `adata.var_names_make_unique()` prior to the analysis pipeline.
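The duplicate check suggested above can be tried out without an AnnData object, since `adata.to_df()` just returns a pandas DataFrame. A minimal sketch with a toy cell-by-gene matrix (hypothetical cell and gene names):

```python
import pandas as pd

# Toy cell-by-gene count matrix standing in for adata.to_df();
# rows "cell2" and "cell3" carry identical measurements.
df = pd.DataFrame(
    [[3, 0, 1],
     [0, 2, 5],
     [0, 2, 5],
     [1, 1, 0]],
    index=["cell1", "cell2", "cell3", "cell4"],
    columns=["geneA", "geneB", "geneC"],
)

# duplicated() marks every row that is identical to an earlier row
n_dups = df.duplicated().sum()
print(n_dups)  # 1

# Keep the first row of each duplicate group, mirroring
# adata = adata[~adata.to_df().duplicated(), :]
deduped = df[~df.duplicated()]
print(deduped.shape)  # (3, 3)
```

Note that `duplicated()` keeps the first occurrence by default, so one copy of each duplicated observation survives the filter.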
@WeilerP
Thank you for the advice. I did not find any duplicates and ran what you suggested, but I am unfortunately still getting the same error with the same output. Any other suggestions?
@shank117
> I have not used non-FFPE datasets. I am sorry, I am new to all of this; could you tell me what the difference between FFPE and non-FFPE datasets is? I am using Visium datasets for all my analysis, and it has worked for non-spatial datasets like: support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_1k_v3?
I am not entirely sure (@WeilerP knows more), but if the issue is the quality of the data itself (and, if I'm not mistaken, the spliced/unspliced ratio looks off), I'd go with non-FFPE data. FFPE data are known to have lower quality because of the fixation protocol.
> Thank you for the advice. I did not find any duplicates and ran what you suggested, but I am unfortunately still getting the same error with the same output. Any other suggestions?
@shank117, did you check for duplicate rows both after reading the data as well as after calling `scv.pp.filter_and_normalize`? If so, can you share the AnnData object you are running your pipeline on? I'd have to take a closer look at it when I find the time.
@WeilerP
Here is the loom file: Visium_FFPE_Human_Breast_Cancer_possorted_genome_bam_5BKOT.loom.zip
@shank117, as suspected, your data does contain duplicate rows (70 after reading, 575 after filtering). Removing them solves the problem.
This is also a duplicate of #637.
@WeilerP, sorry, but out of curiosity, how does this issue arise? Is it velocyto or something happening with cellranger/spaceranger at the mapping steps?
> @WeilerP, sorry, but out of curiosity, how does this issue arise? Is it velocyto or something happening with cellranger/spaceranger at the mapping steps?
Not sure why this occurs in the initial data, TBH. I would guess it (the same measurements in two different cells) is either extremely rare, or the observations are actual duplicates (I did not check whether the names were duplicated, but AnnData would point it out, I believe). The duplicate rows after filtering occur by chance when we have genes with few counts and the observations that become duplicates differ only in those genes.
> The duplicate rows after filtering occur by chance when we have genes with few counts and the observations that become duplicates differ only in those genes.
Right, ok, so it's just a matter of chance? Do you think it is more likely in bad-quality data (where, e.g., you'd have a much sparser matrix, and so it is easier to get duplicated observations)?
> Right, ok, so it's just a matter of chance? Do you think it is more likely in bad-quality data (where, e.g., you'd have a much sparser matrix, and so it is easier to get duplicated observations)?
Yes, I would assume that this phenomenon comes up more often if data quality is low.
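The mechanism described above (two cells that differ only in lowly detected genes collapsing into duplicates once those genes are filtered out) can be reproduced with a toy example in plain pandas. The gene names and the simple total-count filter are illustrative only; scVelo's `min_shared_counts` filter works on shared spliced/unspliced counts:

```python
import pandas as pd

# Two cells that differ only in one lowly detected gene ("geneRare").
counts = pd.DataFrame(
    {"geneA": [5, 5], "geneB": [2, 2], "geneRare": [1, 0]},
    index=["cellX", "cellY"],
)
print(counts.duplicated().sum())  # 0 -- the rows differ before filtering

# A minimum-count gene filter (here: keep genes with total count >= 2,
# loosely analogous to min_shared_counts) removes geneRare ...
filtered = counts.loc[:, counts.sum(axis=0) >= 2]

# ... and the two cells collapse into duplicate rows.
print(filtered.duplicated().sum())  # 1
```

The sparser the matrix, the more cells differ only in a handful of low-count genes, which matches the intuition that this phenomenon is more frequent in low-quality data.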
Hi, I have downloaded the BAM file from the spaceranger pipeline here: https://support.10xgenomics.com/spatial-gene-expression/datasets/1.3.0/Visium_FFPE_Human_Breast_Cancer? (human breast cancer, 1.3 genome-aligned BAM). I then used velocyto to create the .loom file and followed the scVelo example workflow on the endocrine pancreas dataset. When I tried to run my dataset through scVelo, I received a corrupted neighborhood graph error. I have checked that my .loom file is not corrupted, because I can still access portions of it, but the software cannot do anything with it.
Versions: scvelo 0.2.3, Python 3.8.8