theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License

scv.tl.velocity_graph() -> Corrupted neighbor graph #256

Closed mvinyard closed 3 years ago

mvinyard commented 4 years ago

Error when running scv.tl.velocity_graph() that prevents me from moving forward. Tried the suggestions in the error message.

scv.tl.velocity_graph(ldata)

Error:

```
WARNING: The neighbor graph has an unexpected format (e.g. computed outside scvelo) or is corrupted (e.g. due to subsetting). Consider recomputing with `pp.neighbors`.
. . .
ValueError: Your neighbor graph seems to be corrupted. Consider recomputing via pp.neighbors.
```

Versions:

scvelo==0.2.1 scanpy==1.5.1 anndata==0.7.4 loompy==3.0.6 numpy==1.18.5 scipy==1.5.0 matplotlib==3.2.2 sklearn==0.23.1 pandas==1.0.5
WARNING: There is a newer scvelo version available on PyPI: Your version: 0.2.1 Latest version: 0.2.2

Full context: Screen Shot 2020-07-26 at 5 24 03 PM

VolkerBergen commented 3 years ago

Try with the latest version. It should be possible now to use an adjusted neighbor graph. Otherwise, let me know and I'll have a closer look.

coolmak32 commented 3 years ago

Hi @mvinyard, were you able to solve the issue? I am encountering the same error after running the pp.moments command.

Alexei-Lipov commented 3 years ago

Hi, I am also unable to get around this issue. I have tried with the latest version of scVelo with no success.

WeilerP commented 3 years ago

@mvinyard, any update on this? @coolmak32, @Alexei-Lipov, could you please provide the code snippet that caused the problem?

WeilerP commented 3 years ago

Closing this for now. Happy to reopen.

cadyyuheng commented 3 years ago

Hello @WeilerP, I encountered the same issue with scvelo 0.2.3 when following the RNA Velocity Basic tutorial. I couldn't reproduce the "Estimate RNA velocity" step in the tutorial and here is the error:

scv.tl.velocity_graph(adata)
WARNING: The neighbor graph has an unexpected format (e.g. computed outside scvelo) 
or is corrupted (e.g. due to subsetting). Consider recomputing with `pp.neighbors`.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-699687a77a69> in <module>
----> 1 scv.tl.velocity_graph(adata)

~\Anaconda3\lib\site-packages\scvelo\tools\velocity_graph.py in velocity_graph(data, vkey, xkey, tkey, basis, n_neighbors, n_recurse_neighbors, random_neighbors_at_max, sqrt_transform, variance_stabilization, gene_subset, compute_uncertainties, approx, mode_neighbors, copy)
    306         compute_uncertainties=compute_uncertainties,
    307         report=True,
--> 308         mode_neighbors=mode_neighbors,
    309     )
    310 

~\Anaconda3\lib\site-packages\scvelo\tools\velocity_graph.py in __init__(self, adata, vkey, xkey, tkey, basis, n_neighbors, sqrt_transform, n_recurse_neighbors, random_neighbors_at_max, gene_subset, approx, report, compute_uncertainties, mode_neighbors)
    101         if np.min((get_neighs(adata, "distances") > 0).sum(1).A1) == 0:
    102             raise ValueError(
--> 103                 "Your neighbor graph seems to be corrupted. "
    104                 "Consider recomputing via pp.neighbors."
    105             )

ValueError: Your neighbor graph seems to be corrupted. Consider recomputing via pp.neighbors.

Session info:

Software Version
Python 3.7.3 64bit [MSC v.1915 64 bit (AMD64)]
IPython 7.4.0
OS Windows 10 10.0.19041 SP0
scvelo 0.2.3
scanpy 1.8.1
anndata 0.7.6
loompy 3.0.6
numpy 1.21.1
scipy 1.7.0
matplotlib 3.4.2
sklearn 0.21.1
pandas 1.3.0

Any suggestions on where the issue might be?

WeilerP commented 3 years ago

@cadyyuheng, are you using the pancreas data for this or your own? Also, which versions of umap-learn and numba are you using? I just ran the tutorial in a clean conda environment and did not encounter the problem. Not sure if this is relevant: there have been some issues caused by using pandas==1.3.0.
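
For anyone asked for the same details, here is a minimal sketch for listing installed versions using only the standard library. It assumes Python 3.8 or newer (for `importlib.metadata`) and that the packages were installed under the PyPI distribution names shown (for example `scikit-learn` rather than `sklearn`). The helper name `collect_versions` is made up for illustration and is not part of scvelo.

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return {distribution name: installed version or 'not installed'}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is missing or was installed under another name.
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    # PyPI distribution names; adjust if your environment differs (assumption).
    names = ["scvelo", "scanpy", "anndata", "numpy", "scipy",
             "scikit-learn", "pandas", "umap-learn", "numba"]
    for name, ver in collect_versions(names).items():
        print(f"{name}=={ver}")
```

Pasting the printed lines into a comment gives the same kind of listing as the session info above, including umap-learn and numba.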

cadyyuheng commented 3 years ago

I was using the pancreas data. Yes, it's probably because of pandas. Although I'm not quite sure which parts exactly were incompatible, after I downgraded numpy and upgraded sklearn, I'm now able to run the tutorial on both the pancreas data and my own data. Thanks, @WeilerP! And just for the record, here's my current session info:

Software Version
Python 3.7.3 64bit [MSC v.1915 64 bit (AMD64)]
IPython 7.4.0
OS Windows 10 10.0.19041 SP0
scvelo 0.2.3
scanpy 1.8.1
anndata 0.7.6
loompy 3.0.6
numpy 1.19.1
scipy 1.7.0
matplotlib 3.4.2
sklearn 0.24.2
pandas 1.3.0
umap-learn 0.4.6
numba 0.51.2

WeilerP commented 3 years ago

Hm, that's strange. Everything worked for me using the latest package versions. Either way, glad it's working now for you @cadyyuheng.

christoffermattssonlangseth commented 3 years ago

I'm currently running into this issue. Has there been any consensus on where it stems from? I have installed a fresh conda environment matching @cadyyuheng's versions, but it still doesn't work. Also, I have three separate AnnData objects, and it works for one of them: it still prints "WARNING: The neighbor graph has an unexpected format (e.g. computed outside scvelo) or is corrupted (e.g. due to subsetting). Consider recomputing with pp.neighbors." but moves on to computing the velocity graph anyway.

WeilerP commented 3 years ago

@christoffermattssonlangseth, did you subset your AnnData? This issue previously occurred, for example, when some observations turned out to be duplicates after filtering, i.e. it was related to subsetting, as mentioned in the warning. Might this be the issue for you?
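
A minimal sketch of how one might look for such duplicated observations, using only NumPy. It assumes the representation used for the neighbor search is available as a dense array (for example the PCA coordinates, if that is what the neighbors were computed on). The function name `duplicate_row_groups` is hypothetical and not part of scvelo or scanpy.

```python
import numpy as np


def duplicate_row_groups(matrix):
    """Return lists of row indices whose values are exactly identical."""
    dense = np.asarray(matrix)
    # With axis=0, np.unique treats each row as one item; `inverse` maps every
    # original row to the index of its unique representative.
    _, inverse, counts = np.unique(
        dense, axis=0, return_inverse=True, return_counts=True
    )
    inverse = np.ravel(inverse)  # shape differs between NumPy versions
    groups = []
    for unique_idx in np.flatnonzero(counts > 1):
        groups.append(np.flatnonzero(inverse == unique_idx).tolist())
    return groups


# Toy example: rows 0 and 2 are identical.
toy = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 0.0]])
print(duplicate_row_groups(toy))  # [[0, 2]]
```

Exact duplicates sit at distance zero from each other, and the check shown in the traceback above only counts distances greater than zero, so this is one plausible way duplicates could trip it.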

christoffermattssonlangseth commented 3 years ago

@WeilerP, no, I have not subset the AnnData objects. Is there anything in particular I should look for in the neighborhood graphs?
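
One thing that can be checked directly is the condition from the traceback earlier in this thread: the error is raised when `np.min((get_neighs(adata, "distances") > 0).sum(1).A1) == 0`, i.e. when at least one cell has no neighbor at a strictly positive distance. The sketch below reproduces that count so the affected cells can be listed. It assumes the distance graph is the matrix stored under `adata.obsp["distances"]`; the function name `cells_without_neighbors` is made up for illustration.

```python
import numpy as np


def cells_without_neighbors(distances):
    """Return indices of rows with no strictly positive distance entry."""
    # Accepts a dense array or a SciPy sparse matrix: count entries > 0 per row.
    positive_per_row = np.asarray((distances > 0).sum(axis=1)).ravel()
    return np.flatnonzero(positive_per_row == 0).tolist()


# Toy distance matrix: row 1 has no positive entries, so it would trip the check.
toy = np.array([
    [0.0, 1.2, 0.0],
    [0.0, 0.0, 0.0],
    [0.7, 0.0, 0.0],
])
print(cells_without_neighbors(toy))  # [1]
```

On real data the call would be something like `cells_without_neighbors(adata.obsp["distances"])`, assuming the graph is stored there; a non-empty result points to the cells worth inspecting.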

WeilerP commented 3 years ago

@christoffermattssonlangseth, could you please provide a code snippet of the pipeline you are running and the output it produces?

christoffermattssonlangseth commented 3 years ago

Absolutely! Basically, I take two AnnData objects as input, one from single-cell RNA sequencing and one from in situ sequencing. Using Spatial inference of RNA velocity (SIRV), I integrate both datasets. The specific code can be found here. This outputs an integrated AnnData object (maybe something becomes corrupted here).

RNA = scv.read('SIRV_data/RNA_adata.h5ad')
HybISS = scv.read('SIRV_data/HybISS_adata.h5ad')
HybISS_imputed = SIRV(HybISS, RNA, 50, ['Tissue', 'Region', 'Class', 'Subclass'])

Then I normalize the imputed un/spliced expression and undo the double normalization of the full mRNA 'X' (potential point of corruption as well).

scv.pp.normalize_per_cell(HybISS_imputed, enforce=True)
HybISS_imputed.X = HybISS.to_df()[HybISS_imputed.var_names]

Then:

sc.pp.scale(HybISS_imputed)
sc.tl.pca(HybISS_imputed)
sc.pl.pca_variance_ratio(HybISS_imputed, n_pcs=50, log=True)
sc.pp.neighbors(HybISS_imputed, n_neighbors=30, n_pcs=30)
sc.tl.umap(HybISS_imputed)
sc.tl.leiden(HybISS_imputed)

Then scVelo:

scv.pp.moments(HybISS_imputed, n_pcs=30, n_neighbors=30)
scv.tl.velocity(HybISS_imputed)
scv.pp.neighbors(HybISS_imputed)
scv.tl.velocity_graph(HybISS_imputed)

Output:

WARNING: The neighbor graph has an unexpected format (e.g. computed outside scvelo) 
or is corrupted (e.g. due to subsetting). Consider recomputing with `pp.neighbors`.
computing moments based on connectivities
    finished (0:00:00) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing velocities
    finished (0:00:00) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing neighbors
    finished (0:00:03) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
WARNING: The neighbor graph has an unexpected format (e.g. computed outside scvelo) 
or is corrupted (e.g. due to subsetting). Consider recomputing with `pp.neighbors`.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-50-b83616e54602> in <module>
      2 scv.tl.velocity(s2_imputed)
      3 scv.pp.neighbors(s2_imputed)
----> 4 scv.tl.velocity_graph(s2_imputed)

~/.local/lib/python3.8/site-packages/scvelo/tools/velocity_graph.py in velocity_graph(data, vkey, xkey, tkey, basis, n_neighbors, n_recurse_neighbors, random_neighbors_at_max, sqrt_transform, variance_stabilization, gene_subset, compute_uncertainties, approx, mode_neighbors, copy)
    292         sqrt_transform = variance_stabilization
    293 
--> 294     vgraph = VelocityGraph(
    295         adata,
    296         vkey=vkey,

~/.local/lib/python3.8/site-packages/scvelo/tools/velocity_graph.py in __init__(self, adata, vkey, xkey, tkey, basis, n_neighbors, sqrt_transform, n_recurse_neighbors, random_neighbors_at_max, gene_subset, approx, report, compute_uncertainties, mode_neighbors)
    100             neighbors(adata)
    101         if np.min((get_neighs(adata, "distances") > 0).sum(1).A1) == 0:
--> 102             raise ValueError(
    103                 "Your neighbor graph seems to be corrupted. "
    104                 "Consider recomputing via pp.neighbors."

ValueError: Your neighbor graph seems to be corrupted. Consider recomputing via pp.neighbors.

WeilerP commented 3 years ago

@christoffermattssonlangseth, this seems like a different setup (data type etc.) compared to the original content of the issue. Please open a new issue and provide all the information requested in the issue template. It'll make it easier for others to find a solution when they run into the same problem with this kind of data. Thanks!