scverse / scirpy

A scanpy extension to analyse single-cell TCR and BCR data.
https://scirpy.scverse.org/en/latest/
BSD 3-Clause "New" or "Revised" License
212 stars 34 forks source link

ValueError in ir.tl.clonotype_network #201

Closed stefanpeidli closed 3 years ago

stefanpeidli commented 3 years ago

Hi Gregor,

I stumbled upon some ValueError when executing ir.tl.clonotype_network.

Essentially I do

adata_tcr = ir.io.read_10x_vdj(data_path + '/filtered_contig_annotations.csv')
ir.pp.merge_with_tcr(adata, adata_tcr)
ir.pp.tcr_neighbors(adata)
ir.tl.define_clonotypes(adata)
ir.tl.clonotype_network(adata)

Where the last function call outputs an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-218-962b1ed2cf30> in <module>
----> 1 ir.tl.clonotype_network(adata)

~\Miniconda3\envs\py36\lib\site-packages\scirpy\_tools\_clonotypes.py in clonotype_network(adata, sequence, metric, min_size, layout, layout_kwargs, neighbors_key, key_clonotype_size, key_added, inplace, random_state)
    288         raise ValueError(
    289             "Clonotype size information not found. Did you run `tl.define_clonotypes`?"
--> 290         )
    291 
    292     if not adata.n_obs == conn.shape[0] == conn.shape[0]:

~\Miniconda3\envs\py36\lib\site-packages\scirpy\util\graph.py in layout_components(graph, component_layout, arrange_boxes, pad_x, pad_y)
     95     ]
     96     # get vertexes back into their original order
---> 97     coords = np.vstack(component_layouts)[vertex_sorter, :]
     98     return coords
     99 

<__array_function__ internals> in vstack(*args, **kwargs)

~\Miniconda3\envs\py36\lib\site-packages\numpy\core\shape_base.py in vstack(tup)
    280     if not isinstance(arrs, list):
    281         arrs = [arrs]
--> 282     return _nx.concatenate(arrs, 0)
    283 
    284 

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: need at least one array to concatenate

I figured that maybe there is something wrong with the contents of adata.obs['clonotype']. I checked and see 12179 different clonotypes for 12737 cells in there. Could that be the reason for the error?

Thanks in advance for you help!

Best Stefan

grst commented 3 years ago

Hi Stefan,

thanks for reporting! I believe this could occur if the clonotype network has no nodes. Did you use a min_size parameter in clonotype_network?

To investigate further, could you share the data with me, or at least the clonotype and clonotype_size columns of adata.obs and adata.uns["tcr_neighbors_clonotype_identity"]?

You can also send it to mail [AT] gregor-sturm [DOT] de, if you can't share it publicly.

I checked and see 12179 different clonotypes for 12737 cells in there. Could that be the reason for the error?

these seem pretty normal values when you don't observe a lot of expansion.

Best, Gregor

stefanpeidli commented 3 years ago

Thanks for your quick reply.

I did not use the min_size parameter in clonotype_network. I tried it again with min_size=2, but that did not fix the error. In the meantime I have updated from 0.4.2 to indev version (0.4.3-something), to check if that fixes the issue (it didn't).

This is what the data looks like:

adata.uns['ir_neighbors_nt_identity']:

{'params': {'metric': 'identity',
  'cutoff': 0,
  'dual_ir': 'primary_only',
  'receptor_arms': 'all'},
 'connectivities': <12737x12737 sparse matrix of type '<class 'numpy.int32'>'
    with 15683 stored elements in Compressed Sparse Row format>,
 'distances': <12737x12737 sparse matrix of type '<class 'numpy.int32'>'
    with 15683 stored elements in Compressed Sparse Row format>}

adata.obs[['clonotype','clonotype_size']]:


clonotype | clonotype_size
-- | --
0_TCR | 1
1_no IR | 1
2_TCR | 2
3_TCR | 2
4_TCR | 1
5_TCR | 1
6_TCR | 2
7_TCR | 1
8_TCR | 1
9_TCR | 1
10_TCR | 1
11_TCR | 1
12_TCR | 1
13_TCR | 1
14_TCR | 1
15_TCR | 1
16_TCR | 1
17_TCR | 1
18_TCR | 1
19_TCR | 1
20_TCR | 1
21_TCR | 1
22_TCR | 1
23_TCR | 1
24_TCR | 1
25_TCR | 1
26_TCR | 1
27_TCR | 1
28_TCR | 1
29_TCR | 1
... | ...
12152_TCR | 1
12153_TCR | 1
12154_TCR | 1
732_TCR | 9
12155_TCR | 1
12156_TCR | 1
12157_TCR | 1
12158_TCR | 1
12159_TCR | 1
12160_TCR | 1
12161_no IR | 1
12162_TCR | 1
12163_TCR | 1
12164_TCR | 1
12165_TCR | 1
12166_TCR | 1
12167_TCR | 1
12168_TCR | 1
12169_TCR | 1
4593_TCR | 4
12170_TCR | 1
12171_TCR | 1
12172_TCR | 1
12173_TCR | 1
12174_TCR | 1
12175_TCR | 1
12176_TCR | 1
12177_TCR | 1
12178_TCR | 1
12179_no IR | 1
grst commented 3 years ago

Ok, I've got another suspicion:

Could you try to subset to cells that have a TCR after running merge_with_tcr? i.e. adata = adata[adata.obs["has_tcr"] == "True", :] (or "has_ir" in the latest dev version).

stefanpeidli commented 3 years ago

Unfortunately, that did not fix it.

The problem seems to be in the util function layout_components gets a graph that has no components. When checking components = np.array(graph.decompose(mode="weak")), these are empty and consequently np.vstack(component_layouts)[vertex_sorter, :] returns an error, since vstack can not stack empty arrays.

The graph does not seem empty though, at least it has tons of edges. Still, components = np.array(graph.decompose(mode="weak")) returns [], which is an issue.

I don't really know what to do with this bug. Maybe my igraph version is wrong? My version is 0.7.1.

If it helps, here the output of print(graph) before graph.decompose is called:

graph: IGRAPH U-W- 12129 1777 --
+ attr: weight (e)
+ edges:
1--1390 2--271 5--1622 46--427 46--485 46--579 46--3812 46--5612 46--7209
46--7465 46--7731 46--7956 46--8649 46--8820 46--10083 46--10803 46--10873
46--10989 49--4014 72--4827 72--6028 91--8440 111--158 111--1662 111--3581
111--8385 111--9374 131--597 131--680 131--780 131--1133 131--1805 131--2387
131--2896 131--3919 131--4761 131--5470 131--5741 131--6972 131--7431
131--8720 131--9680 131--9941 131--10271 131--10367 131--10543 131--10977
131--11656 131--12043 142--195 158--1662 158--3581 158--8385 158--9374
174--8111 178--7735 192--2080 192--9906 192--11955 220--4839 220--8828
222--3621 222--10242 225--235 225--418 225--821 225--1687 225--3083 225--3286
225--3442 225--4585 225--4930 225--6149 225--7370 225--7680 225--7849
225--9709 225--9750 225--10011 225--10771 225--11080 225--11140 225--11200
225--11527 225--11665 225--11918 235--418 235--821 235--1687 235--3083
235--3286 235--3442 235--4585 235--4930 235--6149 235--7370 235--7680
235--7849 235--9709 235--9750 235--10011 235--10771 235--11080 235--11140
...
11140--11527 11140--11665 11140--11918 11200--11527 11200--11665 11200--11918
11336--11795 11411--11994 11411--12104 11454--11704 11522--11983 11527--11665
11527--11918 11656--12043 11665--11918 11994--12104
grst commented 3 years ago

This is weird... I have igraph 0.8.2, but I doubt that this is the problem. Can you (privately) share your anndata object with me (as h5ad)? You can get rid of gene expression and also CDR3 sequences before sharing, I essentially just need the columns mentioned above.

stefanpeidli commented 3 years ago

I've emailed the corresponding data to you.

grst commented 3 years ago

Thanks a lot for the data!

I did literally

import scirpy as ir
import scanpy as sc

adata = sc.read_h5ad("/home/sturm/Downloads/export_for_gregor.h5")

adata.obs["receptor_type"] = "TCR"

ir.tl.define_clonotypes(adata)

ir.tl.clonotype_network(adata)

ir.pl.clonotype_network(adata)

and it worked. Do you still encounter the issue with the same commands? In that case it might really be an issue with one of the dependencies...

Note 1 I noted that all your clonotypes and clonotype sizes were nan, so I had to redefine them

Note 2 I used the latest development version of scirpy. 0.4.x should be excactely the same, but it doesn't use the receptor_type column yet. It should just work without that line.

stefanpeidli commented 3 years ago

Strange, the exact same code still fails for me with the same error.

Also, if I read the exported data, both clonotypes and sizes are fine and not nan. So I think this might be due to either an old scanpy and/or anndata version. I updated both of them.

Now it seems that I have a new pandas version as well, which results in some weird error due to adata.obs['clonotype_size'] being categorial. I converted it to int to fix this using adata.obs['clonotype_size']=adata.obs['clonotype_size'].astype('int'). Scanpy automatically saves anything you add to adata.obs as categorial prompting

... storing 'receptor_type' as categorical
... storing 'receptor_subtype' as categorical
... storing 'chain_pairing' as categorical
... storing 'clonotype' as categorical

After this, I still get the exact same error. I still think this is due to some version issue. Worst case this is a windows issue (wouldn't be the first time with scanpy). I will check this soonish on my linux cluster. Here my versions:

WARNING: If you miss a compact list, please try `print_header`!
-----
anndata     0.6.22.post2.dev82+g6310e3e
scanpy      1.6.0
sinfo       0.3.1
-----
Levenshtein                 NA
PIL                         6.1.0
airr                        1.3.0
anndata                     0.6.22.post2.dev82+g6310e3e
autopep8                    1.4.3
backcall                    0.1.0
cairo                       1.20.0
cffi                        1.12.2
cloudpickle                 0.6.1
colorama                    0.4.1
cycler                      0.10.0
cython_runtime              NA
dask                        1.0.0
dateutil                    2.8.0
decorator                   4.3.2
future_fstrings             NA
get_version                 2.1
google                      NA
h5py                        2.10.0
igraph                      0.7.1
importlib_metadata          0.22
ipykernel                   5.1.2
ipython_genutils            0.2.0
ipywidgets                  7.5.1
jedi                        0.13.2
joblib                      0.13.2
kiwisolver                  1.0.1
legacy_api_wrap             1.2
llvmlite                    0.27.0
louvain                     0.6.1
matplotlib                  3.2.1
more_itertools              NA
mpl_toolkits                NA
natsort                     6.0.0
nt                          NA
numba                       0.42.0
numexpr                     2.6.9
numpy                       1.17.2
packaging                   19.0
pandas                      1.1.3
parasail                    1.2
parso                       0.3.4
pickleshare                 0.7.5
pkg_resources               NA
prompt_toolkit              2.0.9
pycodestyle                 2.5.0
pycparser                   2.19
pygments                    2.4.2
pyparsing                   2.3.1
pytoml                      NA
pytz                        2018.9
scanpy                      1.6.0
scipy                       1.4.0
scirpy                      0.4.3.dev28+gb1c6c06
scvelo                      0.1.16.dev32+c00a55e.dirty
seaborn                     0.9.0
setuptools_scm              NA
sinfo                       0.3.1
six                         1.12.0
sklearn                     0.23.1
squarify                    NA
statsmodels                 0.10.1
storemagic                  NA
tables                      3.4.4
toolz                       0.9.0
tornado                     5.1.1
tqdm                        4.31.1
tracerlib                   NA
traitlets                   4.3.2
umap                        0.3.10
wcwidth                     NA
yaml                        3.13
yamlordereddictloader       NA
zipp                        NA
zmq                         18.0.0
-----
IPython             7.11.1
jupyter_client      5.3.1
jupyter_core        4.4.0
notebook            6.0.0
-----
Python 3.6.6 | packaged by conda-forge | (default, Jul 26 2018, 11:48:23) [MSC v.1900 64 bit (AMD64)]
Windows-10-10.0.19041-SP0
8 logical CPU cores, Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
-----
Session information updated at 2020-10-12 13:42
grst commented 3 years ago

Really strange, let me know if you find out something! I also tried igraph 0.7.1 and it still works.

You could try with a clean conda environment to check if it could be a dependency issue:

conda create -n test_scirpy scirpy
stefanpeidli commented 3 years ago

It seems to work now :) Apparently the old scanpy/anndata version was the issue. My jupyter kernel just did not reload the packages accordingly on my local machine. Now it works also on windows.

Thanks for your help! Btw I really like this package. Keep up the great work!

grst commented 3 years ago

Perfect! Let me know if you have further issues, I want this package to work for everyone.