metagentools / GraphBin2

☯️🧬 Refined and Overlapped Binning of Metagenomic Contigs Using Assembly Graphs
https://graphbin2.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

KeyError during "Propagating labels to unlabelled vertices" #4

Open nick-youngblut opened 3 years ago

nick-youngblut commented 3 years ago

The error:

GraphBin2 started
-------------------
Total number of contigs available: 276680
Total number of edges in the assembly graph: 23569
Number of bins available in binning result: 13
Number of binned contigs: 2261
Total number of unbinned contigs: 274419
Number of isolated contigs: 270459

Removing labels of unsupported vertices...
Iteration: 1
100%|███████████████████████████████████████████████████████████| 2261/2261 [00:03<00:00, 669.23it/s]
Iteration: 2
100%|███████████████████████████████████████████████████████████| 2178/2178 [00:02<00:00, 731.72it/s]
Iteration: 3
100%|███████████████████████████████████████████████████████████| 2177/2177 [00:02<00:00, 734.18it/s]
Iteration: 4
100%|███████████████████████████████████████████████████████████| 2176/2176 [00:02<00:00, 734.44it/s]

Refining labels of inconsistent vertices...
Iteration: 1
100%|███████████████████████████████████████████████████████████| 2176/2176 [00:02<00:00, 733.30it/s]
Iteration: 2
100%|███████████████████████████████████████████████████████████| 2176/2176 [00:02<00:00, 770.52it/s]
Iteration: 3
100%|███████████████████████████████████████████████████████████| 2176/2176 [00:02<00:00, 771.00it/s]

Obtaining non isolated contigs...
100%|██████████████████████████████████████████████████████| 276680/276680 [00:29<00:00, 9521.30it/s]

Number of non-isolated contigs: 5095
Number of non-isolated unbinned contigs: 2919

Propagating labels to unlabelled vertices...
  0%|                                                                       | 0/2919 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llmga/bin/scripts/GraphBin2/src/graphbin2_SPAdes.py", line 617, in <module>
    sorted_node_list_ = [list(runBFS(x, threhold=depth)) for x in contigs_to_bin]
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llmga/bin/scripts/GraphBin2/src/graphbin2_SPAdes.py", line 617, in <listcomp>
    sorted_node_list_ = [list(runBFS(x, threhold=depth)) for x in contigs_to_bin]
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llmga/bin/scripts/GraphBin2/src/graphbin2_SPAdes.py", line 350, in runBFS
    labelled_nodes.add((node, active_node, contig_bin, depth[active_node], abs(coverages[contigs_map[node]]-coverages[contigs_map[active_node]])))
KeyError: 276488
  0%|

What is the key error referring to? What is the key that is not found?

conda list output:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
biopython                 1.78             py39hbd71b63_1    conda-forge
ca-certificates           2020.12.5            ha878542_0    conda-forge
cairo                     1.16.0            h488836b_1006    conda-forge
certifi                   2020.12.5        py39hf3d152e_0    conda-forge
fontconfig                2.13.1            h1056068_1002    conda-forge
freetype                  2.10.4               h5ab3b9f_0
gettext                   0.19.8.1             h9b4dc7a_1
gmp                       6.2.1                h58526e2_0    conda-forge
icu                       67.1                 he1b5a44_0    conda-forge
ld_impl_linux-64          2.35.1               hed1e6ac_0    conda-forge
libblas                   3.9.0                3_openblas    conda-forge
libcblas                  3.9.0                3_openblas    conda-forge
libffi                    3.3                  he6710b0_2
libgcc-ng                 9.3.0               h5dbcf3e_17    conda-forge
libgfortran-ng            9.3.0               he4bcb1c_17    conda-forge
libgfortran5              9.3.0               he4bcb1c_17    conda-forge
libglib                   2.66.3               h1f3bc88_1    conda-forge
libgomp                   9.3.0               h5dbcf3e_17    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0                3_openblas    conda-forge
libopenblas               0.3.12          pthreads_h4812303_1    conda-forge
libpng                    1.6.37               hbc83047_0
libstdcxx-ng              9.3.0               h2ae2ef3_17    conda-forge
libuuid                   2.32.1            h14c3975_1000    conda-forge
libxcb                    1.14                 h7b6447c_0
libxml2                   2.9.10               h68273f3_2    conda-forge
ncurses                   6.2                  he6710b0_1
numpy                     1.19.4           py39h57d35e7_1    conda-forge
openssl                   1.1.1h               h7b6447c_0
pcre                      8.44                 he6710b0_0
pip                       20.3.1             pyhd8ed1ab_0    conda-forge
pixman                    0.38.0               h7b6447c_0
pycairo                   1.20.0           py39h08627d8_1    conda-forge
python                    3.9.0                hdb3f193_2
python-igraph             0.8.3            py39hd24af65_2    conda-forge
python_abi                3.9                      1_cp39    conda-forge
readline                  8.0                  h7b6447c_0
setuptools                50.3.2           py39h06a4308_2
sqlite                    3.34.0               h74cdb3f_0    conda-forge
texttable                 1.6.3              pyh9f0ad1d_0    conda-forge
tk                        8.6.10               hbc83047_0
tqdm                      4.54.1             pyhd8ed1ab_0    conda-forge
tzdata                    2020d                h52ac0ba_0
wheel                     0.36.1             pyhd3deb0d_0    conda-forge
xorg-kbproto              1.0.7             h14c3975_1002    conda-forge
xorg-libice               1.0.10               h516909a_0    conda-forge
xorg-libsm                1.2.3             h84519dc_1000    conda-forge
xorg-libx11               1.6.12               h516909a_0    conda-forge
xorg-libxext              1.3.4                h516909a_0    conda-forge
xorg-libxrender           0.9.10            h516909a_1002    conda-forge
xorg-renderproto          0.11.1            h14c3975_1002    conda-forge
xorg-xextproto            7.3.0             h14c3975_1002    conda-forge
xorg-xproto               7.0.31            h14c3975_1007    conda-forge
xz                        5.2.5                h7b6447c_0
zlib                      1.2.11               h7b6447c_3

nick-youngblut commented 3 years ago

I think the error is due to my using a SPAdes assembly contig FASTA from which all sequences <2000 bp were removed. I'm guessing that GraphBin2 expects every contig in the .gfa and .paths files to be present in the FASTA file as well. It would help to get a warning instead of a KeyError, given that many users filter the contig FASTA generated by metaSPAdes, since metaSPAdes has no minimum contig length.
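For anyone hitting the same error, a quick pre-flight check along these lines can list which contigs appear in the .paths file but are missing from the filtered FASTA. This is only a sketch: it assumes SPAdes-style contig names beginning with `NODE_`, with a trailing apostrophe marking the reverse-complement entry, and the function names are hypothetical, not part of GraphBin2.

```python
import warnings


def fasta_ids(lines):
    """Return the set of sequence IDs (first token after '>') in FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def paths_ids(lines):
    """Return contig names from a SPAdes-style .paths listing.

    Assumption: header lines start with 'NODE_' and a trailing apostrophe
    marks the reverse-complement entry of the same contig.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if line.startswith("NODE_"):
            ids.add(line.rstrip("'"))
    return ids


def missing_contigs(fasta_lines, paths_lines):
    """Warn about, and return, contigs named in .paths but absent from the FASTA."""
    missing = paths_ids(paths_lines) - fasta_ids(fasta_lines)
    if missing:
        warnings.warn(
            f"{len(missing)} contig(s) in the .paths file are absent from the FASTA file"
        )
    return missing
```

In practice the two arguments could be open file handles, e.g. `missing_contigs(open("contigs.filtered.fasta"), open("contigs.paths"))`, where both file names are placeholders.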

Vini2 commented 3 years ago

Hi @nick-youngblut,

You are correct. GraphBin2 expects every contig listed in the *.paths file to be provided for binning. I will add a fix so that users can filter out contigs and still use the original graph. Thank you for pointing this out. I will leave this issue open until it is fixed.
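To illustrate the idea of tolerating filtered contigs while keeping the original graph, here is a rough sketch. It is not the fix that went into GraphBin2: the function name, its parameters, and the plain-dict data structures are assumptions made for the example. The search walks through vertices that have no coverage entry but never reports them, so a missing contig no longer raises a KeyError.

```python
from collections import deque


def labelled_neighbours(start, adjacency, bins, coverages, max_depth=5):
    """Breadth-first search for labelled vertices within max_depth of `start`.

    Hypothetical inputs: `adjacency` maps vertex -> neighbours, `bins` maps
    vertex -> bin label, `coverages` maps vertex -> coverage. Vertices missing
    from `coverages` (filtered contigs) are traversed but never reported.
    """
    if start not in coverages:
        return []
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in adjacency.get(node, ()):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt in bins and nxt in coverages:
                # Labelled vertex with known coverage: record it, stop expanding here.
                diff = abs(coverages[start] - coverages[nxt])
                found.append((nxt, bins[nxt], depth + 1, diff))
            else:
                # Unlabelled or filtered vertex: keep walking through it.
                queue.append((nxt, depth + 1))
    return found
```

Walking through filtered vertices keeps the connectivity of the original graph; the alternative of dropping them entirely would be simpler but could disconnect contigs that are only linked via short, filtered-out sequences.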