rhysnewell / rosella

Metagenomic Binning Algorithm
BSD 3-Clause "New" or "Revised" License

Rosella recover flight status error #30

Closed Rridley7 closed 1 year ago

Rridley7 commented 1 year ago

Hi, I am running into an error when running rosella recover on several metagenomes. The error is not consistent between samples, i.e. I cannot predict which samples will trigger it, but it occurs reproducibly on the samples that do. The error statement is:

Error when running flight process. Exitstatus was : ExitStatus(unix_wait_status(256))
thread 'main' panicked at 'Failed to grab stderr from failed flight process', /home/conda/.cargo/registry/src/github.com-1ecc6299db9ec823/bird_tool_utils-0.3.0/src/command.rs:17:14

The original command was: rosella recover -i S04_1a9817_spa_t_mtb_cov.txt -r S04_1a9817_spa_t_contigs.fa

I can provide the original files if needed. The coverage file was generated with coverm contig in metabat mode.

rhysnewell commented 1 year ago

Hmm, yeah, that error is not very informative. Would you please provide the reference file and coverage file? I'll see if I can get to the bottom of it.

Rhys

Rridley7 commented 1 year ago

Files are attached, thanks! S04_1a9817_spa_t_mtb_cov.txt S04_1a9817_spa_t_contigs.fa.zip

rhysnewell commented 1 year ago

Hi!

So I've looked through your files and tried running rosella on them. You are right that rosella does error out, but I believe it is not due to a problem on rosella's end.

The assembly you are trying to bin is not very good. Here are the stats from bbmap for it:

A   C   G   T   N   IUPAC   Other   GC  GC_stdev
0.2519  0.2479  0.2451  0.2551  0.0000  0.0000  0.0000  0.4930  0.0964

Main genome scaffold total:             1828
Main genome contig total:               1828
Main genome scaffold sequence total:    2.745 MB
Main genome contig sequence total:      2.745 MB    0.000% gap
Main genome scaffold N/L50:             663/1.435 KB
Main genome contig N/L50:               663/1.435 KB
Main genome scaffold N/L90:             1562/1.067 KB
Main genome contig N/L90:               1562/1.067 KB
Max scaffold length:                    15.637 KB
Max contig length:                      15.637 KB
Number of scaffolds > 50 KB:            0
% main genome in scaffolds > 50 KB:     0.00%

Minimum     Number          Number          Total           Total           Scaffold
Scaffold    of              of              Scaffold        Contig          Contig
Length      Scaffolds       Contigs         Length          Length          Coverage
--------    --------------  --------------  --------------  --------------  --------
    All              1,828           1,828       2,745,043       2,745,043   100.00%
    500              1,828           1,828       2,745,043       2,745,043   100.00%
   1 KB              1,828           1,828       2,745,043       2,745,043   100.00%
 2.5 KB                 99              99         375,466         375,466   100.00%
   5 KB                 15              15         115,390         115,390   100.00%
  10 KB                  3               3          38,314          38,314   100.00%

As you can see, most of the contigs fall below the default minimum contig size that rosella uses (--min-contig-size 1500); the contig N/L50 is 663/1.435 KB. Contigs of 2.5 Kbp or longer add up to less than 500 Kbp (375,466 bp across 99 contigs). That's not a whole lot of information for rosella, or any binning algorithm, to work with. I doubt you will easily get anything informative out of this assembly without some level of manual inspection.
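To check an assembly against a minimum contig size before binning, something like the following could be used. This is a minimal sketch, not part of rosella: the function names `contig_lengths` and `summarize` are made up for illustration, and it assumes the threshold is inclusive (contigs of exactly `min_size` bp are kept), which may not match how rosella applies --min-contig-size.

```python
from typing import Dict, Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths: List[int], min_size: int = 1500) -> Dict[str, int]:
    """Count contigs and bases at or above min_size (inclusive: an assumption)."""
    kept = [n for n in lengths if n >= min_size]
    return {
        "contigs_total": len(lengths),
        "contigs_kept": len(kept),
        "bases_total": sum(lengths),
        "bases_kept": sum(kept),
    }


# Tiny in-memory example; for a real assembly, pass an open file handle instead:
#   with open("contigs.fa") as handle:
#       print(summarize(contig_lengths(handle)))
example = [">a", "ACGT", "ACGT", ">b", "AC"]
print(summarize(contig_lengths(example), min_size=5))
```

If `contigs_kept` or `bases_kept` comes out as a small fraction of the totals, the binner has very little to work with at that threshold.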

I think I will go ahead and close this issue now. Hopefully you have found my response helpful, and you can find something useful in your assembly.

Cheers, Rhys

janfelix commented 1 year ago

Hello Rhys, I had the same issue, most likely due to the same problem with short contigs. My contigs are assembled from metatranscriptome data, so that's what they are. I had the impression that GroopM was able to process contigs as short as 500 bp, and I then moved on to rosella. Do you see any chance rosella could work with contigs shorter than 1500 bp? Even just to try it out, or by only using read coverage...

Thanks again for building rosella and the great support!

rhysnewell commented 1 year ago

You can certainly try it out: just set --min-contig-size to the desired value and see how you go. If it returns an error again, then let me know. Thanks for trying it out :)

You'll probably also want to drop --min-bin-size to a much lower value if you expect your metaT bins to be small.
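As a rough illustration of the suggestion above, the command line could be assembled like this. It is only a sketch: the helper `build_recover_command` is hypothetical, the file names are placeholders, and the values 500 and 50000 are arbitrary examples rather than recommended settings. Only the flags themselves (-i, -r, --min-contig-size, --min-bin-size) come from this thread.

```python
import shlex
from typing import List


def build_recover_command(
    coverage: str,
    assembly: str,
    min_contig_size: int,
    min_bin_size: int,
) -> List[str]:
    """Build the argument list for `rosella recover` with lowered size thresholds."""
    return [
        "rosella", "recover",
        "-i", coverage,   # coverage file (e.g. from coverm contig in metabat mode)
        "-r", assembly,   # assembly FASTA
        "--min-contig-size", str(min_contig_size),
        "--min-bin-size", str(min_bin_size),
    ]


# Placeholder inputs and example thresholds (assumptions, adjust to your data).
cmd = build_recover_command("coverage.txt", "contigs.fa", 500, 50000)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the values have been chosen; building it as a list avoids shell-quoting problems with unusual file names.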

Rridley7 commented 1 year ago

This was certainly helpful, thanks!

janfelix commented 1 year ago

Hi, I have tried a few things, including lowering the minimum contig size and bin size. Unfortunately, after successfully completing the "Contigs kmers analyzed" part, it crashes:

[00:07:56] ⠋ Calculating UMAP embeddings and clustering... 3/6
[2022-11-24T22:17:10Z ERROR bird_tool_utils::command] Error when running flight process. Exitstatus was : ExitStatus(unix_wait_status(256))
thread 'main' panicked at 'Failed to grab stderr from failed flight process', /home/conda/.cargo/registry/src/github.com-1ecc6299db9ec823/bird_tool_utils-0.3.0/src/command.rs:17:14

Not sure what that could mean...
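One part of the message can be decoded directly: `unix_wait_status(256)` is a raw Unix wait status, and under the conventional Linux encoding it corresponds to a normal exit with code 1. In other words, the flight subprocess itself exited with a generic failure code, which does not identify the cause. The sketch below shows the arithmetic; `decode_wait_status` is an illustrative helper, not something from rosella or bird_tool_utils, and it assumes the usual layout (low 7 bits hold the terminating signal, the next byte holds the exit code).

```python
from typing import Tuple


def decode_wait_status(status: int) -> Tuple[str, int]:
    """Decode a raw Unix wait status, assuming the conventional Linux bit layout."""
    low = status & 0x7F
    if low == 0:
        # Normal termination: the exit code is stored in bits 8-15.
        return ("exited", (status >> 8) & 0xFF)
    if low == 0x7F:
        # Process was stopped; bits 8-15 hold the stopping signal.
        return ("stopped", (status >> 8) & 0xFF)
    # Otherwise the process was killed by the signal in the low 7 bits.
    return ("signaled", low)


print(decode_wait_status(256))  # ('exited', 1)
```

On Python 3.9 or later on Unix, `os.waitstatus_to_exitcode(256)` gives the same exit code of 1.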

rhysnewell commented 1 year ago

Could you please post the output of conda list for your rosella conda environment?

janfelix commented 1 year ago

Hi, thanks for looking into this!

# packages in environment at /home/jan/.conda/envs/rosella:
#
# Name                        Version       Build                   Channel
_libgcc_mutex                 0.1           conda_forge             conda-forge
_openmp_mutex                 4.5           2_kmp_llvm              conda-forge
asttokens                     2.1.0         pyhd8ed1ab_0            conda-forge
attrs                         22.1.0        pyh71513ae_1            conda-forge
backcall                      0.2.0         pyh9f0ad1d_0            conda-forge
backports                     1.0           pyhd8ed1ab_3            conda-forge
backports.functools_lru_cache 1.6.4         pyhd8ed1ab_0            conda-forge
biopython                     1.80          pypi_0                  pypi
brotli                        1.0.9         h166bdaf_8              conda-forge
brotli-bin                    1.0.9         h166bdaf_8              conda-forge
brotlipy                      0.7.0         py39hb9d737c_1005       conda-forge
bwa                           0.7.17        h7132678_9              bioconda
bzip2                         1.0.8         h7f98852_4              conda-forge
c-ares                        1.18.1        h7f98852_0              conda-forge
ca-certificates               2022.9.24     ha878542_0              conda-forge
cachecontrol                  0.12.12       pyhd8ed1ab_1            conda-forge
cached-property               1.5.2         hd8ed1ab_1              conda-forge
cached_property               1.5.2         pyha770c72_1            conda-forge
certifi                       2022.9.24     pyhd8ed1ab_0            conda-forge
cffi                          1.15.1        py39h74dc2b5_0
charset-normalizer            2.1.1         pyhd8ed1ab_0            conda-forge
colorama                      0.4.6         pyhd8ed1ab_0            conda-forge
contourpy                     1.0.6         py39hf939315_0          conda-forge
cryptography                  38.0.3        py39hd97740a_0          conda-forge
cycler                        0.11.0        pyhd8ed1ab_0            conda-forge
cython                        0.29.32       py39h5a03fae_1          conda-forge
dbus                          1.13.6        he372182_0              conda-forge
decorator                     5.1.1         pyhd8ed1ab_0            conda-forge
exceptiongroup                1.0.4         pyhd8ed1ab_0            conda-forge
executing                     1.2.0         pyhd8ed1ab_0            conda-forge
expat                         2.5.0         h27087fc_0              conda-forge
filelock                      3.8.0         pyhd8ed1ab_0            conda-forge
flight-genome                 1.5.0         pypi_0                  pypi
fontconfig                    2.14.1        hc2a2eb6_0              conda-forge
fonttools                     4.38.0        py39hb9d737c_1          conda-forge
freetype                      2.12.1        hca18f0e_0              conda-forge
glib                          2.69.1        h4ff587b_1
gst-plugins-base              1.14.0        hbbd80ab_1
gstreamer                     1.14.0        h28cd5cc_2
h5py                          3.7.0         nompi_py39h817c9c5_102  conda-forge
hdbscan                       0.8.29        pypi_0                  pypi
hdf5                          1.12.2        nompi_h2386368_100      conda-forge
hdmedians                     0.14.2        py39h2ae25f5_3          conda-forge
htslib                        1.16          h6bc39ce_0              bioconda
icu                           58.2          hf484d3e_1000           conda-forge
idna                          3.4           pyhd8ed1ab_0            conda-forge
imageio                       2.22.4        pypi_0                  pypi
iniconfig                     1.1.1         pyh9f0ad1d_0            conda-forge
ipython                       8.4.0         py39hf3d152e_0          conda-forge
jedi                          0.18.2        pyhd8ed1ab_0            conda-forge
joblib                        1.1.1         pypi_0                  pypi
jpeg                          9e            h166bdaf_2              conda-forge
k8                            0.2.5         hd03093a_2              bioconda
keyutils                      1.6.1         h166bdaf_0              conda-forge
kiwisolver                    1.4.4         py39hf939315_1          conda-forge
krb5                          1.19.3        h3790be6_0              conda-forge
lcms2                         2.14          h6ed2654_0              conda-forge
ld_impl_linux-64              2.38          h1181459_1
lerc                          4.0.0         h27087fc_0              conda-forge
libblas                       3.9.0         16_linux64_openblas     conda-forge
libbrotlicommon               1.0.9         h166bdaf_8              conda-forge
libbrotlidec                  1.0.9         h166bdaf_8              conda-forge
libbrotlienc                  1.0.9         h166bdaf_8              conda-forge
libcblas                      3.9.0         16_linux64_openblas     conda-forge
libcurl                       7.86.0        h7bff187_1              conda-forge
libdeflate                    1.13          h166bdaf_0              conda-forge
libedit                       3.1.20191231  he28a2e2_2              conda-forge
libev                         4.33          h516909a_1              conda-forge
libffi                        3.3           he6710b0_2
libgcc-ng                     12.2.0        h65d4601_19             conda-forge
libgfortran-ng                12.2.0        h69a702a_19             conda-forge
libgfortran5                  12.2.0        h337968e_19             conda-forge
liblapack                     3.9.0         16_linux64_openblas     conda-forge
libllvm11                     11.1.0        he0ac6c6_5              conda-forge
libnghttp2                    1.47.0        hdcd2b5c_1              conda-forge
libopenblas                   0.3.21        pthreads_h78a6416_3     conda-forge
libpng                        1.6.39        h753d276_0              conda-forge
libssh2                       1.10.0        haa6b8db_3              conda-forge
libstdcxx-ng                  12.2.0        h46fd767_19             conda-forge
libtiff                       4.4.0         h0e0dad5_3              conda-forge
libuuid                       2.32.1        h7f98852_1000           conda-forge
libwebp-base                  1.2.4         h166bdaf_0              conda-forge
libxcb                        1.13          h7f98852_1004           conda-forge
libxml2                       2.9.14        h74e7548_0
libzlib                       1.2.13        h166bdaf_4              conda-forge
llvm-openmp                   15.0.5        he0ac6c6_0              conda-forge
llvmlite                      0.39.1        py39h7d9a04d_1          conda-forge
lockfile                      0.12.2        py_1                    conda-forge
matplotlib                    3.6.2         py39hf3d152e_0          conda-forge
matplotlib-base               3.6.2         py39hf9fd14e_0          conda-forge
matplotlib-inline             0.1.6         pyhd8ed1ab_0            conda-forge
minimap2                      2.24          h7132678_1              bioconda
msgpack-python                1.0.4         py39hf939315_1          conda-forge
munkres                       1.0.7         py_1                    bioconda
natsort                       8.2.0         pyhd8ed1ab_0            conda-forge
ncurses                       6.3           h5eee18b_3
numba                         0.56.4        py39h61ddf18_0          conda-forge
numpy                         1.21.0        pypi_0                  pypi
openjpeg                      2.5.0         h7d73246_1              conda-forge
openssl                       1.1.1s        h166bdaf_0              conda-forge
packaging                     21.3          pyhd8ed1ab_0            conda-forge
pandas                        1.5.2         py39h4661b88_0          conda-forge
parallel                      20170422      pl5.22.0_0              bioconda
parso                         0.8.3         pyhd8ed1ab_0            conda-forge
patsy                         0.5.3         pyhd8ed1ab_0            conda-forge
pcre                          8.45          h9c3ff4c_0              conda-forge
pebble                        5.0.3         pypi_0                  pypi
perl                          5.22.0.1      0                       conda-forge
pexpect                       4.8.0         pyh1a96a4e_2            conda-forge
pickleshare                   0.7.5         py_1003                 conda-forge
pillow                        9.2.0         py39hf3a2cdf_3          conda-forge
pip                           22.2.2        py39h06a4308_0
pluggy                        1.0.0         pyhd8ed1ab_5            conda-forge
prompt-toolkit                3.0.33        pyha770c72_0            conda-forge
pthread-stubs                 0.4           h36c2ea0_1001           conda-forge
ptyprocess                    0.7.0         pyhd3deb0d_0            conda-forge
pure_eval                     0.2.2         pyhd8ed1ab_0            conda-forge
pycparser                     2.21          pyhd8ed1ab_0            conda-forge
pygments                      2.13.0        pyhd8ed1ab_0            conda-forge
pynndescent                   0.5.8         pyh1a96a4e_0            conda-forge
pyopenssl                     22.1.0        pyhd8ed1ab_0            conda-forge
pyparsing                     3.0.9         pyhd8ed1ab_0            conda-forge
pyqt                          5.9.2         py39h2531618_6          anaconda
pysocks                       1.7.1         py39hf3d152e_5          conda-forge
pytest                        7.2.0         pyhd8ed1ab_2            conda-forge
python                        3.9.15        haa1d7c7_0
python-dateutil               2.8.2         pyhd8ed1ab_0            conda-forge
python_abi                    3.9           2_cp39                  conda-forge
pytz                          2022.6        pyhd8ed1ab_0            conda-forge
qt                            5.9.7         h5867ecd_1
readline                      8.2           h5eee18b_0
requests                      2.28.1        pyhd8ed1ab_1            conda-forge
rosella                       0.4.2         h6f8cb4c_1              bioconda
samtools                      1.16.1        h6899075_1              bioconda
scikit-bio                    0.5.7         py39hce5d2b2_0          conda-forge
scikit-learn                  1.0.2         pypi_0                  pypi
scipy                         1.8.1         pypi_0                  pypi
seaborn                       0.12.1        hd8ed1ab_0              conda-forge
seaborn-base                  0.12.1        pyhd8ed1ab_0            conda-forge
setuptools                    65.5.0        py39h06a4308_0
sip                           4.19.13       py39h295c915_0          anaconda
six                           1.16.0        pyh6c4a22f_0            conda-forge
sqlite                        3.39.3        h5082296_0
stack_data                    0.6.1         pyhd8ed1ab_0            conda-forge
starcode                      1.4           hec16e2b_2              bioconda
statsmodels                   0.13.5        py39h2ae25f5_2          conda-forge
tbb                           2021.7.1      pypi_0                  pypi
threadpoolctl                 3.1.0         pyh8a188c0_0            conda-forge
tk                            8.6.12        h1ccaba5_0
tomli                         2.0.1         pyhd8ed1ab_0            conda-forge
tornado                       6.2           py39hb9d737c_1          conda-forge
tqdm                          4.64.1        pyhd8ed1ab_0            conda-forge
traitlets                     5.5.0         pyhd8ed1ab_0            conda-forge
typing_extensions             4.4.0         pyha770c72_0            conda-forge
tzdata                        2022f         h04d1e81_0
umap-learn                    0.5.3         py39hf3d152e_0          conda-forge
unicodedata2                  15.0.0        py39hb9d737c_0          conda-forge
urllib3                       1.26.11       pyhd8ed1ab_0            conda-forge
wcwidth                       0.2.5         pyh9f0ad1d_2            conda-forge
wheel                         0.37.1        pyhd3eb1b0_0
xorg-libxau                   1.0.9         h7f98852_0              conda-forge
xorg-libxdmcp                 1.1.3         h7f98852_0              conda-forge
xz                            5.2.6         h5eee18b_0
zlib                          1.2.13        h166bdaf_4              conda-forge
zstd                          1.5.2         h6239696_4              conda-forge