Rosella recover flight status error #30

Rridley7 closed 1 year ago

Rridley7 commented 1 year ago

Hi, I am running into an error when running rosella recover on several metagenomes. The error is not predictable across samples, i.e. I cannot tell in advance which samples will fail, but it occurs consistently on the samples it affects. The error statement is:

Error when running flight process. Exitstatus was : ExitStatus(unix_wait_status(256))
thread 'main' panicked at 'Failed to grab stderr from failed flight process', /home/conda/.cargo/registry/src/github.com-1ecc6299db9ec823/bird_tool_utils-0.3.0/src/command.rs:17:14

The original command was: rosella recover -i S04_1a9817_spa_t_mtb_cov.txt -r S04_1a9817_spa_t_contigs.fa

I can provide the original files if needed. The coverage file was generated with coverm contig in metabat mode.

rhysnewell commented 1 year ago

Hmm, yeah, that error is not very informative. Would you please provide the reference file and coverage file? I'll see if I can get to the bottom of it.


Rridley7 commented 1 year ago

Files are attached, thanks! S04_1a9817_spa_t_mtb_cov.txt S04_1a9817_spa_t_contigs.fa.zip

rhysnewell commented 1 year ago


So I've looked through your files and tried running rosella on them. You are right that rosella errors out, but I believe it is not due to a problem on rosella's end.

The assembly you are trying to bin is not very good. Here are the stats from bbmap for it:

A   C   G   T   N   IUPAC   Other   GC  GC_stdev
0.2519  0.2479  0.2451  0.2551  0.0000  0.0000  0.0000  0.4930  0.0964

Main genome scaffold total:             1828
Main genome contig total:               1828
Main genome scaffold sequence total:    2.745 MB
Main genome contig sequence total:      2.745 MB    0.000% gap
Main genome scaffold N/L50:             663/1.435 KB
Main genome contig N/L50:               663/1.435 KB
Main genome scaffold N/L90:             1562/1.067 KB
Main genome contig N/L90:               1562/1.067 KB
Max scaffold length:                    15.637 KB
Max contig length:                      15.637 KB
Number of scaffolds > 50 KB:            0
% main genome in scaffolds > 50 KB:     0.00%

Minimum     Number          Number          Total           Total           Scaffold
Scaffold    of              of              Scaffold        Contig          Contig
Length      Scaffolds       Contigs         Length          Length          Coverage
--------    --------------  --------------  --------------  --------------  --------
    All              1,828           1,828       2,745,043       2,745,043   100.00%
    500              1,828           1,828       2,745,043       2,745,043   100.00%
   1 KB              1,828           1,828       2,745,043       2,745,043   100.00%
 2.5 KB                 99              99         375,466         375,466   100.00%
   5 KB                 15              15         115,390         115,390   100.00%
  10 KB                  3               3          38,314          38,314   100.00%

As you can see, most of the contigs fall below the default minimum contig size that rosella uses (--min-contig-size 1500): the contig N50 length is only 1.435 Kbp. The contigs of 2.5 Kbp or longer total less than 500 Kbp (375,466 bp in the table above). That's not really a whole lot of information for rosella, or any binning algorithm, to work with. I doubt you will easily get anything informative out of this assembly without some level of manual inspection.
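One way to check how much of an assembly survives a length cutoff before binning is to tally contig lengths directly. The following is a minimal sketch, not part of rosella: the function names `contig_lengths` and `summarize` are hypothetical, the FASTA parser is deliberately simple, and it assumes a contig is kept when its length is at least the cutoff (whether rosella treats the boundary as inclusive is not stated in this thread).

```python
import io

def contig_lengths(fasta_handle):
    """Yield the length of each sequence in a FASTA stream."""
    length = None
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length

def summarize(lengths, min_contig_size=1500):
    """Count contigs and bases at or above min_contig_size (assumed inclusive)."""
    lengths = list(lengths)
    kept = [n for n in lengths if n >= min_contig_size]
    return {
        "total_contigs": len(lengths),
        "kept_contigs": len(kept),
        "kept_bp": sum(kept),
    }

# Tiny in-memory FASTA standing in for a real assembly file.
example = io.StringIO(">a\nACGT\nACGT\n>b\nACG\n>c\nACGTACGTAC\n")
print(summarize(contig_lengths(example), min_contig_size=8))
```

For a real assembly, pass an open file handle instead of the in-memory example and leave `min_contig_size` at 1500 to mirror the default mentioned above.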

I think I will go ahead and close this issue now. Hopefully you have found my response helpful, and you can find something useful in your assembly.

Cheers, Rhys

janfelix commented 1 year ago

Hello Rhys, I had the same issue, most likely due to the same problem with short contigs. My contigs are assembled from metatranscriptome data, so that's what they are. I had the impression that GroopM was able to process contigs as short as 500 bp, and I then moved on to rosella. Do you see any chance rosella could work with contigs shorter than 1500 bp? Even just to try it out or by only using read coverage...

Thanks again for building rosella and the great support!

rhysnewell commented 1 year ago

You can certainly try it out, you just have to set --min-contig-size to the desired value and see how you go. If it returns an error again, then let me know. Thanks for trying it out :)

You'll probably also want to drop --min-bin-size to a much lower value if you expect your metaT bins to be small.
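As a rough illustration of what such an invocation could look like, here is a small sketch that assembles the command line. The `-i`, `-r`, `--min-contig-size` and `--min-bin-size` options are the ones mentioned in this thread; the helper name `build_recover_command`, the file names, and the values 500 and 50000 are placeholders chosen for the example, not recommendations, and the thread does not state the default or units of --min-bin-size.

```python
import shlex

def build_recover_command(coverage, contigs, min_contig_size, min_bin_size):
    """Assemble a `rosella recover` argument list with lowered thresholds."""
    return [
        "rosella", "recover",
        "-i", coverage,          # coverage file, as in the original command
        "-r", contigs,           # assembly FASTA, as in the original command
        "--min-contig-size", str(min_contig_size),
        "--min-bin-size", str(min_bin_size),
    ]

# Placeholder inputs and thresholds; adjust to your own data.
cmd = build_recover_command("coverage.txt", "contigs.fa", 500, 50000)
print(shlex.join(cmd))
```

The printed string can be pasted into a shell, or the list can be handed to `subprocess.run` if you prefer to launch it from Python.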

Rridley7 commented 1 year ago

This was certainly helpful, thanks!

janfelix commented 1 year ago

Hi, I have tried a few things, including lowering the contig size and bin size. Unfortunately, after successfully completing the "Contigs kmers analyzed" part, it crashes:

[00:07:56] ⠋ Calculating UMAP embeddings and clustering... 3/6
[2022-11-24T22:17:10Z ERROR bird_tool_utils::command] Error when running flight process. Exitstatus was : ExitStatus(unix_wait_status(256))
thread 'main' panicked at 'Failed to grab stderr from failed flight process', /home/conda/.cargo/registry/src/github.com-1ecc6299db9ec823/bird_tool_utils-0.3.0/src/command.rs:17:14

Not sure what that could mean...
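For what it's worth, the number in unix_wait_status(256) can at least be decoded. The snippet below is a minimal sketch, assuming a POSIX system where Python's `os` module exposes the wait-status helpers. On Linux it reports that the flight process exited on its own with exit code 1 and was not killed by a signal. That only says the subprocess reported a failure; it does not reveal the cause.

```python
import os

raw_status = 256  # the value printed as unix_wait_status(256) in the log

# POSIX helpers that split a raw wait status into its parts.
exited_normally = os.WIFEXITED(raw_status)
exit_code = os.WEXITSTATUS(raw_status) if exited_normally else None
killed_by_signal = os.WIFSIGNALED(raw_status)

print(exited_normally, exit_code, killed_by_signal)
```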

rhysnewell commented 1 year ago

Would you please be able to post the output of conda list for your rosella conda environment?

janfelix commented 1 year ago

Hi, thanks for looking into this!

# packages in environment at /home/jan/.conda/envs/rosella:
#
# Name Version Build Channel

