Open aljazdzy opened 10 months ago
Hello @aljazdzy, apologies for the delayed response here, I've been on break.
I'll add some documentation in future to clarify these outputs, but yes, your deductions are all correct. The refined bins are normal output from rosella; they were produced from putative bins (which aren't included in the final output) during the initial round of refinement that rosella performs. They have the "refined" tag just to show that they were produced in the second step and not the first.
I am kind of confused about that bin called "single_contig" that has 3 contigs in it. From reviewing the code, that shouldn't necessarily happen. Would you be able to provide some additional information about it? For example, does it look like a legitimate bin, or is it highly contaminated?
Additionally, were the bins in general of similar quality to previous rosella runs? This update is a fairly large refactor to address some speed issues, but I'm working on this in my spare time, so I just want to make sure I haven't missed any bugs.
Cheers, Rhys
No issues, I apologize for my delay as well! Yes, when I run those contigs through checkm2 I get output that looks like this:
```
rosella_refined_0_single_contig_refined_0_1 93.4 43.98 Gradient Boost (General Model) 11 0.89 562115 289.4317697228145 3648733 0.47 3752 None
rosella_refined_0_single_contig_refined_0_2 68.71 8.97 Gradient Boost (General Model) 11 0.903 712182 308.4401823015572 2688388 0.52 2633 None
rosella_refined_0_single_contig_refined_0_3 44.11 3.47 Gradient Boost (General Model) 11 0.868 298476 305.5108924806746 1498363 0.45 1423 None
rosella_refined_0_single_contig_refined_0_4 28.29 1.04 Neural Network (Specific Model) 11 0.916 281267 336.65242165242165 772218 0.44 702 None
rosella_refined_0_single_contig_refined_0_5 99.99 7.88 Neural Network (Specific Model) 11 0.886 2946550 303.839142948513 3206005 0.47 3127 None
rosella_refined_0_single_contig_refined_0_6 17.88 0.17 Neural Network (Specific Model) 11 0.89 254714 325.07439824945294 499452 0.43 457 None
rosella_refined_0_single_contig_refined_0_7 11.84 0.03 Neural Network (Specific Model) 11 0.844 298451 308.6630036630037 298451 0.48 273 None
```
So some of them are quite decent but others have some significant issues. I would like to say the bins were in general of higher quality than previous runs, but I'll admit I don't have a quantitative analysis on that quite yet. My bins in general haven't been of great quality, but that has more to do with my data. I'm hoping to change that soon though.
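For a quick quantitative comparison between runs, something like the following could work. It is only a minimal sketch: it assumes the checkm2 report is a tab-separated file with `Name`, `Completeness` and `Contamination` columns, and the tier cut-offs (above 90% complete with under 5% contamination for "high", at least 50% with under 10% for "medium") are commonly used thresholds picked here for illustration, not anything rosella or checkm2 defines. The function names are hypothetical.

```python
import csv
import io
from collections import Counter


def classify_bin(completeness, contamination):
    """Assign a rough quality tier (illustrative thresholds, adjust as needed)."""
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


def summarise_report(handle):
    """Count bins per tier from a tab-separated report with a header row.

    Assumes columns named Name, Completeness and Contamination.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(
        classify_bin(float(row["Completeness"]), float(row["Contamination"]))
        for row in reader
    )


# A few rows from the output above, reduced to the three assumed columns.
EXAMPLE = (
    "Name\tCompleteness\tContamination\n"
    "rosella_refined_0_single_contig_refined_0_1\t93.4\t43.98\n"
    "rosella_refined_0_single_contig_refined_0_2\t68.71\t8.97\n"
    "rosella_refined_0_single_contig_refined_0_5\t99.99\t7.88\n"
    "rosella_refined_0_single_contig_refined_0_7\t11.84\t0.03\n"
)

tiers = summarise_report(io.StringIO(EXAMPLE))
print(dict(tiers))
```

Running the same summary on the reports from an older and a newer rosella run would give tier counts that can be compared side by side.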
This isn't really an issue as much as an output question: I just updated to the newest version of rosella (hadn't done so in a bit) and was excited to see a bunch of new outputs I didn't have previously! These include:
rosella_refined_0_"number"
rosella_refined_0_single_contig_refined_0_"number"
rosella_refined_0_unbinned
rosella_bin_small_unbinned
rosella_bin_unbinned
Most of them are pretty obvious as to what they are. I would think rosella_refined_0_unbinned would have unbinned contigs above a certain threshold, and "small_unbinned" would contain contigs below that threshold. I would also hypothesize that "single_contig_refined" contains very large contigs that didn't fall into many other clusters but were clustered together? (I realize it says "single contig", but when I open the files they seem to contain at least 3 very large contigs.) I'm not entirely sure what the "refined_0_"number"" bins are though. Are these refined versions of the original output bins? I ran recover, but is the program also running refine?
Any clarification would be greatly appreciated, I'm excited for the extra bit of data!
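For anyone wanting to check how many contigs each bin file actually contains, a minimal sketch is below. It simply counts FASTA header lines. The `.fna` extension and the function names are assumptions for illustration, so adjust the glob pattern to whatever the output files are actually called.

```python
import io
from pathlib import Path


def count_contigs(handle):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


def contigs_per_bin(bin_dir, pattern="*.fna"):
    """Map each bin file name in bin_dir to its contig count.

    The default pattern is an assumption; change it to match the real files.
    """
    counts = {}
    for path in sorted(Path(bin_dir).glob(pattern)):
        with open(path) as handle:
            counts[path.name] = count_contigs(handle)
    return counts


# Small in-memory example with three records.
example = io.StringIO(">contig_1\nACGT\n>contig_2\nGGCC\nTTAA\n>contig_3\nAC\n")
n = count_contigs(example)
print(n)
```

Calling `contigs_per_bin` on the output directory would flag any "single_contig" bin whose count is greater than one.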