merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[FEATURE REQUEST] Show genome size stats (mean and stdev) when binning with anvi-refine #1888

Open sklasek opened 2 years ago

sklasek commented 2 years ago

The need

When manually binning MAGs, I often find myself checking the GTDB database to see whether the size of the MAG I've selected is within a plausible range for other members of that genus (generally it is). Sometimes one may identify several contigs that represent a ~50% complete MAG with a genus-level taxonomy estimation, but its total length is far less than 50% of the genome size of any known member of that genus. Maybe you truly have an outlier and you decide to save the MAG, but in other instances you may want to be a bit more conservative.
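As a rough sketch of the manual check described above: scale the smallest and largest reference genome sizes for the genus by the MAG's estimated completeness, then see whether the MAG's total length falls inside that window. The function names and the example sizes are hypothetical, made up for illustration; nothing here is part of anvi'o or GTDB.

```python
def expected_length_range(ref_sizes_bp, completeness):
    """Scale the smallest and largest reference genome sizes by completeness.

    ref_sizes_bp: genome sizes (bp) of reference genomes in the genus
                  (hypothetical input, e.g. looked up by hand in GTDB).
    completeness: estimated MAG completeness as a fraction between 0 and 1.
    """
    if not ref_sizes_bp:
        raise ValueError("need at least one reference genome size")
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be in (0, 1]")
    return min(ref_sizes_bp) * completeness, max(ref_sizes_bp) * completeness


def is_plausible(mag_length_bp, completeness, ref_sizes_bp):
    """True if the MAG length lies within the completeness-scaled window."""
    low, high = expected_length_range(ref_sizes_bp, completeness)
    return low <= mag_length_bp <= high


# Made-up numbers: three reference genomes of 4, 5, and 6 Mbp,
# and two candidate MAGs that are each estimated to be 50% complete.
refs = [4_000_000, 5_000_000, 6_000_000]
print(is_plausible(2_500_000, 0.5, refs))
print(is_plausible(1_200_000, 0.5, refs))
```

A MAG that fails this check is not necessarily wrong; it only flags the cases where a second look is worthwhile.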

The solution

Include mean genome size and standard deviation in the bins tab of the interactive interface for bins that are identified at a certain taxonomic threshold (genus, maybe?). This is probably only appropriate for genera represented by many isolates and/or high-quality MAGs. On the other hand, deciding which GTDB search criteria to include might be kind of tricky.
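To make the request concrete, here is a minimal sketch of the summary being asked for: mean and sample standard deviation of genome size per genus, reported only for genera with enough reference genomes. The input format (a list of `(genus, size_bp)` pairs), the `min_genomes` cutoff, and the example values are all assumptions for illustration; this is not how anvi'o stores taxonomy or how GTDB exposes its metadata.

```python
from collections import defaultdict
from statistics import mean, stdev


def genus_size_stats(records, min_genomes=10):
    """Summarize genome sizes per genus.

    records:     iterable of (genus, size_bp) pairs (hypothetical input).
    min_genomes: genera with fewer reference genomes than this are skipped,
                 since their mean and stdev would not be very informative.
    Returns {genus: (mean_bp, stdev_bp, n_genomes)}.
    """
    # The sample standard deviation needs at least two values.
    min_genomes = max(min_genomes, 2)

    sizes = defaultdict(list)
    for genus, size_bp in records:
        sizes[genus].append(size_bp)

    return {
        genus: (mean(values), stdev(values), len(values))
        for genus, values in sizes.items()
        if len(values) >= min_genomes
    }


# Made-up example: one well-sampled genus and one with a single genome.
records = [
    ("GenusA", 5_000_000),
    ("GenusA", 5_200_000),
    ("GenusA", 4_800_000),
    ("GenusB", 1_000_000),
]
for genus, (avg, sd, n) in genus_size_stats(records, min_genomes=3).items():
    print(f"{genus}: mean={avg:.0f} bp, stdev={sd:.0f} bp, n={n}")
```

The cutoff is the tricky part mentioned above: which reference genomes count (isolates only, MAGs above some quality threshold, and so on) changes both the mean and the spread.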

Beneficiaries

Those of us using manual refinement for genome-resolved metagenomics.

meren commented 2 years ago

Hey @sklasek, this is a VERY VERY good idea!! And it CAN be done :)

I hope we can get to this sooner rather than later. I am marking this as a priority.

sklasek commented 2 years ago

Awesome, thanks @meren!