merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Renaming all bins when your refinement is done and collection final #299

Closed: tdelmont closed this issue 8 years ago

tdelmont commented 8 years ago

Hi, we now have more and more "final genomic collections" identified using anvi'o. Most of my bins have names like "Bin_137_1_2" or "Bin_0_2". I think it would be great to rename the bins in a more logical and informative way before sharing them with people.

Here is my suggestion: a command that renames bins using a project name and then numbers them by decreasing length. In a project called "Anvio_is_the_best", the largest bin would be "Anvio_is_the_best_MAG001", and so on.
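To make the naming scheme concrete, here is a minimal Python sketch of the idea. It is only an illustration, not an existing anvi'o command: the function name `rename_bins`, the `label` parameter, and the bin lengths are made up for the example, and the three-digit padding simply follows the "MAG001" pattern above.

```python
def rename_bins(bin_lengths, project, label="MAG"):
    """Map old bin names to '<project>_<label>NNN', largest bin first.

    bin_lengths: dict of old bin name -> total length in nucleotides
    (hypothetical input; how the lengths are obtained is not shown here).
    """
    # Sort by decreasing length; break ties by name so the result is stable.
    ordered = sorted(bin_lengths, key=lambda name: (-bin_lengths[name], name))

    # Pad to at least three digits, more if there are 1000+ bins.
    width = max(3, len(str(len(ordered))))

    return {
        old_name: f"{project}_{label}{rank:0{width}d}"
        for rank, old_name in enumerate(ordered, start=1)
    }


if __name__ == "__main__":
    # Made-up lengths, for illustration only.
    lengths = {"Bin_137_1_2": 2_100_000, "Bin_0_2": 3_400_000}
    for old, new in rename_bins(lengths, "Anvio_is_the_best").items():
        print(old, "->", new)
```

Returning the old-to-new mapping, rather than only the new names, means it can be saved alongside the collection so the original bin names stay traceable.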

What do you think??? Would look more organized, no?

Tom

tdelmont commented 8 years ago

Meren suggested we should not lose the link between bins and the "master" CONCOCT cluster when CONCOCT is used for pre-clustering. I suggest we rename the bins, and even rename and order the contigs by length, but keep the assembly contig name and the master cluster number in the contig header as additional information.

One example could be:

"Anvio_is_the_best_MAG001_contig001 [Assembly contig_K03245] [Master cluster bin_03]"
ACCTTGGATCG
"Anvio_is_the_best_MAG001_contig002 [Assembly contig_K435245] [Master cluster bin_03]"
ATTGCTT

This is how NCBI includes metadata for contigs (e.g., > contig_393956 [mitochondria]), but maybe a space is problematic for us?
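A rough Python sketch of how such headers could be generated is below. It is not anvi'o code: the function name `format_bin_fasta` and its arguments are hypothetical, and the bracketed fields just copy the example above. On the space question, many FASTA tools treat everything up to the first whitespace as the sequence ID and the rest as a free-text description, so the renamed contig ID should stay usable while the bracketed metadata rides along, though it is worth checking the specific downstream tools.

```python
def format_bin_fasta(bin_name, master_cluster, contigs):
    """Return FASTA text for one bin with renamed, length-ordered contigs.

    contigs: dict of assembly contig name -> sequence (hypothetical input).
    The original assembly name and the master cluster are kept in the
    header description, after the first space.
    """
    # Longest contig first; break ties by assembly name for a stable order.
    ordered = sorted(contigs.items(), key=lambda item: (-len(item[1]), item[0]))

    # Pad to at least three digits, more if the bin has 1000+ contigs.
    width = max(3, len(str(len(ordered))))

    lines = []
    for rank, (assembly_name, sequence) in enumerate(ordered, start=1):
        contig_id = f"{bin_name}_contig{rank:0{width}d}"
        lines.append(
            f">{contig_id} [Assembly {assembly_name}] "
            f"[Master cluster {master_cluster}]"
        )
        lines.append(sequence)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sequences taken from the example in this thread.
    example = {"contig_K435245": "ATTGCTT", "contig_K03245": "ACCTTGGATCG"}
    print(format_bin_fasta("Anvio_is_the_best_MAG001", "bin_03", example), end="")
```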

If the assembly and the CONCOCT clustering table are available, people can still link everything, and the genomic collection is in better shape.

Tom