merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Warning for the names in fasta.txt for the Pangenomics workflow #1103

FlorianTrigodet closed this issue 5 years ago

FlorianTrigodet commented 5 years ago
(anvio-master) -bash-4.2$ anvi-self-test -v
Anvi'o version ...............................: margaret (v5.3-master)
Profile DB version ...........................: 31
Contigs DB version ...........................: 12
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1

Hi,

I was using the pangenomics workflow and ran into errors caused by the names I had written in the fasta.txt file.

These names are used to reformat the headers of the contigs FASTA files, as mentioned in the Snakemake file (small typo there, but no judging):

This is required to make sure taht the headers don't contain
any charachters that anvi'o doesn't like.It give contigs
meaningful names; so that if the group name is 'MYSAMPLE01', the
contigs would look like this:
> MYSAMPLE01_000000000001
> MYSAMPLE01_000000000002

Sadly, I used the unwanted '-' character in the names, and that was the cause of the errors. Do you think it would be worth checking these names before the workflow starts and raising a warning?
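
To make the suggestion concrete, here is a minimal sketch of such a check. It is not anvi'o code. It assumes fasta.txt is tab-delimited with a `name` column, and that acceptable names contain only ASCII letters, digits, and underscores (a rule consistent with '-' being rejected, but not taken from the anvi'o source). The function name `find_bad_names` is hypothetical.

```python
import csv
import io
import re

# Assumption: valid names use only ASCII letters, digits, and underscores.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def find_bad_names(fasta_txt_lines, name_column="name"):
    """Return names from a tab-delimited fasta.txt that contain unwanted characters.

    `fasta_txt_lines` is any iterable of lines, e.g. an open file handle.
    """
    reader = csv.DictReader(fasta_txt_lines, delimiter="\t")
    return [
        row[name_column]
        for row in reader
        if not VALID_NAME.fullmatch(row[name_column] or "")
    ]


# Small in-memory example standing in for a real fasta.txt file.
example = io.StringIO("name\tpath\nMYSAMPLE01\ta.fa\nMY-SAMPLE02\tb.fa\n")
bad = find_bad_names(example)
if bad:
    print("Warning: these names contain unsupported characters: " + ", ".join(bad))
```

Running something like this before the first rule would flag a name such as MY-SAMPLE02 up front instead of failing later in the workflow.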

Cheers for v5.3! Florian

ShaiberAlon commented 5 years ago

I think that is a good idea @FlorianTrigodet.

Thank you for reporting!