tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation

Keep original contig name #183

Open ajsinghal opened 8 years ago

ajsinghal commented 8 years ago

Is there a way to make the locus tag be the original contig name like RAST does?

tseemann commented 8 years ago

If you only have one contig then you can set the --locustag parameter to be the contig name yourself (and don't use --compliant which renames contigs).

If you have lots of contigs, then I could see how that might be a useful feature. However, many contig names are not legal locus_tags, for example those produced by SPAdes and Velvet.

I'll leave this open as a possible enhancement.
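To illustrate the point about contig names not being legal locus_tags, here is a minimal sketch of a validity check. It assumes the INSDC-style rule that a locus_tag prefix is 3 to 12 alphanumeric characters and starts with a letter; please check the current submission guidelines before relying on it. The function name and the example contig names are hypothetical and are not part of Prokka.

```python
import re

# Assumption: a legal locus_tag prefix is 3-12 alphanumeric characters
# and begins with a letter (INSDC-style rule, not taken from Prokka).
PREFIX_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{2,11}")


def is_legal_locustag_prefix(name: str) -> bool:
    """Return True if `name` could be used unchanged as a locus_tag prefix."""
    return PREFIX_RE.fullmatch(name) is not None


# Illustrative names only: the second mimics an assembler-style header
# with underscores and a decimal point, which the rule above rejects.
for name in ["contig001", "NODE_1_length_5000_cov_12.5"]:
    print(name, is_legal_locustag_prefix(name))
```

A check like this could be run over the FASTA headers before deciding whether a contig name can be passed to --locustag as-is.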

thierryjanssens commented 5 years ago

This enhancement would also be instrumental for analysing scaffolds from a fragmented metagenome and parsing the resulting annotations. I understand that prokka is not designed for annotating genetic material from metagenomes, but it is in fact the most versatile tool for this if you subset to superkingdoms first.

SilentGene commented 5 years ago

Hi @tseemann, it would be much better if we could tell which contig a gene comes from based on its sequence header in FASTA format or its 'locus_tag' in GenBank format. I'm also keen to know the location of a gene in the original contig. So I would suggest keeping the original contig IDs (or part of them) in the ORF identifier and restarting the gene counter at the first gene of each contig. Ideally, it would look like this:

>contig1_1 hypothetical protein
ATCG....
>contig1_2 hypothetical protein
ATCG....
>contig1_3 hypothetical protein
ATCG....
>contig2_1 hypothetical protein
ATCG....

This would be quite helpful when people are trying to understand the location of a gene in the genome. Thanks!
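The numbering scheme suggested above (contig ID plus a counter that restarts on every contig) could be sketched roughly as follows. This is only an illustration of the idea, not Prokka code: the function name, the input format of (contig, start) tuples, and the sample coordinates are all assumptions made for the example.

```python
from collections import defaultdict


def per_contig_ids(genes):
    """Assign IDs like 'contig1_1', restarting the counter on every contig.

    genes: iterable of (contig_name, start_position) tuples (hypothetical input).
    Returns a dict mapping each (contig_name, start_position) to its new ID.
    """
    # Group gene start positions by the contig they sit on.
    by_contig = defaultdict(list)
    for contig, start in genes:
        by_contig[contig].append(start)

    # Number genes within each contig in order of start position.
    ids = {}
    for contig, starts in by_contig.items():
        for n, start in enumerate(sorted(starts), start=1):
            ids[(contig, start)] = f"{contig}_{n}"
    return ids


# Made-up coordinates, deliberately out of order.
genes = [("contig1", 900), ("contig1", 120), ("contig2", 45), ("contig1", 2300)]
ids = per_contig_ids(genes)
for key in sorted(ids):
    print(key, ids[key])
```

Numbering by start position means the counter also reflects the order of genes along the contig, which is the "location" information asked for here.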

tseemann commented 5 years ago

@SilentGene given your GitHub username, I would have thought you would want to keep your gene source private! ;-)

I think the best way to solve this would be to have a customisable --locustag PATTERN option, where PATTERN could have codes in it like --locustag "{{contig}}_{{ftype}}_{{genenum}}" etc which would give things like contig001_CDS_123 and contig245_rRNA_3 for example.
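As a rough sketch of how such a PATTERN might be expanded, assuming the double-brace placeholder syntax written above: the function below substitutes {{contig}}, {{ftype}} and {{genenum}} with supplied values. This is a proposal-level illustration only; nothing in this thread confirms that Prokka implements the option, and the function name and error handling are hypothetical.

```python
import re

# Matches placeholders of the form {{name}} as written in the proposal.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def expand_locustag(pattern, **fields):
    """Replace each {{name}} in `pattern` with the matching keyword value."""
    def substitute(match):
        key = match.group(1)
        if key not in fields:
            # Fail loudly rather than emit a half-expanded locus tag.
            raise KeyError(f"unknown placeholder: {key}")
        return str(fields[key])

    return PLACEHOLDER.sub(substitute, pattern)


pattern = "{{contig}}_{{ftype}}_{{genenum}}"
print(expand_locustag(pattern, contig="contig001", ftype="CDS", genenum=123))
# prints: contig001_CDS_123
print(expand_locustag(pattern, contig="contig245", ftype="rRNA", genenum=3))
# prints: contig245_rRNA_3
```

Rejecting unknown placeholders up front seems safer than passing them through, since a literal "{{...}}" would not be a valid locus tag anyway.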

SilentGene commented 5 years ago

Haha;-)😆 That would be awesome if we could customize the locustag by patterns like that. Can't wait to try out the new feature!

cfrioux commented 4 years ago

Hi @tseemann, is a customisable --locustag option planned for future versions of Prokka? Thanks!

mkazanov commented 4 years ago

This feature would be very important!

agavriilidou commented 3 years ago

Hi @tseemann! Any updates on this? It seems that more and more people are using prokka for annotating metagenomes and it is indeed important to know in which contig the genes are found. :)

JuanmaMedina commented 3 years ago

I also encourage you to implement this feature. And thanks for the amazing work!