mgalardini / pyseer

SEER, reimplemented in python 🐍🔮
http://pyseer.readthedocs.io
Apache License 2.0

No gene name in unitig annotation #133

Closed samlipworth closed 3 years ago

samlipworth commented 3 years ago

Trying to annotate significant unitigs.

I annotated a complete genome using Prokka, then used the resulting FASTA and GFF files with annotate_hits_pyseer.

This works and the script runs OK, but there are no gene names in the output file. I can email the relevant files if that helps. Many thanks, Sam

mgalardini commented 3 years ago

Hi, yes, a way to reproduce the issue would be helpful, thanks.

samlipworth commented 3 years ago

Emailed to you - thanks.

mgalardini commented 3 years ago

Received them thanks! Will reply as soon as I have a bit of time

mgalardini commented 3 years ago

The problem is that the header of your FASTA file must use the same sequence ID as the one found in the GFF file. Here the two differ:

$ head -n 1 R00000049.fasta
>R00000049 gi|150953431|gb|CP000647.1| Klebsiella pneumoniae subsp. pneumoniae MGH 78578, complete sequence

$ head -n 3 R00000049.gff
##gff-version 3
##sequence-region gnl|X|CKMJCFNP_1 1 5315120
gnl|X|CKMJCFNP_1        prokka  gene    340     2802    .       +       .       ID=CKMJCFNP_00001_gene;Name=thrA;gene=thrA;locus_tag=CKMJCFNP_00001

If you change the FASTA header to:

>gnl|X|CKMJCFNP_1 gi|150953431|gb|CP000647.1| Klebsiella pneumoniae subsp. pneumoniae MGH 78578, complete sequence

the annotation should work.
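For anyone hitting the same mismatch, here is a minimal Python sketch of the check and the fix described above. It is not part of pyseer: the helper names (`fasta_ids`, `gff_seqids`, `rename_header`) are hypothetical. It assumes the sequence ID is the first whitespace-delimited token of the FASTA header and that the GFF feature lines are tab-separated, with the sequence ID in the first column.

```python
import io


def fasta_ids(handle):
    """Collect the sequence ID (first token after '>') of each FASTA header."""
    ids = []
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def gff_seqids(handle):
    """Collect the sequence IDs found in column 1 of the GFF feature lines."""
    ids = set()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # Prokka GFF files embed the sequence after this marker
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        ids.add(line.split("\t")[0])
    return ids


def rename_header(header, new_id):
    """Replace the ID in a FASTA header, keeping the rest as the description."""
    parts = header.rstrip("\n")[1:].split(None, 1)
    description = parts[1] if len(parts) > 1 else ""
    return ">" + new_id + (" " + description if description else "")


# Small in-memory stand-ins for the two files discussed in this thread.
fasta_text = ">R00000049 gi|150953431|gb|CP000647.1| complete sequence\nACGT\n"
gff_text = (
    "##gff-version 3\n"
    "##sequence-region gnl|X|CKMJCFNP_1 1 5315120\n"
    "gnl|X|CKMJCFNP_1\tprokka\tgene\t340\t2802\t.\t+\t.\tID=CKMJCFNP_00001_gene\n"
)

found_in_fasta = fasta_ids(io.StringIO(fasta_text))
found_in_gff = gff_seqids(io.StringIO(gff_text))

# IDs present in the FASTA file but absent from the GFF file.
mismatched = [seq_id for seq_id in found_in_fasta if seq_id not in found_in_gff]

# With a single sequence on each side, reuse the GFF ID for the FASTA header.
fixed_header = rename_header(fasta_text.splitlines()[0], sorted(found_in_gff)[0])
```

With real files, the same functions can be given handles from `open("R00000049.fasta")` and `open("R00000049.gff")`. A non-empty `mismatched` list means the header needs changing before the annotation is rerun.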

samlipworth commented 3 years ago

Ah, that makes sense - thank you very much.