tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation

Prokka GFF files incompatible #498

Closed mabouelk closed 4 years ago

mabouelk commented 4 years ago

Hi Prof. Seemann,

I am trying to run AMRFinderPlus (NCBI) on multiple samples, but the AMRFinderPlus documentation says Prokka .gff files are not compatible. I tried their suggested one-liner and it did work:

perl -pe '/^##FASTA/ && exit; s/(\W)Name=/$1OldName=/i; s/ID=([^;]+)/ID=$1;Name=$1/' >

but I do not know how to run it on all files at once (batch run).

tseemann commented 4 years ago

Can you provide a specification or example of what AMRFinderPlus expects? I thought AMRFinderPlus took the .fna file, not the .gff file?

mabouelk commented 4 years ago

amrfinder -p test_prot.fa -g test_prot.gff -n test_dna.fa -O Campylobacter

Yes, it can run using the .fna file only, but to automatically combine overlapping results from protein and nucleotide searches, the coordinates of the protein in the assembly contigs must be indicated by the GFF file. This requires a GFF file where the value of the 'Name=' variable of the 9th field in the GFF must match the identifier in the protein FASTA file (everything between the '>' and the first whitespace character on the defline). See the section on GFF file format for details of how AMRFinderPlus associates FASTA file entries with GFF file entries.
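A quick way to check that requirement before running amrfinder might look like the sketch below. The function name missing_gff_names is made up for illustration and is not part of AMRFinderPlus or Prokka. It assumes a tab-separated GFF with the attributes in the 9th field, and it prints every protein FASTA identifier that has no matching 'Name=' value.

```shell
# Hypothetical helper (not part of AMRFinderPlus or Prokka).
# Usage: missing_gff_names PROTEIN_FASTA GFF
# Prints each protein FASTA identifier with no matching Name= value in the GFF.
missing_gff_names() {
    awk -F '\t' '
        # First file (the GFF): collect Name= values from the 9th field.
        NR == FNR {
            if ($0 ~ /^##FASTA/) in_fasta = 1      # embedded sequences start here
            if (in_fasta || $0 ~ /^#/) next        # skip sequences and comment lines
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^Name=/) names[substr(attrs[i], 6)] = 1
            next
        }
        # Second file (the protein FASTA): the identifier is the text
        # between ">" and the first whitespace character on the defline.
        /^>/ {
            id = substr($0, 2)
            sub(/[ \t].*/, "", id)
            if (!(id in names)) print id
        }
    ' "$2" "$1"
}
```

If it prints nothing, every protein identifier was found as a 'Name=' value. Any identifiers it does print would presumably be the ones AMRFinderPlus cannot place on a contig.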

They mention this on their page:

Prokka GFF files incompatible

Using GFF files included with Prokka does not work because the format is different from what AMRFinderPlus expects. Running the following perl one liner will convert the Prokka output into a GFF file that AMRFinderPlus can read (replace with the GFF file you wish to use and with the name you wish to use for the AMRFinderPlus-compatible GFF file):

perl -pe '/^##FASTA/ && exit; s/(\W)Name=/$1OldName=/i; s/ID=([^;]+)/ID=$1;Name=$1/' >

So I have multiple files, and I do not know how to run it on all of them at once (batch run).

tseemann commented 4 years ago
perl -pi.bak -e '...............' *.gff

This will copy each .gff file to .gff.bak and replace the .gff with the fixed version. See https://www.perlmonks.org/?node_id=608701