ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

[FEATURE REQUEST] <annotation output file with nucleotide sequence > #154

Closed Dongho1234 closed 3 years ago

Dongho1234 commented 3 years ago

Hi, I have annotation output (faa file) with protein sequences. Is there any way I can get annotation output with nucleotide sequences?

azat-badretdin commented 3 years ago

What do you mean by "annotation output with nucleotide sequences"?

We have GFF3 output (see https://github.com/ncbi/pgap/wiki/Output-Files)

Dongho1234 commented 3 years ago

What I mean is this: for example, 123 is an output file (faa) with protein sequences. Instead, I want an output (protein products annotated on the genome, in FASTA format) with nucleotide sequences, not amino acid sequences.

Ex)

gnl|extdb|pgaptmp_001860 D-alanyl-D-alanine carboxypeptidase/D-alanyl-D-alanine-endopeptidase [Bacillus subtilis] AAAATTGGGCCCCCC~~~~

azat-badretdin commented 3 years ago

The GFF3 file contains the locations of the annotated proteins on the nucleotide sequence. Those coordinates can be used to extract the corresponding nucleotide substrings.

thibaudnis commented 3 years ago

As Azat wrote, your best bet is to get the coordinates from the GFF and then extract the corresponding nucleotide sequences from the genomic fasta. We will consider your request for a nucleotide file of annotated features in future versions of PGAP though.
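
For readers who want to try the approach described above, here is a minimal Python sketch of it, not an official PGAP tool. It assumes a standard nine-column GFF3 file and a genomic FASTA whose sequence IDs match column 1 of the GFF3. The function names (`read_fasta`, `cds_records`) and the `ID` / `product` attribute keys are assumptions made for illustration. GFF3 coordinates are 1-based and inclusive, and features on the minus strand need to be reverse-complemented. A CDS split across several GFF3 rows would come out as separate records here rather than being joined.

```python
import sys
from urllib.parse import unquote

# Complement table for reverse-complementing minus-strand features.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into a dict {sequence_id: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Use the first whitespace-delimited token as the sequence ID.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def cds_records(gff_lines, genome):
    """Yield (header, nucleotide_sequence) for each CDS row of a GFF3 file."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # optional embedded FASTA section; stop reading features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # GFF3 is 1-based inclusive; Python slices are 0-based half-open.
        seq = genome[seqid][start - 1:end]
        if strand == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]
        label = attrs.get("ID", f"{seqid}:{start}-{end}")
        product = unquote(attrs.get("product", ""))  # GFF3 percent-encoding
        yield f"{label} {product}".strip(), seq


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python extract_cds.py <annotation.gff> <genome.fna>
    with open(sys.argv[2]) as fasta_handle:
        genome = read_fasta(fasta_handle)
    with open(sys.argv[1]) as gff_handle:
        for header, seq in cds_records(gff_handle, genome):
            print(f">{header}\n{seq}")
```

Run with the GFF3 file and the genomic FASTA as the two arguments, the script prints one FASTA record per CDS row to standard output, which can be redirected to a file. The script name and argument order are illustrative only.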