tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation

Retrieving additional uniprot codes #568

Open gundizalv opened 3 years ago

gundizalv commented 3 years ago

Hi dear

Looking at your prokka-uniprot_to_fasta_db script, I wonder whether I can retrieve KEGG gene accessions instead of COG. There are more than 20 million entries in TrEMBL with linked KEGG genes, but barely half as many with COG.

This is your piece of code:

  my $ec = '';
  my $prod = '';
  my $cog = '';

  if (1) {
    # [ 'eggNOG', 'COG4799', 'LUCA' ]
    for my $dr ( @{ $entry->DRs->list } ) {
#      print Dumper($dr);
      if ($dr->[1] =~ m/^(COG\d+)$/) {
        $cog = $1;
        last;
      }   
    }
  }

... Instead of matching the DR line 'eggNOG', 'COG...', 'bacteria' and retrieving the COG accession, is it possible to match a DR line such as 'KEGG'; 'vg:2947773'; - (an example) and pick up the KEGG code? What should I modify? These codes would be very useful to me.
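To make the question concrete, here is a minimal sketch of the kind of change I have in mind. It assumes each DR record is an array ref of the form [ 'KEGG', 'vg:2947773', '-' ], mirroring the [ 'eggNOG', 'COG4799', 'LUCA' ] shape in the comment above. The helper name first_kegg_id and the mock @drs list are hypothetical and not part of prokka; in the real script the list would come from @{ $entry->DRs->list }.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of prokka): return the first KEGG gene
# identifier found in a list of DR records, or '' if none is present.
# Assumption: each record is an array ref such as [ 'KEGG', 'vg:2947773', '-' ].
sub first_kegg_id {
  my ($drs) = @_;
  for my $dr (@$drs) {
    # Field 0 is taken to be the database name; skip everything but KEGG.
    next unless defined $dr->[0] && $dr->[0] eq 'KEGG';
    # Field 1 is taken to be the accession, written as "organism:gene".
    if (defined $dr->[1] && $dr->[1] =~ m/^(\w+:\S+)$/) {
      return $1;
    }
  }
  return '';
}

# Mock data standing in for @{ $entry->DRs->list }.
my @drs = (
  [ 'eggNOG', 'COG4799', 'LUCA' ],
  [ 'KEGG',   'vg:2947773', '-' ],
);

my $kegg = first_kegg_id(\@drs);
print "KEGG=$kegg\n";
```

Unlike the COG loop, this checks the database name in the first field before looking at the accession, since a KEGG identifier has no fixed prefix comparable to "COG". Wherever the script later writes $cog to the output would presumably need to use $kegg instead; that part of the script is not shown here.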