jarojasva / HADEG

Repository of sequences to annotate hydrocarbon-degrading genes/proteins from genomes/metagenomes
GNU General Public License v3.0

Missing .fasta Files for Amino Acid Sequences in Database Tables #2

Closed DougFelipe closed 3 months ago

DougFelipe commented 3 months ago

Hello @jarojasva!

While working with the database, I noticed that some of the .fasta files containing amino acid sequences (such as Q6RXW0, Q6RXW1, Q6RXW2, and Q6TMA3) listed in the 1_Aerobic_alkane_degradation_pathways_and_genes table are missing.

I am not sure whether they were removed without my noticing, but I wanted to bring this to your attention.

Regardless, I appreciate the work done on the database; it has been very useful.

jarojasva commented 3 months ago

Hi @DougFelipe

Thank you for your observation. Indeed, the individual amino acid sequences of those proteins cannot be found in the directories. There may be others missing as well; we are currently reviewing this. However, all HADEG proteins (including these) are available in the HADEG_protein_database_231119.faa file.

We are currently updating the database: expanding it and exploring other annotation methods to make our predictions more accurate.

Thank you for your understanding.

Best regards, Jorge
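
For anyone who needs the individual sequences in the meantime: as noted in the reply above, they are included in HADEG_protein_database_231119.faa, so they can be pulled out of that combined file. The following is only a minimal sketch, not the maintainers' method. It assumes each accession (for example Q6RXW0) appears in the FASTA header as a token separated by non-alphanumeric characters; the actual header layout of the file is not shown in this thread, so check it first. The function names `read_fasta`, `extract_by_accession`, and `to_fasta` are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_by_accession(
    lines: Iterable[str], wanted: Set[str]
) -> Dict[str, Tuple[str, str]]:
    """Return {accession: (header, sequence)} for the wanted accessions.

    Assumption: the accession is a whole token in the header once the
    header is split on non-alphanumeric characters such as '|', '_' or ' '.
    """
    found = {}
    for header, seq in read_fasta(lines):
        tokens = set(re.split(r"[^A-Za-z0-9]+", header))
        for acc in wanted & tokens:
            found[acc] = (header, seq)
    return found


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one record as FASTA text, wrapping the sequence at `width`."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{header}\n{body}\n"
```

To use it, open the .faa file and pass the file handle as `lines`, together with a set such as `{"Q6RXW0", "Q6RXW1", "Q6RXW2", "Q6TMA3"}`, then write each `to_fasta(header, seq)` result to its own file. Any accession missing from the returned dictionary was not matched under the header assumption above.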