ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

ORF prediction issue #308

Closed: tangwei825 closed this issue 1 month ago

tangwei825 commented 1 month ago

Hi, I am using PGAP 2024-04-27.build7426 to annotate a Staphylococcus simulans genome on our HPC cluster (with Singularity instead of Docker). The pipeline works perfectly.

However, the CDS prediction is inaccurate for my target protein. The protein is 433 aa long, but the PGAP prediction is only 268 aa and is missing one protein domain. Do you have any suggestions?

My protein is a homolog of the AgrC protein from the Staphylococcus aureus quorum-sensing system. PGAP annotated it as "GHKL domain-containing protein".

Thank you!
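One way to check whether other proteins in an annotation are shortened in the same way is to compare annotated protein lengths against an expected full length (here, 433 aa). The sketch below is a minimal Python example. It assumes the annotated proteins are available as a protein FASTA file. PGAP's output usually includes one, such as `annot.faa`, but verify the file name in your own output. The functions `parse_fasta` and `flag_truncated` and the 90% length threshold are hypothetical and chosen for illustration only. They are not part of PGAP.

```python
# Hypothetical helpers, not part of PGAP: flag annotated proteins that look
# truncated relative to an expected full length (e.g. 433 aa for this AgrC homolog).


def parse_fasta(text):
    """Parse FASTA text into {sequence_id: (full_header, sequence)}."""
    entries = []  # list of [header, list_of_sequence_lines]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            entries.append([line[1:], []])
        elif entries:  # sequence lines before the first header are ignored
            entries[-1][1].append(line)
    records = {}
    for header, lines in entries:
        # Use the first whitespace-delimited token of the header as the ID.
        seq_id = header.split()[0] if header.split() else header
        records[seq_id] = (header, "".join(lines))
    return records


def flag_truncated(records, keyword, expected_len, min_fraction=0.9):
    """Return (id, length) pairs for proteins whose header contains `keyword`
    (case-insensitive) and whose length is below min_fraction * expected_len."""
    flagged = []
    for seq_id, (header, seq) in records.items():
        length = len(seq.rstrip("*"))  # ignore a trailing stop symbol, if present
        if keyword.lower() in header.lower() and length < min_fraction * expected_len:
            flagged.append((seq_id, length))
    return flagged


# Made-up example: a 268 aa "GHKL domain-containing protein" and an unrelated protein.
example = (
    ">prot_1 GHKL domain-containing protein\n"
    + "M" + "A" * 267 + "\n"
    + ">prot_2 hypothetical protein\n"
    + "MKV\n"
)

if __name__ == "__main__":
    print(flag_truncated(parse_fasta(example), "GHKL domain-containing protein", 433))
```

On the example input, the call returns `[('prot_1', 268)]`, because 268 aa is below 0.9 × 433 (about 390 aa). Matching on the product name is only a rough filter. A homology search against the reference protein would be more reliable, but this check is quick and needs no extra tools.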

azat-badretdin commented 1 month ago

Thank you for your report, @tangwei825!

We have opened an internal investigation into this data issue. One of our curators will look at it soon.

koneill54 commented 1 month ago

Thank you for your input. We have modified the data we use to determine the structural annotation of this protein, and we have created a new protein family model for AgrC, which should correct the functional annotation. These changes will be included in the next PGAP release, which should happen shortly. Thank you for this information.

tangwei825 commented 1 month ago

Thank you guys for your response and for updating the database! Looking forward to the new release.