ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

[FEATURE REQUEST] Custom database for annotation #263

Closed zinque closed 11 months ago

zinque commented 1 year ago

Dear PGAP team, are there any plans to let users provide a custom sequence database for annotation (equivalent to the --proteins option in Prokka)? Alternatively, do you have a suggestion how/where in the PGAP pipeline one could implement a custom sequence database? Apologies if this request has been raised previously; I couldn't find it.

azat-badretdin commented 1 year ago

Thank you for your report, @zinque.

Right now there is no easy way to plug in different reference data, since information about specific reference entries (proteins, for example) is distributed among several files.

> do you have a suggestion how/where in the PGAP pipeline one could implement a custom sequence database?

It is possible to swap it right now, and we have weekly updates for internal usage. I am not sure what our LOE (level of effort) would be to do that.

But it's an interesting question.

zinque commented 1 year ago

> It is possible to swap it right now and we have weekly updates for internal usage. I am not sure what would be our LOE to do that?

I don't know whether the question was addressed to me or internally, but we would be willing to put some effort into integrating a custom database.

azat-badretdin commented 1 year ago

Internally. My apologies for the confusion.

We will discuss it internally at the earliest opportunity (summer is traditionally slower than other seasons because of vacations). Thank you for your request!