tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation

Improve lists of Rfam domain-specific models #622

Open: AntonPetrov opened this issue 2 years ago

AntonPetrov commented 2 years ago

Hello,

I noticed that prokka contains lists of domain-specific Rfam covariance models, which is very useful because it restricts the search to the relevant RNAs. However, the SQL queries used to generate these lists are likely to produce false positives, that is, to include families that do not occur in the target domain.

For example, the bacterial list should not contain Metazoan signal recognition particle RNA, Small nucleolar RNA U3, or Small nucleolar RNA SNORD36, because these families do not occur in bacteria.

The Rfam team maintains rfam-taxonomy, a dedicated repo with lists of Rfam models for Bacteria, Eukaryotes, Viruses, and other domains. The lists are updated with every Rfam release, and an archive is available.
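To see which entries a curated list would drop, one could compare prokka's current bacterial list against the rfam-taxonomy bacterial list. The following is a minimal sketch of that set comparison. The function name `find_unexpected` and the example model lists are hypothetical and only illustrate the idea. The sketch makes no assumptions about the actual file format of either list.

```python
def find_unexpected(prokka_models, curated_models):
    """Return models present in prokka's list but absent from the curated domain list.

    Both arguments are iterables of Rfam model identifiers. How they are loaded
    from prokka or rfam-taxonomy files is left out on purpose (format not assumed).
    """
    # Set difference: candidates for false positives in prokka's list.
    return sorted(set(prokka_models) - set(curated_models))


# Hypothetical, illustrative data only; not the real contents of either list.
prokka_bacteria = ["tRNA", "tmRNA", "Bacteria_small_SRP", "Metazoa_SRP", "U3", "SNORD36"]
rfam_bacteria = ["tRNA", "tmRNA", "Bacteria_small_SRP"]

print(find_unexpected(prokka_bacteria, rfam_bacteria))
```

With the illustrative data above, this prints `['Metazoa_SRP', 'SNORD36', 'U3']`. Those are the kinds of families that a curated bacterial list would leave out.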

Would it be possible to use the Rfam lists in prokka?

Please let me know if you have any questions. I am also tagging @blakesweeney who will soon be replacing me as Rfam Project Leader. Many thanks in advance for looking into this!