Closed apredeus closed 6 years ago
I took the complete Rfam database and filtered out the non-bacterial entries. Unfortunately, the GFF files I was using to identify the bacterial ones are no longer generated. Alex Bateman at Xfam told me they won't be made available, and admitted that generating them is now a difficult process. So I haven't updated it for many years.
To make your own, you need MSAs (multiple sequence alignments) of your RNA families. I did not need to do this, as Rfam had already done it. You then build models from those MSAs with cmbuild. Unfortunately, it is not as simple as making a BLAST database, although I guess an MSA could contain just a single sequence.
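As a rough sketch of that workflow, the Python below writes a one-sequence Stockholm alignment and lists the Infernal commands that would turn it into a searchable model. The helper names (stockholm_single, infernal_commands), the file names, and the toy hairpin are all made up for illustration. cmbuild normally expects a `#=GC SS_cons` structure line in the alignment; the `--noss` option is used here when none is supplied. The commands are only printed, not run, and the exact options are worth checking against the Infernal manual for your version.

```python
import shlex

SS_TAG = "#=GC SS_cons"


def stockholm_single(name, seq, ss_cons=None):
    """Return a one-sequence Stockholm alignment as a string.

    ss_cons is an optional consensus secondary structure string,
    one character per alignment column.
    """
    if ss_cons is not None and len(ss_cons) != len(seq):
        raise ValueError("ss_cons must be the same length as the sequence")
    # Pad the first column so sequence and structure line up visually.
    width = max(len(name), len(SS_TAG)) + 1
    lines = ["# STOCKHOLM 1.0", "", f"{name:<{width}}{seq}"]
    if ss_cons is not None:
        lines.append(f"{SS_TAG:<{width}}{ss_cons}")
    lines.append("//")
    return "\n".join(lines) + "\n"


def infernal_commands(sto_path, cm_path, has_structure=True):
    """Return the Infernal commands (as argument lists) for one alignment."""
    build = ["cmbuild", "-F"]          # -F: overwrite an existing CM file
    if not has_structure:
        build.append("--noss")         # build without secondary structure
    build += [cm_path, sto_path]
    return [
        build,
        ["cmcalibrate", cm_path],      # fit E-value parameters (slow)
        ["cmpress", cm_path],          # index the CM file for cmscan
    ]


if __name__ == "__main__":
    # Hypothetical toy family: a 3 bp hairpin.
    with open("myfamily.sto", "w") as handle:
        handle.write(stockholm_single("seq1", "GGGAAACCC", "<<<...>>>"))
    for cmd in infernal_commands("myfamily.sto", "myfamily.cm"):
        print(shlex.join(cmd))
```

A real custom database would concatenate one alignment per RNA family into the Stockholm file, so that cmbuild writes one model per family into the same CM file.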
Also, the alignment shouldn't really be done with a DNA aligner (e.g. Clustal). cmalign can be used here, but it is very computationally heavy. It also marks up the alignment (in Stockholm format) with distant RNA base relationships from the predicted secondary structure.
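To give an idea of what that markup looks like: the structure is carried on a `#=GC SS_cons` line, where matching brackets mark alignment columns that pair with each other, however far apart they are. The sketch below recovers those pairs from such a line. The function name base_pairs is made up for this example, and as a simplifying assumption it treats all bracket types alike and ignores the letter codes used for pseudoknots.

```python
OPENERS = "<([{"
CLOSERS = ">)]}"


def base_pairs(ss_cons):
    """Return (i, j) column index pairs (0-based) from an SS_cons string.

    Every opening bracket is matched with the nearest unmatched closing
    bracket to its right, as in ordinary nested-bracket notation. All
    other characters (unpaired columns, pseudoknot letters) are skipped.
    """
    stack = []
    pairs = []
    for col, ch in enumerate(ss_cons):
        if ch in OPENERS:
            stack.append(col)
        elif ch in CLOSERS:
            if not stack:
                raise ValueError(f"unmatched closing bracket at column {col}")
            pairs.append((stack.pop(), col))
    if stack:
        raise ValueError(f"unmatched opening bracket at column {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # Two stacked helices around an internal loop.
    for i, j in base_pairs("<<..<<....>>..>>"):
        print(f"column {i} pairs with column {j}")
```

This is the information a plain DNA aligner never sees: columns 0 and 15 in the example are constrained to covary even though they sit at opposite ends of the sequence.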
Hello Torsten,
I was wondering how exactly the Rfam database is made (what sequences it includes), and how one would go about creating a custom DB. I'm looking at the Infernal manual, but it's a bit confusing.