mtw / viRNA

A collection of conserved viral RNA structures
GNU Affero General Public License v3.0

Alignment with a single sequence #1

Closed AntonPetrov closed 2 years ago

AntonPetrov commented 2 years ago

@mtw Hi Michael!

I came across your repo by chance and it looks like a great resource! Do you think these alignments should be integrated into Rfam? As you know, we have recently included some new Flavivirus models in Rfam, but it seems like you have some additional elements here.

I also noticed a file with a single sequence (not sure if it technically counts as an alignment). Is this an error or just a super niche structure?

mtw commented 2 years ago

Hi @AntonPetrov,

The primary idea for this repository was to have a publicly accessible place to deposit structures that we have been describing in some of our recent publications. At the moment, the repository only contains conserved UTR elements found in the genus Flavivirus; however, we have a number of elements from other virus families in the making.

Many of the elements featured here are subsets of more general Rfam families, such as the SL-II, SL-IV, DB, and the 3'SL elements. We have worked out pairwise proximities between conserved elements for particular ecological or serological groups of Flaviviruses. The single element you are referring to is the only instance of the orthologous exoribonuclease-resistant SL elements in the Aroa virus (AROAV) group. Hence, we keep it in its own Stockholm file, although the AROAV SL-IV element appears to be more similar to the Dengue SL-II than to the AROAV SL-II. I guess this level of description is too fine-grained to be suitable for Rfam.

As you mentioned, there are additional elements that we have recently found in a computational screen of different subtypes and lineages of Tick-borne encephalitis virus. These could be candidates for Rfam; however, these elements were all found by in silico screening, i.e. there is currently no experimental data available.

AntonPetrov commented 2 years ago

Hi @mtw!

Thank you for getting back to me! It's really cool that you host all the alignments here on GitHub. The preprint is very nice, I especially like the figures - very clear and well-designed.

I converted the TBEV sequences into FASTA and scanned them with all the Rfam Flavivirus models. Only 40% (122/305) have hits (from Flavi_TBFV_CRE and Flavi_TBFV_xrRNA), so there is an opportunity to create some new families, perhaps once the paper is published.
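For anyone wanting to repeat the conversion step, here is a minimal sketch of turning a Stockholm alignment into ungapped FASTA with only the Python standard library. It is not the script used above: the function names `stockholm_to_fasta` and `format_fasta` and the example alignment are hypothetical, and it assumes one alignment per file with `-` and `.` as the gap characters. The model scan itself is not shown.

```python
import io


def stockholm_to_fasta(handle):
    """Yield (name, ungapped_sequence) pairs from one Stockholm alignment.

    Assumes a single alignment per file; interleaved blocks are joined by name.
    """
    seqs = {}  # dicts keep insertion order, so sequence order is preserved
    for line in handle:
        line = line.strip()
        # Skip blank lines, markup/annotation lines (#...) and the terminator (//)
        if not line or line.startswith("#") or line.startswith("//"):
            continue
        parts = line.split()
        if len(parts) != 2:
            continue  # not a "name  aligned_sequence" line
        name, chunk = parts
        seqs[name] = seqs.get(name, "") + chunk
    for name, aligned in seqs.items():
        # Remove gap characters (assumed to be '-' and '.')
        yield name, aligned.replace("-", "").replace(".", "")


def format_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Made-up example alignment, for illustration only
EXAMPLE = """# STOCKHOLM 1.0
#=GF ID example
seq1          GGGA--UCC
seq2          GG-AAAUCC
#=GC SS_cons  (((...)))
//
"""

records = list(stockholm_to_fasta(io.StringIO(EXAMPLE)))
print(format_fasta(records), end="")
```

A file that yields a single record from `stockholm_to_fasta` is a single-sequence Stockholm file like the one discussed earlier in this thread.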

The lack of experimental data could be an issue but I will leave it to @blakesweeney to decide. Blake is the new Rfam tsar - please feel free to reach out to him with any comments or suggestions for Rfam.

I will close this issue for now then. Thanks again!