yikunpku / RNA-MSM

Nucleic Acids Research 2024: RNA-MSM is an unsupervised RNA language model based on multiple sequences that outputs both embeddings and attention maps to suit different types of downstream tasks.
https://aigene.cloudbastion.cn/#/rna-msm
MIT License

A question about the dataset of RNA secondary structure #2

Closed (fuxuliu closed this issue 1 year ago)

fuxuliu commented 1 year ago

Hi, thanks for sharing this great work. The paper describes the dataset used for RNA secondary structure as follows: "405 RNAs for TR1, 40 RNAs for VL1 and 39 RNAs for TS1, ..., We combined TS1 and TS2 to make the final independent test set with 70 RNAs (TS)". I downloaded the RNA secondary structure dataset from SPOT-RNA or SPOT-RNA2, but the number of samples does not match the numbers reported for RNA-MSM. Did you update the dataset, or did I download the wrong one? Would you mind sharing the new data or a download link? Thanks.

meilanglang commented 1 year ago

The RNA secondary structure dataset differs from those of SPOT-RNA and SPOT-RNA2; it is a new dataset.

yikunpku commented 1 year ago

We have updated the training, validation, and test datasets used for our downstream tasks. They are now publicly available for download. See README > Data Preparation > Downstream Task Data for the download link.

fuxuliu commented 1 year ago

I got it, thanks so much.