uci-cbcl / UFold

MIT License

How can I get the complete RNAStralign dataset? #17

Closed: llfzllfz closed this issue 1 year ago

llfzllfz commented 1 year ago

Hello, I got the RNAStralign dataset from MXfold2 and put all the files in the same folder to train the model. However, I found only 20,879 files, while the E2Efold paper states that RNAStralign should contain 37,149 files. How can I get the complete RNAStralign dataset? Thanks.

llfzllfz commented 1 year ago

Also, when I trained on the 20,879 files, I found that some of them have problems: some files cannot be opened, and some are missing fields.
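To locate the problematic files before training, a small checker can flag files that fail to open or decode, or that have malformed lines. This is a minimal sketch that assumes the files are in BPSEQ format (one `index base pair_index` line per nucleotide, with `0` meaning unpaired). The thread does not say which format the download uses, so the parser would need adapting for CT files. `check_bpseq` and `scan_folder` are hypothetical helper names and are not part of UFold.

```python
import os
from pathlib import Path


def check_bpseq(path):
    """Return a list of problems in one file (empty list = looks OK).

    Assumes BPSEQ format: 'index base pair_index' per line, 0 = unpaired.
    """
    try:
        text = Path(path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        return [f"cannot open/decode: {exc}"]

    pairs = {}
    problems = []
    expected = 1  # BPSEQ indices are assumed to run 1, 2, 3, ...
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) != 3:
            problems.append(f"line {lineno}: expected 3 fields, got {len(fields)}")
            continue
        idx, _base, partner = fields
        if not (idx.isdigit() and partner.isdigit()):
            problems.append(f"line {lineno}: non-integer index or partner")
            continue
        if int(idx) != expected:
            problems.append(f"line {lineno}: index {idx}, expected {expected}")
        expected = int(idx) + 1
        pairs[int(idx)] = int(partner)

    if not pairs:
        problems.append("no sequence lines found")
    for i, j in pairs.items():
        # a base pair i-j must also be listed as j-i
        if j != 0 and pairs.get(j) != i:
            problems.append(f"pair {i}-{j} is not symmetric")
    return problems


def scan_folder(folder):
    """Map file name -> problems for every file in `folder` that fails the check."""
    report = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            problems = check_bpseq(path)
            if problems:
                report[name] = problems
    return report
```

Calling `scan_folder("path/to/RNAStralign")` returns only the failing files, so a clean folder yields an empty dict. Those files can then be inspected by hand or excluded from training.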

sperfu commented 1 year ago

Yes. The dataset from the MXfold2 paper contains fewer files because they removed sequences longer than 600 bp, whereas E2Efold keeps all of them. You can check the E2Efold paper to retrieve the full set of files. As for the file format issue, I'm not aware of it; the files should open in any text editor.
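To check how much of a local copy the 600 bp cutoff accounts for, the files can be split by sequence length. This is a minimal sketch that assumes BPSEQ files (one non-comment line per nucleotide) and treats "longer than 600" as strictly greater than 600. The exact boundary and filtering used in MXfold2's preprocessing are not confirmed here. `sequence_length` and `split_by_length` are hypothetical helpers.

```python
import os
from pathlib import Path

# Cutoff described in the thread for the MXfold2 version of RNAStralign;
# whether MXfold2 kept exactly-600 sequences is an assumption.
MAX_LEN = 600


def sequence_length(path):
    """Count nucleotide lines in a BPSEQ file (blank and '#' lines skipped)."""
    count = 0
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            count += 1
    return count


def split_by_length(folder, max_len=MAX_LEN):
    """Return (kept, dropped) file names; kept files have length <= max_len."""
    kept, dropped = [], []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        if sequence_length(path) <= max_len:
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped
```

Running `split_by_length` on the full E2Efold copy and comparing `len(kept)` with the MXfold2 file count gives a rough check of whether the length filter explains the difference.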