uci-cbcl / UFold

MIT License

How can I generate the training datasets? #14

Open llfzllfz opened 1 year ago

llfzllfz commented 1 year ago

I've downloaded the RNAStralign dataset from mxfold2, and it contains 8 subfolders. In your code in process_data_newdataset.py, I only see os.listdir(), which doesn't handle subfolders. What should I do to generate the training datasets? Thanks.

sperfu commented 1 year ago

Hi there,

It depends on how you would like to handle the data. In our work, we merged all the files in the RNAStralign dataset into one folder and used the whole dataset for training. If you want to check the performance on various species, you may need to keep the subfolders separate, as illustrated in the e2efold paper. So, all in all, it depends on what you want to do.
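
For the merge option, a minimal sketch of flattening the subfolders into one directory could look like the following. This is not code from the UFold repository: the function name `merge_subfolders`, the paths, and the file extensions are assumptions for illustration, so adjust the extensions to whatever structure files your copy of RNAStralign actually contains. The subfolder name is prepended to each file name so that files with the same name in different families do not overwrite each other.

```python
import os
import shutil


def merge_subfolders(src_root, dst_dir, extensions=(".ct", ".bpseq")):
    """Copy every matching file under src_root into one flat dst_dir.

    Hypothetical helper, not part of UFold. `extensions` is an assumed
    default; dst_dir should live outside src_root so it is not re-walked.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    # os.walk descends into every nested subfolder, unlike os.listdir
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in sorted(filenames):
            if not name.lower().endswith(extensions):
                continue
            # Prefix with the relative subfolder path to avoid name clashes
            rel = os.path.relpath(dirpath, src_root)
            prefix = "" if rel == "." else rel.replace(os.sep, "_") + "_"
            target = os.path.join(dst_dir, prefix + name)
            shutil.copy2(os.path.join(dirpath, name), target)
            copied.append(target)
    return copied
```

After a call such as `merge_subfolders("RNAStralign", "RNAStralign_merged")` (both paths are placeholders), the flat output folder can be listed with a plain `os.listdir()`. For the per-species option, the same function could instead be called once per top-level subfolder.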

Thanks.