How can I generate the training datasets?

uci-cbcl / UFold

MIT License

How can I generate the training datasets? #14

Open llfzllfz opened 2 years ago

llfzllfz commented 2 years ago

I've downloaded the RNAStralign dataset from mxfold2, and it has 8 subfolders. In your code in process_data_newdataset.py, I only see os.listdir(), which doesn't handle subfolders. What should I do to generate the training datasets? Thanks.

sperfu commented 2 years ago

Hi there,

It depends on how you would like to handle the data. In our work, we merged all the files in the RNAStralign dataset into one folder and used the whole dataset for training. If you want to check performance on individual species, you may need to keep the separate subfolders, as illustrated in the e2efold paper.

Thanks.
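For anyone wanting to reproduce the "merge everything into one folder" step described above, here is a minimal sketch. It is not taken from the UFold repository: the function name `flatten_dataset` and the example paths are hypothetical, and it assumes the destination folder lies outside the source tree. It copies every file found under the subfolders into a single directory, so that a script relying on os.listdir() can see all of them.

```python
import os
import shutil


def flatten_dataset(src_root, dst_dir):
    """Copy every file under src_root (recursively) into the single folder dst_dir.

    Hypothetical helper, not part of UFold. dst_dir is assumed to be
    outside src_root, otherwise os.walk would also visit the copies.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Relative subfolder path, e.g. "groupA" or "groupA/sub1".
        rel_dir = os.path.relpath(dirpath, src_root)
        # Prefix file names with their subfolder path so that files with
        # the same name in different subfolders are unlikely to collide.
        prefix = "" if rel_dir == "." else rel_dir.replace(os.sep, "_") + "_"
        for name in sorted(filenames):
            src_path = os.path.join(dirpath, name)
            dst_path = os.path.join(dst_dir, prefix + name)
            shutil.copy2(src_path, dst_path)
            copied.append(dst_path)
    return copied


# Example usage (paths are placeholders):
# files = flatten_dataset("RNAStralign", "RNAStralign_merged")
# print(len(files), "files copied")
```

The merged folder can then be given to the preprocessing script in place of the original dataset root. The subfolder prefix in each file name also keeps a record of which group a sequence came from, which may help if you later want a per-species evaluation.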