lulab / OligoFormer

GNU General Public License v3.0

Problems of datasets #8

Open luoda888 opened 1 week ago

luoda888 commented 1 week ago

For example, I can't find any description of the cell lines in hu.csv/mixed.csv. Also, to confirm: is the label the remaining amount after the reaction, or the amount that reacted?

byl18 commented 1 week ago

Thank you for your question. Details of the cell lines can be found in each original paper. We do not currently use the cell-line information for training. For the raw data in data/unnorm/, the third column is the inhibition efficiency of the siRNA. An siRNA with a value > 0.7 is defined as effective and is assigned y = 1.
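To make the labeling rule concrete, here is a minimal sketch of the thresholding described above. The function name `label_from_inhibition` and the example rows are hypothetical and invented purely for illustration; only the 0.7 cutoff and the strict `>` comparison come from the reply above.

```python
def label_from_inhibition(inhibition, threshold=0.7):
    """Return 1 (effective) if inhibition efficiency is strictly above threshold, else 0."""
    return 1 if inhibition > threshold else 0


# Made-up rows for illustration only: (siRNA id, mRNA id, inhibition efficiency).
# The third field plays the role of the third column in data/unnorm/.
rows = [
    ("sirna_a", "mrna_a", 0.85),
    ("sirna_b", "mrna_b", 0.70),
    ("sirna_c", "mrna_c", 0.42),
]

labels = [label_from_inhibition(inhibition) for _, _, inhibition in rows]
```

Note that under a strict comparison a value of exactly 0.7 is labeled 0. Whether the repository treats the boundary that way is not stated in this thread.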

luoda888 commented 1 week ago

In other words, y=1 means that the siRNA effectively silences its target mRNA, right? And is the siRNA sequence in the dataset the antisense strand or the sense strand?

byl18 commented 1 week ago

Right. All siRNAs in the dataset are antisense strands, written 5' to 3'.
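For readers whose own data stores the sense strand, the following is a rough sketch of converting it to the antisense orientation used here. It assumes a perfectly complementary duplex with no overhangs, which is a simplification of real siRNA designs, and the helper name `antisense_from_sense` is hypothetical rather than part of OligoFormer.

```python
# Map each base to its RNA complement; T is accepted so DNA-style input also works.
_COMPLEMENT = str.maketrans("ACGUT", "UGCAA")


def antisense_from_sense(sense):
    """Return the reverse complement of a sense strand as RNA, read 5' to 3'.

    Assumes full complementarity and ignores any 3' overhangs.
    """
    return sense.upper().translate(_COMPLEMENT)[::-1]


example = antisense_from_sense("AUGC")
```

If the sequences carry overhangs or chemical modifications, they would need to be handled before a conversion like this.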

luoda888 commented 1 week ago

I have a large dataset for siRNA-mRNA prediction, and on it your model does not perform as well as ViennaRNA's docking feature. If you are interested, please contact me by email and we can discuss why the results differ.

lblhandsome@gmail.com