Open TerminatorJ opened 1 month ago
Yes, I remember there was a significant gap (at least 0.002 in both training and validation loss) between them. It was also reported among other winning teams. It also turned out that you could use a learnable pairwise representation instead of the bpp matrix (though it increases model complexity).
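For readers unfamiliar with how a bpp matrix is typically injected into such models: one common approach (an illustrative sketch, not necessarily what any specific team did) is to add the pairwise base-pairing probabilities as a bias on the attention logits; the "learnable pairwise representation" alternative would replace this fixed bias with a trained pairwise embedding. A minimal NumPy sketch, with all names and shapes assumed:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_bpp(q, k, v, bpp=None, bpp_weight=1.0):
    """Single-head attention over an RNA sequence of length L.

    q, k, v : (L, d) arrays; bpp : optional (L, L) base-pairing
    probability matrix, added as a bias to the attention logits.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)           # (L, L) attention logits
    if bpp is not None:
        scores = scores + bpp_weight * bpp  # inject pairing prior (+bpp model)
    return softmax(scores, axis=-1) @ v     # (L, d) attended output

rng = np.random.default_rng(0)
L, d = 8, 16
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
bpp = rng.random((L, L))
bpp = (bpp + bpp.T) / 2  # symmetric, like a real bpp matrix

out_without = attention_with_bpp(q, k, v)        # -bpp variant
out_with = attention_with_bpp(q, k, v, bpp=bpp)  # +bpp variant
```

The `bpp_weight` scalar here is a stand-in for however the bias is scaled; in a learnable-pairwise design, the `bpp` term would instead come from a trained projection of pair features.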
Thanks for the quick reply. I am not sure why 0.002 can be treated as 'significant'. If my training loss is cross entropy and I see a gap of 0.02 between the +bpp and -bpp cases, can I also call that 'significant'?
Well, I used the term 'significant' only because a 0.002 reduction in loss (MAE between predicted reactivity and the ground truth) cannot be achieved without a novel approach to handling this data. While its statistical significance is hard to assess, a small difference in loss can lead to better accuracy when predicting 'hard' RNA secondary structures (e.g., detecting long-range pseudoknots or tetraloop-receptor tertiary interactions). For more details, see He, S. et al. (2024). Ribonanza: deep learning of RNA structure through dual crowdsourcing. bioRxiv. https://doi.org/10.1101/2024.02.24.581671
Thanks. You mentioned you tried pre-trained models, e.g. BERT, to address this problem. Have you ever compared the masked language model loss between the +bpp and -bpp cases?
No, I didn't consider handling bpps when experimenting with SSL methods (BERT-like MLM, Data2Vec).
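For context on what a BERT-like MLM objective looks like on RNA sequences: a fraction of nucleotides is masked and the model is trained to recover them. A minimal masking routine, purely illustrative (the token ids, mask rate, and the -100 ignore value are assumptions borrowed from common MLM conventions, not from this project):

```python
import numpy as np

MASK_ID = 4  # hypothetical id for the [MASK] token; A, C, G, U -> 0..3

def mlm_mask(tokens, mask_prob=0.15, rng=None):
    """Return (masked_tokens, labels) for a BERT-style MLM objective.

    labels is -100 at unmasked positions (the usual 'ignore index'),
    and the original token id at masked positions.
    """
    rng = rng or np.random.default_rng()
    tokens = np.asarray(tokens)
    mask = rng.random(tokens.shape) < mask_prob
    labels = np.where(mask, tokens, -100)   # targets only at masked positions
    masked = np.where(mask, MASK_ID, tokens)  # hide the masked nucleotides
    return masked, labels

seq = np.array([0, 1, 2, 3, 0, 2, 1, 3])  # RNA sequence A C G U A G C U
masked, labels = mlm_mask(seq, mask_prob=0.5, rng=np.random.default_rng(1))
```

Since the bpp matrix is a pairwise input rather than a token, it could in principle be fed to the encoder unchanged during such pretraining, which is what the +bpp vs -bpp comparison would probe.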
Hi, do you remember whether there was a large gap between the +bpp and -bpp models in terms of their training loss?