Open TerminatorJ opened 1 month ago
Yes, I remember there was a significant gap (at least 0.002 in both training and validation loss) between them. It was also reported among other winning teams. It also turned out that you could use a learnable pairwise representation instead of the bpp matrix (though it increases model complexity).
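For readers unfamiliar with how a bpp matrix is typically injected into such models: one common approach (an illustrative sketch, not necessarily what any specific team did) is to add the pairwise base-pairing probabilities as a bias on the attention logits; the "learnable pairwise representation" alternative would replace this fixed bias with a trained pairwise embedding. A minimal NumPy sketch, with all names and shapes assumed:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_bpp(q, k, v, bpp=None, bpp_weight=1.0):
    """Single-head attention over an RNA sequence of length L.

    q, k, v : (L, d) arrays; bpp : optional (L, L) base-pairing
    probability matrix, added as a bias to the attention logits.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)           # (L, L) attention logits
    if bpp is not None:
        scores = scores + bpp_weight * bpp  # inject pairing prior (+bpp model)
    return softmax(scores, axis=-1) @ v     # (L, d) attended output

rng = np.random.default_rng(0)
L, d = 8, 16
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
bpp = rng.random((L, L))
bpp = (bpp + bpp.T) / 2  # symmetric, like a real bpp matrix

out_without = attention_with_bpp(q, k, v)        # -bpp variant
out_with = attention_with_bpp(q, k, v, bpp=bpp)  # +bpp variant
```

The `bpp_weight` scalar here is a stand-in for however the bias is scaled; in a learnable-pairwise design, the `bpp` term would instead come from a trained projection of pair features.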
Thanks for the quick reply. I am not sure why 0.002 can be treated as 'significant'. If my training loss is cross entropy and I see a gap of 0.02 between the +bpp and -bpp cases, can I also call that 'significant'?
Well, I used the term 'significant' only because a 0.002 reduction in loss (MAE between predicted reactivity and the ground truth) cannot be achieved without a novel approach to handling this data. While its statistical significance is hard to assess, a small difference in loss can lead to better accuracy when predicting 'hard' RNA secondary structures (e.g., detecting long-range pseudoknots or tetraloop-receptor tertiary interactions). For more details, see He, S. et al. (2024). Ribonanza: deep learning of RNA structure through dual crowdsourcing. bioRxiv. https://doi.org/10.1101/2024.02.24.581671
Thanks. You mentioned you tried pre-trained models, e.g. BERT, to address this problem. Have you ever compared the masked language model loss between the +bpp and -bpp cases?
No, I didn't consider handling bpps when experimenting with SSL methods (BERT-like MLM, Data2Vec).
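For context on what a BERT-like MLM objective looks like on RNA sequences: a fraction of nucleotides is masked and the model is trained to recover them. A minimal masking routine, purely illustrative (the token ids, mask rate, and the -100 ignore value are assumptions borrowed from common MLM conventions, not from this project):

```python
import numpy as np

MASK_ID = 4  # hypothetical id for the [MASK] token; A, C, G, U -> 0..3

def mlm_mask(tokens, mask_prob=0.15, rng=None):
    """Return (masked_tokens, labels) for a BERT-style MLM objective.

    labels is -100 at unmasked positions (the usual 'ignore index'),
    and the original token id at masked positions.
    """
    rng = rng or np.random.default_rng()
    tokens = np.asarray(tokens)
    mask = rng.random(tokens.shape) < mask_prob
    labels = np.where(mask, tokens, -100)   # targets only at masked positions
    masked = np.where(mask, MASK_ID, tokens)  # hide the masked nucleotides
    return masked, labels

seq = np.array([0, 1, 2, 3, 0, 2, 1, 3])  # RNA sequence A C G U A G C U
masked, labels = mlm_mask(seq, mask_prob=0.5, rng=np.random.default_rng(1))
```

Since the bpp matrix is a pairwise input rather than a token, it could in principle be fed to the encoder unchanged during such pretraining, which is what the +bpp vs -bpp comparison would probe.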
Hi, do you remember whether there was a large gap between the +bpp and -bpp models in terms of their training loss?