Closed Sazan-Mahbub closed 8 months ago
Hi, @Sazan-Mahbub. Thank you for your interest in our work. The performances reported in this section of our preprint correspond to our fine-tuned models, whose backbone (RNA-FM) was trained jointly on the downstream task. We later replaced them with feature-based models in which RNA-FM was frozen during training, which explains the lower performance here. I have checked that your results are similar to, though slightly lower than, ours; the gap is likely caused by a different threshold selection. So there is no need to worry about your metric computation: it is correct.
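The threshold-selection point above can be illustrated with a minimal numpy sketch. The probabilities and ground-truth labels below are made up for illustration (not taken from the RNA-FM evaluation); the same predicted base-pair probabilities yield noticeably different F1 scores depending on where you binarize:

```python
import numpy as np

# Hypothetical predicted base-pair probabilities and ground-truth labels
# (illustrative values only, not from the RNA-FM evaluation).
probs = np.array([0.9, 0.7, 0.55, 0.4, 0.2])
target = np.array([1, 1, 0, 1, 0])

def f1(pred, target):
    """F1 score for a boolean prediction vector against binary labels."""
    tp = np.sum(pred & (target == 1))
    fp = np.sum(pred & (target == 0))
    fn = np.sum(~pred & (target == 1))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# F1 shifts as the binarization threshold moves
# (~0.86 at t=0.3 down to 0.5 at t=0.7 on this toy data).
for t in (0.3, 0.5, 0.7):
    print(t, round(f1(probs > t, target), 3))
```

This is why two evaluations of the same checkpoint can land a few F1 points apart even when the metric code itself is correct.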
Hi @mydkzgj,
Thank you for your reply and clarifications! This has been really helpful for us.
Applying a sigmoid to the output and thresholding at 0.5, I am now getting F1=0.672 for TS0 and F1=0.934 for ArchiveII600 (with the same backbones mentioned before). I hope these are closer to the actual ones.
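For reference, the post-processing described here (sigmoid on the raw outputs, binarize at 0.5) can be sketched as below. The function name, the L x L output shape, and the symmetrization step are my assumptions, not necessarily how the RNA-FM code does it:

```python
import numpy as np

def binarize_contact_map(logits, threshold=0.5):
    """Turn raw pairing scores into a binary base-pair map.

    `logits` is assumed to be an L x L array of raw model outputs
    (hypothetical shape; the actual RNA-FM head may differ).
    """
    probs = 1.0 / (1.0 + np.exp(-logits))  # element-wise sigmoid
    pred = probs > threshold               # binarize, 0.5 by default
    # Assumed symmetrization: a predicted pair (i, j) implies (j, i).
    return np.logical_or(pred, pred.T)
```

The resulting binary map can then be compared against the ground-truth base-pair map to compute precision, recall, and F1.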
Yeah, they're pretty much the same.
Hi,
Great work!
I am trying to reproduce the SS prediction results (attached image) for ArchiveII600 (3911 sequences) and TS0 (1305 sequences).
While I could exactly reproduce UFold's scores, I could not reproduce RNA-FM's scores in the same way. I used the model weights for RNA-FM from here.
I got an F1 score of 0.666 on TS0 using "RNA-FM-ResNet_bpRNA.pth"; the paper reports 0.704. For ArchiveII600, I got 0.933 using "RNA-FM-ResNet_RNAStralign.pth"; the paper reports 0.941.
I was wondering if the evaluation in your paper was done differently than how UFold did it.
I'd really appreciate any help. Thank you!