songlab-cal / gpn

Genomic Pre-trained Network
https://doi.org/10.1073/pnas.2311219120
MIT License

Calculation of AUPRC in GPN-MSA Figure 2b #26

Closed: yangzhao1230 closed this issue 7 months ago

yangzhao1230 commented 7 months ago

I'm attempting to reproduce the results shown in Figure 2b, but the AUPRC values I'm calculating seem odd. I've been using the scores provided by your Hugging Face implementation. Could you provide a simple code snippet for replicating the results in Figure 2b?

I have demonstrated my calculation process in a self-contained Colab notebook, which you can access here: Colab Notebook Link. Could you please take a look and let me know if there's anything I'm missing?

gonzalobenegas commented 7 months ago

Hello! I believe you just need to flip the sign of the scores. A lower score means more deleterious, so the scores are anti-correlated with the labels. Apologies that this is not documented.

BTW, songlab/cosmic is for the upcoming v2 of the manuscript, with slight differences from v1.

yangzhao1230 commented 7 months ago

Whether or not the labels are flipped, the results are very strange. [two images attached]

yangzhao1230 commented 7 months ago

The functions I used to calculate the results are from sklearn:

from sklearn.metrics import precision_recall_curve, auc, average_precision_score

gonzalobenegas commented 7 months ago

See edited notebook: link

I flipped the scores instead of the labels.
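
For anyone landing here later, here is a minimal sketch of why flipping the scores and flipping the labels are not interchangeable for AUPRC. The toy labels and scores are made up for illustration and are not taken from the notebook or the paper. `average_precision` is a hypothetical, dependency-free helper that uses the step-wise definition (mean of the precision at each positive's rank) and assumes binary labels with no tied scores. The convention that label 1 means deleterious is also an assumption. On real data, something like `average_precision_score(labels, -scores)` from sklearn, with `scores` as a NumPy array, should play the same role.

```python
def average_precision(labels, scores):
    """Step-wise average precision for binary labels (1 = positive).

    Hypothetical helper for illustration; assumes no tied scores.
    """
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of this positive
    return total / n_pos


# Toy data (assumed): label 1 = deleterious, lower score = more deleterious.
labels = [1, 1, 0, 0, 0, 0]
scores = [-9.0, -7.0, -8.0, -2.0, -1.0, 0.5]

# Raw scores: deleterious variants are ranked last, so AUPRC is poor.
ap_raw = average_precision(labels, scores)

# Negated scores: deleterious variants stay the positive class and rank first.
ap_neg_scores = average_precision(labels, [-s for s in scores])

# Flipped labels: the benign variants become the positive class instead.
ap_flipped_labels = average_precision([1 - y for y in labels], scores)

print(f"raw scores:     {ap_raw:.4f}")
print(f"negated scores: {ap_neg_scores:.4f}")
print(f"flipped labels: {ap_flipped_labels:.4f}")
```

Negating the scores keeps the deleterious variants as the positive class, whereas flipping the labels makes the benign variants the positive class. That changes the class prevalence and therefore the AUPRC baseline, so the two numbers generally differ. AUROC is unchanged by that swap, but AUPRC is not.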