Predicting model for CM and MDH dataset

Tsinghua-gongjing commented 1 year ago

Hi, thank you for the beautiful work.

ProGen has been applied to generate proteins for the CM and MDH families. In the Methods section, the details are described as:

We computed the AUC in receiver operating characteristic (ROC) curves for predicting binary function labels from model scores. We computed model scores for each sequence in both CM and MDH by using the per-token model log-likelihood in Eq. 2.

Does this mean that (1) a log-likelihood is computed for each token of every sequence, and (2) a classifier is then trained, using those per-token log-likelihoods as features, to predict whether the whole sequence is active or not (with labels taken from the experimental data)? Could you please also release the data, code, and models for this part?
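To make the question concrete, here is a minimal sketch of an alternative reading of the quoted passage, in which no separate classifier is trained: the per-token log-likelihoods of each sequence are averaged into one score, and the ROC AUC is computed directly from those scores against the binary labels. This is an assumption about the procedure, not the authors' confirmed method. The helper names (`mean_log_likelihood`, `roc_auc`) and the toy numbers are hypothetical and do not come from the ProGen code or data.

```python
from typing import Sequence


def mean_log_likelihood(token_log_probs: Sequence[float]) -> float:
    """Average the per-token log-probabilities of one sequence (hypothetical helper)."""
    if not token_log_probs:
        raise ValueError("sequence has no tokens")
    return sum(token_log_probs) / len(token_log_probs)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up per-token log-probabilities for four sequences (illustration only).
toy_token_log_probs = [
    [-0.5, -0.7, -0.6],
    [-2.0, -1.8, -2.2],
    [-0.9, -1.1, -1.0],
    [-1.5, -1.4, -1.6],
]
# Made-up binary function labels: 1 = functional, 0 = non-functional.
toy_labels = [1, 0, 1, 0]

toy_scores = [mean_log_likelihood(t) for t in toy_token_log_probs]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC on toy data: {toy_auc}")
```

Under this reading the model score itself acts as the predictor, so the AUC only measures how well the likelihood ranks functional sequences above non-functional ones. If a classifier on per-token features was used instead, the pipeline would differ from this sketch.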

Best regards

donglg1309 commented 1 year ago

Could you please release the corresponding data for this generation of CM/MDH?

Best, Liguo

wenyuhaokikika commented 10 months ago

I don't understand what GB1 (top100avg) means. How is it calculated?

salesforce/progen, issue #31