ProGen has been applied to generate proteins for the CM and MDH families. In the Methods section, the details are described as:
> We computed the AUC in receiver operating characteristic (ROC) curves for predicting binary function labels from model scores. We computed model scores for each sequence in both CM and MDH by using the per-token model log-likelihood in Eq. 2.
Does this mean that (1) for each sequence, the log-likelihood is calculated for each token, and (2) a classifier model is then trained to predict whether the whole sequence is reactive or not (with labels from the experimental data), using the per-token log-likelihood scores as features? Could you please also release the data, code, and models for this part?
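For comparison, a simpler reading of the quoted passage, offered only as a hypothesis for the authors to confirm or correct, is that no separate classifier is trained at all: each sequence is reduced to a single score (assumed here to be the mean of its per-token log-likelihoods, since Eq. 2 is not reproduced here), and that score is ranked directly against the binary labels to compute the ROC AUC. A minimal Python sketch of that interpretation follows; the function names and toy numbers are hypothetical.

```python
from typing import Sequence


def sequence_score(token_logprobs: Sequence[float]) -> float:
    """Mean per-token log-likelihood of one sequence.

    Assumption: this is one plausible reading of "per-token model
    log-likelihood"; the exact definition is given by Eq. 2 in the paper.
    """
    if not token_logprobs:
        raise ValueError("sequence has no tokens")
    return sum(token_logprobs) / len(token_logprobs)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC computed directly from scores, with no trained classifier.

    Equals the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: per-token log-probabilities for four sequences
# and their experimental function labels (1 = active, 0 = inactive).
token_logprobs = [[-1.0, -2.0], [-3.0, -3.0, -3.0], [-0.5, -1.5], [-2.5]]
labels = [1, 0, 1, 0]
scores = [sequence_score(t) for t in token_logprobs]
auc = roc_auc(scores, labels)
```

Under this reading, the model score acts as the ranking itself, so the AUC measures how well the likelihood alone separates active from inactive sequences, without fitting any extra parameters to the labels.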
Thank you for the beautiful work.
Best regards