emilyvansyoc closed this issue 1 year ago.
The ROC AUC score is not defined when there is only one ground-truth label, so you can comment out the AUC computation and print only the other metrics. This should not affect the P value cutoff or the model interpretation, since the P value is computed against the training set rather than against the queries.
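To illustrate why the score is undefined: ROC AUC can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. With a single query there is only one ground-truth class, so there are no positive/negative pairs to compare. The sketch below is a minimal pure-Python version of that pairwise definition. `binary_auc` is a hypothetical helper, not part of CLEAN or scikit-learn, and it returns `None` in the single-class case, where scikit-learn's `roc_auc_score` raises the `ValueError` instead.

```python
def binary_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Returns None when y_true holds only one class: with no positive/negative
    pairs to compare, the score is undefined.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# One query sequence -> one ground-truth class -> AUC undefined.
print(binary_auc([1], [0.9]))  # None

# Both classes present -> AUC is defined.
print(binary_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))  # 0.75
```

For binary labels this pairwise count equals the area under the ROC curve; the nested loop is only there for readability.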
OK, thank you for your response! Could you clarify which line to comment out? I have tried the following, and neither change works:
In src/CLEAN/infer.py:
```python
if report_metrics:
    pred_label = get_pred_labels(out_filename, pred_type='_pvalue')
    pred_probs = get_pred_probs(out_filename, pred_type='_pvalue')
    true_label, all_label = get_true_labels('./data/' + test_data)
    pre, rec, f1, roc, acc = get_eval_metrics(
        pred_label, pred_probs, true_label, all_label)
    print(f'############ EC calling results using random '
          f'chosen {nk_random}k samples ############')
    print('-' * 75)
    print(f'>>> total samples: {len(true_label)} | total ec: {len(all_label)} \n'
          f'>>> precision: {pre:.3} | recall: {rec:.3}'
          f'| F1: {f1:.3} |')  #AUC: {roc:.3} ')
    print('-' * 75)
```
In src/CLEAN/evaluate.py:
```python
pre = precision_score(true_m, pred_m, average='weighted', zero_division=0)
rec = recall_score(true_m, pred_m, average='weighted')
f1 = f1_score(true_m, pred_m, average='weighted')
return pre, rec, f1  # , roc, acc
```
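Going by the two snippets, the edits do not line up. If `get_eval_metrics` in `evaluate.py` returns only `pre, rec, f1`, the unchanged line `pre, rec, f1, roc, acc = get_eval_metrics(...)` in `infer.py` raises `ValueError: not enough values to unpack`. Commenting out only the AUC part of the `print` does not help either, because the `roc` value is computed inside `get_eval_metrics` (as the commented-out `roc` in its return line suggests). One way to keep both files consistent is to keep returning five values, with `roc` set to `None` when it is undefined, and to print AUC only when it exists. The sketch below uses hypothetical helpers (`three_values`, `format_report`), not CLEAN's own code.

```python
def three_values():
    # Hypothetical stand-in for an evaluate.py edit that returns only pre, rec, f1.
    return 1.0, 1.0, 1.0


try:
    # infer.py still unpacks five values, so this fails.
    pre, rec, f1, roc, acc = three_values()
except ValueError as err:
    print(err)  # not enough values to unpack (expected 5, got 3)


def format_report(n_samples, n_ec, pre, rec, f1, roc=None):
    """Build the metrics text, appending AUC only when it is defined (not None)."""
    text = (f'>>> total samples: {n_samples} | total ec: {n_ec} \n'
            f'>>> precision: {pre:.3} | recall: {rec:.3} | F1: {f1:.3} |')
    if roc is not None:
        text += f' AUC: {roc:.3} |'
    return text


# Single query: roc stays None, so the AUC field is simply left out.
print(format_report(1, 1, 1.0, 1.0, 1.0, roc=None))
```

Keeping the five-value shape means `infer.py` needs no change to its unpacking line; only the `print` has to tolerate a missing AUC.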
Hello,
We are trying to determine the lowest P value range for a single protein sequence using the conda install of CLEAN, i.e., the input CSV contains one sequence with an EC number and an identifier. When we run `infer_pvalue` with default parameters, it computes results but raises the following error and does not print the model fit statistics (recall, precision, etc.): `ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.`
It seems that the AUC cannot be calculated for a single input sequence. Does this affect the P value cutoff or the model interpretation? Does the `infer_pvalue` function depend on having multiple queries in the input file? If multiple sequences are required, how should we interpret the predictions for the one query sequence of interest? Thanks so much for your help.
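On the question of whether multiple queries are needed: the reply at the top of this thread says the P value is computed against the training set rather than the queries. A generic empirical P value of that kind compares one query's score with a fixed background distribution, so other queries in the file never enter the calculation. The sketch below assumes a background of distances derived from training data and that a smaller distance means a closer match; `empirical_pvalue` and the numbers are hypothetical and do not reproduce CLEAN's implementation.

```python
def empirical_pvalue(query_distance, background_distances):
    """Fraction of background distances at or below the query's distance.

    Assumes smaller distances mean closer matches, and that the background
    distribution is fixed beforehand (e.g. computed from training data).
    """
    if not background_distances:
        raise ValueError("background distribution is empty")
    hits = sum(1 for d in background_distances if d <= query_distance)
    return hits / len(background_distances)


# Made-up background distances, for illustration only.
background = [0.5, 1.2, 2.0, 3.1, 4.4, 5.0, 6.3, 7.7, 8.1, 9.9]

# Scoring one query alone...
single = empirical_pvalue(1.0, background)

# ...gives the same P value as scoring it alongside other queries.
batch = [empirical_pvalue(d, background) for d in (1.0, 4.0, 8.0)]
print(single, batch[0])  # 0.1 0.1
```

Many implementations use (hits + 1) / (n + 1) so that an empirical P value is never exactly zero; either way, the background, not the batch of queries, determines the value.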