neuralmind-ai / portuguese-bert

Portuguese pre-trained BERT models

ROC AUC Score #44

Open monilouise opened 2 years ago

monilouise commented 2 years ago

Hi,

Did you implement any way to measure ROC AUC score for NER? If not, why?

I'm trying to figure out how to add this metric to the code...

Thanks in advance.

fabiocapsouza commented 2 years ago

Hi @monilouise,

Unfortunately, we did not implement ROC AUC because it is not used by the evaluation dataset we used, but it would be an interesting metric to have.

Regarding how to implement it, I believe the major change is to add a way to gather the tag probability distribution for every token instead of the predicted class index with argmax as we currently do. For that we can use the OutputComposer class to "undo" the windowing that is performed in the preprocessing and combine the predictions of many windows into a single tensor for each input example.

The evaluate function already receives an output_composer that combines the predicted class indices into y_pred. One way is to add another OutputComposer that does the same thing for the probabilities:

# create an OutputComposer similar to existing validation/evaluation composers
probs_output_composer = OutputComposer(
    eval_examples,
    eval_features,
    output_transform_fn=None,  # <--- We do not want to modify the outputs
)

import torch.nn.functional as F

# Add new arguments and pass them to the evaluate function
def evaluate(..., probs_output_composer, roc_auc_computer):
    (...)
    outs = model(...)
    (...)
    logits = outs['logits']  # this will only work for models without a CRF layer
    probs = F.softmax(logits, dim=-1)  # (batch_size, max_length, num_classes)
    probs_output_composer.insert_batch(example_ixs, doc_span_ixs, probs)

    # Now we can get a list of probability tensors by calling the `get_outputs()` method:
    # N tensors of shape (example_length, num_classes)
    all_probs = probs_output_composer.get_outputs()
    # Compute the ROC AUC score and add it to the metrics output dict
    metrics['roc_auc'] = roc_auc_computer(y_true, all_probs)
    return metrics

Another problem is that inside evaluate the labels are tag strings instead of class indices. Assuming you need class indices for the labels, you would have to use NERTagsEncoder to convert them to indices (which is why I suggested adding the roc_auc_computer argument to evaluate as well). The other metrics use the tags directly, so the existing composer uses OutputComposer.output_transform_fn to convert y_pred into tags.
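
For reference, here is a rough sketch of the glue that such a roc_auc_computer would need: convert the tag strings to class indices and stack the per-example probability tensors into flat arrays, which is what a multiclass ROC AUC implementation typically expects. The tag2idx dict stands in for whatever mapping NERTagsEncoder exposes (I'm not relying on its exact interface here), and the helper name is just illustrative:

from typing import Dict, List

import numpy as np
import torch


def prepare_roc_auc_inputs(y_true: List[List[str]],
                           all_probs: List[torch.Tensor],
                           tag2idx: Dict[str, int]):
    """Flattens per-example tags and probabilities into flat arrays."""
    # y_true: one list of tag strings per example
    # all_probs: one tensor per example, shape (example_length, num_classes)
    # tag2idx: assumed tag -> class-index mapping (e.g. built from NERTagsEncoder)
    true_ids = np.concatenate(
        [[tag2idx[tag] for tag in tags] for tags in y_true])
    probs = torch.cat(all_probs, dim=0).detach().cpu().numpy()
    # shapes: (n_tokens,) and (n_tokens, num_classes)
    return true_ids, probs

The roc_auc_computer called in the snippet above could then feed these two arrays into whichever multiclass ROC AUC implementation you choose.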

Could you please share how you plan to compute the ROC AUC score? I haven't used ROC AUC for multiclass problems myself, so I'm curious how it's done.

monilouise commented 2 years ago

Hi @fabiocapsouza,

I plan to compute ROC AUC for each class using the one-vs-rest strategy. There's an implementation available for multiclass problems at https://huggingface.co/spaces/evaluate-metric/roc_auc. One-vs-one is another possibility. Roughly like this, using the evaluate library's multiclass config of that metric (the toy values below are only to show the expected shapes; in the real code the references and scores would come from the composers you described):
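
import evaluate

roc_auc = evaluate.load("roc_auc", "multiclass")

# references: one class index per token, flattened over all examples
refs = [0, 1, 2, 2, 0, 1]
# prediction_scores: per-token probability distribution, shape (n_tokens, n_classes)
pred_scores = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
]

results = roc_auc.compute(
    references=refs,
    prediction_scores=pred_scores,
    multi_class="ovr",  # one-vs-rest: one score per class, then averaged
)
print(results["roc_auc"])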