ahmedbahaaeldin opened this issue 5 years ago
E.g. add those metrics for multi-class (`multi_label = False`) problems to your script / notebook before initialising the metrics list:
```python
import numpy as np
from torch import Tensor
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import roc_auc_score, f1_score, confusion_matrix, multilabel_confusion_matrix

def F1_macro(y_pred: Tensor, y_true: Tensor, average='macro', sample_weight=None):
    y_pred = np.argmax(y_pred.detach().cpu().numpy(), axis=1)
    y_true = y_true.detach().cpu().numpy()
    return f1_score(y_true, y_pred, average=average, sample_weight=sample_weight)

def F1_micro(y_pred: Tensor, y_true: Tensor):
    return F1_macro(y_pred, y_true, average='micro')

## Metrics functions that need the labels (a `labels_list` of class names) defined upfront:
def confusion_matrix_overall(y_pred: Tensor, y_true: Tensor, labels: list = labels_list, sample_weight=None):
    y_pred = np.argmax(y_pred.detach().cpu().numpy(), axis=1)
    y_true = y_true.detach().cpu().numpy()
    return confusion_matrix(y_true, y_pred, labels=list(range(len(labels))), sample_weight=sample_weight)

def confusion_matrix_by_class(y_pred: Tensor, y_true: Tensor, labels: list = labels_list, sample_weight=None, samplewise=False):
    y_pred = np.argmax(y_pred.detach().cpu().numpy(), axis=1)
    y_true = y_true.detach().cpu().numpy()
    return multilabel_confusion_matrix(y_true, y_pred, labels=list(range(len(labels))), sample_weight=sample_weight, samplewise=samplewise)

def roc_auc_score_by_class(y_pred: Tensor, y_true: Tensor, labels: list = labels_list, average='micro', sample_weight=None):
    y_pred = np.argmax(y_pred.detach().cpu().numpy(), axis=1)
    y_true = y_true.detach().cpu().numpy()
    roc_auc_score_d = {}
    for i in range(len(labels)):
        # Binarise class i against the rest and score it separately:
        lb = LabelBinarizer()
        y_true_i = y_true.copy()
        y_true_i[y_true != i] = len(labels) + 1
        y_true_i = lb.fit_transform(y_true_i)
        y_pred_i = y_pred.copy()
        y_pred_i[y_pred != i] = len(labels) + 1
        y_pred_i = lb.transform(y_pred_i)
        roc_auc_score_d[labels[i]] = roc_auc_score(y_true_i, y_pred_i, average=average, sample_weight=sample_weight)
    return roc_auc_score_d

def F1_by_class(y_pred: Tensor, y_true: Tensor, labels: list = labels_list, sample_weight=None):
    y_pred = np.argmax(y_pred.detach().cpu().numpy(), axis=1)
    y_true = y_true.detach().cpu().numpy()
    F1_by_class_d = {}
    for i in range(len(labels)):
        F1_by_class_d[labels[i]] = f1_score(y_true, y_pred, average='micro', labels=[i])
    return F1_by_class_d
    # (Alternatively: return f1_score(y_true, y_pred, average=None) for an array instead of a dict.)
```
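For context, here is a minimal sketch (not part of the original snippet) of wiring these functions into the fast-bert metrics list; the selection and names below are just an example:

```python
# Sketch only: register the multi-class metric functions defined above.
metrics = [
    {'name': 'F1_macro', 'function': F1_macro},
    {'name': 'F1_micro', 'function': F1_micro},
    {'name': 'confusion_matrix_overall', 'function': confusion_matrix_overall},
    {'name': 'roc_auc_score_by_class', 'function': roc_auc_score_by_class},
    {'name': 'F1_by_class', 'function': F1_by_class},
]
# Then pass metrics=metrics to BertLearner.from_pretrained_model(...).
```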
How can we adapt this to a multi-label problem? I am trying to get the F1 score and ROC AUC score for each label. I get the error "ValueError: Classification metrics can't handle a mix of multilabel-indicator and multiclass targets" when I use the above snippet.
I'll post the 'multi-label' code in a few days; I need to dig through my code repository.
E.g. add those metrics for multi-label (`multi_label = True`) problems to your script / notebook before initialising the metrics list:
```python
from fast_bert.metrics import roc_auc, accuracy_thresh, fbeta  # accuracy_multilabel,
from sklearn.metrics import hamming_loss, accuracy_score, roc_curve, auc, roc_auc_score, f1_score, multilabel_confusion_matrix
from torch import Tensor

threshold = 0.3
# `labels_list` (the list of label names) must be defined before these functions.

### (...)

### Metrics functions:
def Hamming_loss(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, thresh: float = threshold, sample_weight=None, **kwargs):
    if sigmoid: y_pred = y_pred.sigmoid()
    y_pred = (y_pred > thresh).float()
    y_pred = y_pred.detach().cpu().numpy()
    y_true = y_true.detach().cpu().numpy()
    return hamming_loss(y_true, y_pred, sample_weight=sample_weight)

def Exact_Match_Ratio(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, thresh: float = threshold, normalize: bool = True, sample_weight=None, **kwargs):
    if sigmoid: y_pred = y_pred.sigmoid()
    y_pred = (y_pred > thresh).float()
    y_pred = y_pred.detach().cpu().numpy()
    y_true = y_true.detach().cpu().numpy()
    return accuracy_score(y_true, y_pred, normalize=normalize, sample_weight=sample_weight)

def roc_auc_score_macro(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, average='macro', sample_weight=None, **kwargs):
    if sigmoid: y_pred = y_pred.sigmoid()
    y_pred = y_pred.detach().cpu().numpy()
    y_true = y_true.detach().cpu().numpy()
    return roc_auc_score(y_true, y_pred, average=average, sample_weight=sample_weight)

def roc_auc_score_micro(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, **kwargs):
    return roc_auc_score_macro(y_pred, y_true, sigmoid=sigmoid, average='micro')

def roc_auc_score_by_label(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, **kwargs):
    return roc_auc_score_macro(y_pred, y_true, sigmoid=sigmoid, average=None)

def ROC_AUC_by_label(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, labels: list = labels_list, **kwargs):
    # Compute the ROC curve and ROC area for each label:
    if sigmoid: y_pred = y_pred.sigmoid()
    y_pred = y_pred.detach().cpu().numpy()
    y_true = y_true.detach().cpu().numpy()
    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    for i in range(len(labels)):
        fpr[i], tpr[i], _ = roc_curve(y_true[:, i], y_pred[:, i])
        roc_auc[labels[i]] = auc(fpr[i], tpr[i])
    return roc_auc

def F1(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, threshold: float = threshold, **kwargs):
    return fbeta(y_pred, y_true, sigmoid=sigmoid, thresh=threshold, beta=1)

def F1_macro(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, thresh: float = threshold, average='macro', sample_weight=None, **kwargs):
    if sigmoid: y_pred = y_pred.sigmoid()
    y_pred = (y_pred > thresh).float()
    y_pred = y_pred.detach().cpu().numpy()
    y_true = y_true.detach().cpu().numpy()
    return f1_score(y_true, y_pred, average=average, sample_weight=sample_weight)

def F1_micro(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, **kwargs):
    return F1_macro(y_pred, y_true, sigmoid=sigmoid, average='micro')

def F1_by_label(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, thresh: float = threshold, sample_weight=None, labels: list = labels_list, **kwargs):
    if sigmoid: y_pred = y_pred.sigmoid()
    y_pred = (y_pred > thresh).float()
    y_pred = y_pred.detach().cpu().numpy()
    y_true = y_true.detach().cpu().numpy()
    return f1_score(y_true, y_pred, average=None)

def accuracy_by_label(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, thresh: float = threshold, normalize: bool = True, sample_weight=None, labels: list = labels_list, **kwargs):
    if sigmoid: y_pred = y_pred.sigmoid()
    y_pred = (y_pred > thresh).float()
    y_pred = y_pred.detach().cpu().numpy()
    y_true = y_true.detach().cpu().numpy()
    accuracies = {}
    for i in range(len(labels)):
        accuracies[labels[i]] = accuracy_score(y_true[:, i], y_pred[:, i], normalize=normalize, sample_weight=sample_weight)
    return accuracies

def confusion_matrix_by_label(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, thresh: float = threshold, sample_weight=None, samplewise=False, labels: list = labels_list, **kwargs):
    if sigmoid: y_pred = y_pred.sigmoid()
    y_pred = (y_pred > thresh).float()
    y_pred = y_pred.detach().cpu().numpy()
    y_true = y_true.detach().cpu().numpy()
    return multilabel_confusion_matrix(y_true, y_pred, labels=list(range(len(labels))), sample_weight=sample_weight, samplewise=samplewise)
```
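A quick way to sanity-check these functions outside the training loop is to call them on dummy logit / label tensors of the right shape - a rough sketch, assuming `labels_list` and `threshold` are already defined as above:

```python
import torch

# Fake batch: 4 examples, one column per label; logits are raw scores, targets are 0/1 indicators.
dummy_logits = torch.randn(4, len(labels_list))
dummy_labels = torch.randint(0, 2, (4, len(labels_list))).float()

print(Hamming_loss(dummy_logits, dummy_labels))
print(F1_micro(dummy_logits, dummy_labels))
print(accuracy_by_label(dummy_logits, dummy_labels))
```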
@Pawel-Kranzberg Thank you for sharing the multi-label metrics code. When I try to run learner.fit() using these metrics I stumble upon this error, and I can't figure out which tensor needs to be moved to .cpu(). Can you help me with this?
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-45-dbbd575bee7f> in <module>()
----> 1 learner.fit(args.num_train_epochs, args.learning_rate, validate=True)
11 frames
/usr/local/lib/python3.6/dist-packages/fast_bert/learner_cls.py in fit(self, epochs, lr, validate, return_results, schedule_type, optimizer_type)
454 # Evaluate the model against validation set after every epoch
455 if validate:
--> 456 results = self.validate()
457 for key, value in results.items():
458 self.logger.info(
/usr/local/lib/python3.6/dist-packages/fast_bert/learner_cls.py in validate(self)
542 for metric in self.metrics:
543 validation_scores[metric["name"]] = metric["function"](
--> 544 all_logits, all_labels
545 )
546
<ipython-input-42-7da4dcdee442> in F1_macro(y_pred, y_true, sigmoid, thresh, average, sample_weight)
45 if sigmoid: y_pred = y_pred.sigmoid()
46 y_pred = (y_pred > thresh).float()
---> 47 return f1_score(y_true, y_pred, average = average, sample_weight = sample_weight)
48
49 def F1_micro(y_pred:Tensor, y_true:Tensor, sigmoid:bool = True):
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in f1_score(y_true, y_pred, labels, pos_label, average, sample_weight, zero_division)
1097 pos_label=pos_label, average=average,
1098 sample_weight=sample_weight,
-> 1099 zero_division=zero_division)
1100
1101
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in fbeta_score(y_true, y_pred, beta, labels, pos_label, average, sample_weight, zero_division)
1224 warn_for=('f-score',),
1225 sample_weight=sample_weight,
-> 1226 zero_division=zero_division)
1227 return f
1228
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight, zero_division)
1482 raise ValueError("beta should be >=0 in the F-beta score")
1483 labels = _check_set_wise_labels(y_true, y_pred, average, labels,
-> 1484 pos_label)
1485
1486 # Calculate tp_sum, pred_sum, true_sum ###
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in _check_set_wise_labels(y_true, y_pred, average, labels, pos_label)
1299 str(average_options))
1300
-> 1301 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
1302 present_labels = unique_labels(y_true, y_pred)
1303 if average == 'binary':
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
79 """
80 check_consistent_length(y_true, y_pred)
---> 81 type_true = type_of_target(y_true)
82 type_pred = type_of_target(y_pred)
83
/usr/local/lib/python3.6/dist-packages/sklearn/utils/multiclass.py in type_of_target(y)
245 raise ValueError("y cannot be class 'SparseSeries' or 'SparseArray'")
246
--> 247 if is_multilabel(y):
248 return 'multilabel-indicator'
249
/usr/local/lib/python3.6/dist-packages/sklearn/utils/multiclass.py in is_multilabel(y)
136 """
137 if hasattr(y, '__array__') or isinstance(y, Sequence):
--> 138 y = np.asarray(y)
139 if not (hasattr(y, "shape") and y.ndim == 2 and y.shape[1] > 1):
140 return False
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87
/usr/local/lib/python3.6/dist-packages/torch/tensor.py in __array__(self, dtype)
490 def __array__(self, dtype=None):
491 if dtype is None:
--> 492 return self.numpy()
493 else:
494 return self.numpy().astype(dtype, copy=False)
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
EDIT: I believe adding

```python
y_true = y_true.detach().cpu().numpy()
y_pred = y_pred.detach().cpu().numpy()
```

seems to solve the error.
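One way to avoid repeating that conversion in every metric is a small helper - a sketch of my own, not from fast-bert (the name `to_numpy` is made up here, and `threshold` is the constant defined above):

```python
import numpy as np
from torch import Tensor
from sklearn.metrics import f1_score

def to_numpy(t):
    # Detach a (possibly CUDA) tensor and return a NumPy array; pass other array-likes through.
    return t.detach().cpu().numpy() if isinstance(t, Tensor) else np.asarray(t)

def F1_macro(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True, thresh: float = threshold,
             average='macro', sample_weight=None, **kwargs):
    if sigmoid: y_pred = y_pred.sigmoid()
    y_pred = to_numpy((y_pred > thresh).float())
    y_true = to_numpy(y_true)
    return f1_score(y_true, y_pred, average=average, sample_weight=sample_weight)
```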
However, I keep getting F1 micro and macro scores of 0 starting from epoch 2.
05/26/2020 14:45:49 - INFO - root - Running evaluation
05/26/2020 14:45:49 - INFO - root - Num examples = 648
05/26/2020 14:45:49 - INFO - root - Batch size = 16
100.00% [41/41 00:12<00:00]
05/26/2020 14:46:01 - INFO - root - eval_loss after epoch 1: 0.5465127723972972:
05/26/2020 14:46:01 - INFO - root - eval_F1_macro after epoch 1: 0.07030604952823473:
05/26/2020 14:46:01 - INFO - root - eval_F1_micro after epoch 1: 0.2791666666666666:
05/26/2020 14:46:01 - INFO - root - eval_roc_auc_score_macro after epoch 1: 0.5189993932963294:
05/26/2020 14:46:01 - INFO - root - eval_roc_auc_score_micro after epoch 1: 0.7176347017846504:
05/26/2020 14:46:01 - INFO - root - eval_accuracy_by_label after epoch 1: {'First Party Collection/Use': 0.6929012345679012, 'Third Party Sharing/Collection': 0.375, 'User Access, Edit and Deletion': 0.9629629629629629, 'Data Retention': 0.9783950617283951, 'Data Security': 0.9521604938271605, 'International and Specific Audiences': 0.8240740740740741, 'Do Not Track': 0.9907407407407407, 'Policy Change': 0.9614197530864198, 'User Choice/Control': 0.9259259259259259, 'Introductory/Generic': 0.8827160493827161, 'Practice not covered': 0.9058641975308642, 'Privacy contact information': 0.9506172839506173}:
05/26/2020 14:46:01 - INFO - root - lr after epoch 1: 3.3e-05
05/26/2020 14:46:01 - INFO - root - train_loss after epoch 1: 0.6286261975765228
05/26/2020 14:46:01 - INFO - root -
/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:231: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
warnings.warn("To get the last learning rate computed by the scheduler, "
05/26/2020 14:49:10 - INFO - root - Running evaluation
05/26/2020 14:49:10 - INFO - root - Num examples = 648
05/26/2020 14:49:10 - INFO - root - Batch size = 16
100.00% [41/41 00:12<00:00]
05/26/2020 14:49:22 - INFO - root - eval_loss after epoch 2: 0.3756084274954912:
05/26/2020 14:49:22 - INFO - root - eval_F1_macro after epoch 2: 0.0:
05/26/2020 14:49:22 - INFO - root - eval_F1_micro after epoch 2: 0.0:
05/26/2020 14:49:22 - INFO - root - eval_roc_auc_score_macro after epoch 2: 0.6523186442866101:
05/26/2020 14:49:22 - INFO - root - eval_roc_auc_score_micro after epoch 2: 0.7661179698216735:
05/26/2020 14:49:22 - INFO - root - eval_accuracy_by_label after epoch 2: {'First Party Collection/Use': 0.7283950617283951, 'Third Party Sharing/Collection': 0.7561728395061729, 'User Access, Edit and Deletion': 0.9629629629629629, 'Data Retention': 0.9783950617283951, 'Data Security': 0.9521604938271605, 'International and Specific Audiences': 0.9367283950617284, 'Do Not Track': 0.9907407407407407, 'Policy Change': 0.9614197530864198, 'User Choice/Control': 0.9259259259259259, 'Introductory/Generic': 0.8888888888888888, 'Practice not covered': 0.9675925925925926, 'Privacy contact information': 0.9506172839506173}:
05/26/2020 14:49:22 - INFO - root - lr after epoch 2: 4.857193613652711e-05
05/26/2020 14:49:22 - INFO - root - train_loss after epoch 2: 0.46162147494879635
05/26/2020 14:49:22 - INFO - root -
Any thoughts?
@Pawel-Kranzberg I am doing multi-label classification with CamemBERT in fast-bert (following the tutorial here).
1 - Performance for each label: the tutorial ends with a global performance score, but how can I be sure that the fine-tuned model is good at predicting each label? I'd like to obtain per-label performance. How do I obtain the confusion matrix, ROC curve and all performance indicators per label?
2 - Code using the GPU in Google Colab: how should I adapt the code you shared when using a GPU (in Google Colab) instead of the CPU? Thanks for helping.
@Elzic6 Ad 1 - See my approach at #17. You could pass the metric functions above through the `metrics` argument when initialising the `learner` object (just like `fbeta` and `roc_auc` in the CamemBERT tutorial). Ad 2 - Is there a problem with it?
@Pawel-Kranzberg thank you for your answer. I've tried to understand what to pick up from #17 and #19 and then adapt it to my case. What I've done:
```python
import torch
from fast_bert.data_cls import BertDataBunch
from fast_bert.learner_cls import BertLearner
from fast_bert.data_lm import BertLMDataBunch
from fast_bert.learner_lm import BertLMLearner
from fast_bert.metrics import roc_auc, accuracy_thresh, fbeta  # accuracy_multilabel,
from sklearn.metrics import hamming_loss, accuracy_score, roc_curve, auc, roc_auc_score, f1_score, multilabel_confusion_matrix
from torch import Tensor
from fast_bert.prediction import BertClassificationPredictor
from pathlib import Path
from box import Box
import pandas as pd
import logging

threshold = 0.3
logger = logging.getLogger()
device_cuda = torch.device("cuda")

args = Box({
    "val_set_frac": 0.2,
    "seed": 42,  # to constitute the val_set identically at random
    "task_name": 'CP_reviews',
    "model_name": 'camembert-base',
    "model_type": 'camembert-base',
    "train_batch_size": 16,
    "train_learning_rate": 1e-4,
    "classification_learning_rate": 9e-5,
    "num_train_epochs": 1,
    "num_class_epochs": 1,
    "fp16": True,
    "fp16_opt_level": "O2",
    "warmup_steps": 1000,
    "logging_steps": 50,
    "max_seq_length": 512,
    "multi_gpu": True if torch.cuda.device_count() > 1 else False
})

labels = df.columns[2:].to_list()
with open("/content/drive/My Drive/Colab Notebooks/dpf_multi_labels/labels/labels.txt", 'w') as f:
    for i in labels:
        f.write(i + "\n")

print(labels)
# ['EFFICACITE', 'FIABILITE', 'ERGO_UX', 'INSTAL_PARAM', 'ESSAI_TEST', 'MAJ_MAINTENANCE', 'DESINSTALL']

labels_list = labels

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer=args.model_name,
                          train_file='train_set.csv',
                          val_file='val_set.csv',
                          label_file='labels.txt',
                          text_col='VERBATIM',
                          label_col=['EFFICACITE', 'FIABILITE', 'ERGO_UX', 'INSTAL_PARAM', 'ESSAI_TEST', 'MAJ_MAINTENANCE', 'DESINSTALL'],
                          batch_size_per_gpu=args.train_batch_size,
                          max_seq_length=args.max_seq_length,
                          multi_gpu=args.multi_gpu,
                          multi_label=True,
                          model_type=args.model_name)
```
def Hamming_loss [•••] def confusion_matrix_by_label
```python
args2 = {
    'metrics': {
        'functions': {
            'Hamming_loss': Hamming_loss,
            'Exact_Match_Ratio': Exact_Match_Ratio,
            'roc_auc_score_macro': roc_auc_score_macro,
            'roc_auc_score_micro': roc_auc_score_micro,
            'roc_auc_score_by_label': roc_auc_score_by_label,
            'ROC_AUC_by_label': ROC_AUC_by_label,
            'F1': F1,
            'F1_micro': F1_micro,
            'F1_by_label': F1_by_label,
            'accuracy_by_label': accuracy_by_label,
            'confusion_matrix_by_label': confusion_matrix_by_label,
            'FastBert roc_auc': roc_auc,
            'FastBert accuracy': accuracy,
            'FastBert fbeta': fbeta,
        }
    }
}

metrics = []
for k, v in args2['metrics']['functions'].items():
    metrics.append({'name': k, 'function': v})
```
Note: I've removed the "metrics", as I understand the metrics have been set up in the previous step: am I wrong in doing that?
```python
OUTPUT_DIR = Path('/content/drive/My Drive/Colab Notebooks/dpf_multi_labels/model_camembert-base/finetuned_model/')
WGTS_PATH = Path('/content/drive/My Drive/Colab Notebooks/dpf_multi_labels/model_camembert-base/model_out/pytorch_model.bin')
PRET_PATH = Path('/content/drive/My Drive/Colab Notebooks/dpf_multi_labels/model_camembert-base/model_out/')

cl_learner = BertLearner.from_pretrained_model(
    databunch,
    pretrained_path=PRET_PATH,
    metrics=metrics,
    device=device_cuda,
    logger=logger,
    output_dir=OUTPUT_DIR,
    finetuned_wgts_path=WGTS_PATH,
    warmup_steps=args.warmup_steps,
    multi_gpu=args.multi_gpu,
    multi_label=True,
    is_fp16=args.fp16,
    logging_steps=args.logging_steps)

cl_learner.fit(epochs=args.num_class_epochs,
               lr=args.classification_learning_rate,
               validate=True,
               schedule_type="warmup_cosine",
               optimizer_type="adamw")
```
(launched with just one epoch to check the process quickly), and below is the error message:
TypeError Traceback (most recent call last)
@Elzic6 - I've added `**kwargs` to the function parameters above; it should help with:
TypeError: Exact_Match_Ratio() got an unexpected keyword argument 'labels'
TypeError: Hamming_loss() got an unexpected keyword argument 'labels'
@MajdMustapha - `f1 micro` and `f1 macro` both depend on the `threshold` - it might have been too high in your case.
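If it helps, one way to check whether the threshold is the problem is to sweep a few candidate values on the validation outputs and compare the micro F1 - a rough sketch, where `val_logits` / `val_labels` are hypothetical tensors of raw logits and 0/1 labels that you would collect yourself:

```python
import numpy as np
from sklearn.metrics import f1_score

def sweep_thresholds(y_pred, y_true, candidates=np.arange(0.05, 0.55, 0.05)):
    # Micro F1 at each candidate threshold, given raw logits and 0/1 label tensors.
    probs = y_pred.sigmoid().detach().cpu().numpy()
    truth = y_true.detach().cpu().numpy()
    return {round(float(t), 2): f1_score(truth, (probs > t).astype(int), average='micro')
            for t in candidates}

# e.g. print(sweep_thresholds(val_logits, val_labels))
```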
Hi Pawel, thanks a lot for your help.
I've tried it with Colab Pro, and I have the following issue:
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Best regards,
@Pawel-Kranzberg How can we compute these metrics for predictions, in the case of multi-label classification?
How can I use the confusion matrix for each class and the other metrics from this link: https://github.com/kaushaltrivedi/fast-bert/issues/17?