utterworks / fast-bert

Super easy library for BERT based NLP models
Apache License 2.0

Mapping validation true labels (tensors) to predicted labels? #201

Closed. mohammedayub44 closed this 4 years ago

mohammedayub44 commented 4 years ago

Hi,

Is there an easy way, from the databunch object or the learner object, to map the true validation labels to the predicted labels? For now I'm using the validation data as a test set and want to compare its true labels to the predicted values. I see the databunch exposes the label tensors as databunch.val_dl.dataset.tensors[3]; something like a .labels method for each sentence would be helpful. That way I could build the prediction list: predictions = [max(s, key=lambda i: i[1])[0] for s in preds]

and run a confusion matrix for single-label or multi-label classification.
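
For reference, a minimal sketch of what that list comprehension does, assuming preds holds one entry per sentence and each entry is a list of (label, score) tuples. The label names and scores below are made up for illustration.

```python
# Hypothetical predictions: one list of (label, score) tuples per sentence.
preds = [
    [("positive", 0.91), ("neutral", 0.06), ("negative", 0.03)],
    [("negative", 0.55), ("neutral", 0.40), ("positive", 0.05)],
]

# For each sentence, keep the label whose score is highest.
predictions = [max(s, key=lambda pair: pair[1])[0] for s in preds]

print(predictions)
```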

Thanks!

aaronbriel commented 4 years ago

You can try something like this, which pulls the index of the maximum prediction: max_preds = predictions.argmax(dim=1, keepdim=True)

How you convert the true labels to the same format depends on your solution. For example, you can store the index of each class and rely on the ordering of the labels.
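
A rough sketch of that idea, under a few assumptions: the scores are available as a 2-D array with one row per example, NumPy's argmax(axis=1) stands in for the tensor call above, and the labels list (hypothetical names) is in the same order as the score columns.

```python
import numpy as np

# Assumed label ordering; it must match the order of the score columns.
labels = ["negative", "neutral", "positive"]

# Hypothetical scores: one row per example, one column per class.
scores = np.array([
    [0.1, 0.2, 0.7],
    [0.8, 0.1, 0.1],
    [0.2, 0.5, 0.3],
])

# Index of the highest score in each row.
pred_idx = scores.argmax(axis=1)

# Convert string true labels to indices using the same ordering.
true_labels = ["positive", "negative", "positive"]
label_to_idx = {name: i for i, name in enumerate(labels)}
true_idx = np.array([label_to_idx[name] for name in true_labels])

# With both sides as class indices, they can be compared directly.
accuracy = float((pred_idx == true_idx).mean())
print(pred_idx, true_idx, accuracy)
```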

aaronbriel commented 4 years ago

One thing to note: if you pass val_file to the BertDataBunch and set validate=True, the validation dataset is already evaluated during training. Personally, I prefer to compute the confusion matrix on the test set; that way I get a post-training feel for the accuracy of the model.

mohammedayub44 commented 4 years ago

@aaronbriel Thanks. Getting the max value from the predictions was not difficult. I'm not sure argmax works, though, since the learner returns the predictions as a list of tuples. For the true labels, since I had only 3 labels, I iterated over databunch.val_dl.dataset.tensors[3] and mapped them to strings with checks like np.array_equal(x, [1,0,0]). Probably not the best solution, but it works well for now.
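
A possible alternative to the per-row np.array_equal checks, sketched under assumptions: the label tensor has been converted to a NumPy array of one-hot rows (single-label case only), and the labels list gives the assumed column order. The arrays and label names are invented for illustration. Taking argmax of each one-hot row gives the class index, which can then feed a small confusion matrix.

```python
import numpy as np

# Assumed order of the one-hot columns.
labels = ["negative", "neutral", "positive"]

# Stand-in for the validation label tensor, converted to a NumPy array.
true_one_hot = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
])

# For one-hot rows, argmax returns the position of the single 1.
true_idx = true_one_hot.argmax(axis=1)
true_names = [labels[i] for i in true_idx]

# Hypothetical predicted label names, one per validation example.
pred_names = ["negative", "positive", "positive", "neutral"]
pred_idx = [labels.index(name) for name in pred_names]

# Confusion matrix: rows are true classes, columns are predicted classes.
confusion = np.zeros((len(labels), len(labels)), dtype=int)
for t, p in zip(true_idx, pred_idx):
    confusion[t, p] += 1

print(true_names)
print(confusion)
```

This shortcut does not carry over to multi-label rows, where more than one position can be 1.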