njallskarp / finetune-qa-powerset

Finetuning BERT models on a powerset of different linguistic domains
https://lvl.ru.is/

Feature/metrics #12

Closed (njallskarp closed 1 year ago)

njallskarp commented 1 year ago

Functions to evaluate a BERT model

This pull request adds a single Python file with two parts. First, a function that evaluates a single model on test data. Second, the helper functions it needs, which are adapted from SQuAD's official scoring script (see the citation at the top of the code file).
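For readers unfamiliar with SQuAD-style scoring, here is a minimal sketch of what such helpers typically look like. It is not the code from this pull request. The names `normalize_answer`, `exact_match_score`, `f1_score`, and `evaluate_predictions` are assumptions for illustration, and the sketch scores already-decoded answer strings rather than running a BERT model.

```python
import re
import string
from collections import Counter
from typing import Dict, List, Sequence


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_score(prediction: str, ground_truth: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most min(count) times.
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate_predictions(
    predictions: Sequence[str], references: Sequence[List[str]]
) -> Dict[str, float]:
    """Average EM and F1 (as fractions in [0, 1]) over a test set.

    Hypothetical helper: each question may have several reference answers,
    and the best score over those references is used.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return {"exact_match": 0.0, "f1": 0.0}
    em_total = 0.0
    f1_total = 0.0
    for pred, truths in zip(predictions, references):
        em_total += max(exact_match_score(pred, t) for t in truths)
        f1_total += max(f1_score(pred, t) for t in truths)
    n = len(predictions)
    return {"exact_match": em_total / n, "f1": f1_total / n}
```

A call such as `evaluate_predictions(["The Eiffel Tower"], [["Eiffel Tower"]])` would score a perfect match, because articles and casing are removed before comparison. The article list is English-specific, as in the SQuAD script, so it may need adjusting for other languages.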

How to test

1) Automatic linter checks with Ruff.
2) No unit tests are included.