njallskarp / finetune-qa-powerset

Finetuning BERT models on a powerset of different linguistic domains
https://lvl.ru.is/

added metrics file #13

Closed: njallskarp closed this pull request 1 year ago

njallskarp commented 1 year ago

Functions to evaluate a BERT model

This pull request adds a single Python file containing two things. First, a function that evaluates a single model on test data. Second, the helper functions it needs, which are adapted from SQuAD's official scoring script (see the citation at the top of the code file).
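For reviewers unfamiliar with SQuAD-style scoring, here is a minimal sketch of what helpers like these usually compute: exact match and token-level F1 over normalized answer strings. It is not the code in this pull request. The function names (`normalize_answer`, `exact_match`, `f1_score`, `evaluate`) and the returned dictionary keys are assumptions for illustration, and the article-stripping step assumes English text, which may not suit every domain in this repository.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # English-only assumption
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> int:
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    if not pred_tokens or not true_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == true_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    num_same = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: list[str], truths: list[str]) -> dict[str, float]:
    """Average exact match and F1 over paired predictions and references."""
    if not truths or len(predictions) != len(truths):
        raise ValueError("predictions and truths must be non-empty and equal in length")
    n = len(truths)
    em = sum(exact_match(p, t) for p, t in zip(predictions, truths)) / n
    f1 = sum(f1_score(p, t) for p, t in zip(predictions, truths)) / n
    return {"exact_match": em, "f1": f1}
```

In this sketch each question has exactly one reference answer. The official script takes the maximum score over several references per question, so a version used on SQuAD-formatted data would likely need that extra step.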

How to test

Automatic linter checks with Ruff; no unit tests.

lsig commented 1 year ago

You should move the dictionary type declarations into the types folder, where the other types are declared. Otherwise looks good 👍. You also need to merge validate into dev to get the types folder.

Eysteinn-Orn commented 1 year ago

About the pull request

and

Instructions on how to test