In this pull request I have added a single Python file. It contains, first, a function that evaluates a single model on test data and, second, the necessary helper functions, which are adapted from SQuAD's official scoring script (see the citation at the top of the code file).
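For reviewers who have not seen SQuAD-style scoring before, here is a minimal sketch of the kind of helpers involved: answer normalization, exact match, and token-level F1. This is an illustration, not the code in this PR. The function names (`normalize_answer`, `get_tokens`, `compute_exact`, `compute_f1`) are assumptions modeled on the publicly available SQuAD evaluation script and may differ from the names in the added file.

```python
import collections
import re
import string


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    s = s.lower()
    punctuation = set(string.punctuation)
    s = "".join(ch for ch in s if ch not in punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def get_tokens(s: str) -> list[str]:
    """Split a normalized answer into whitespace tokens (empty input gives [])."""
    return normalize_answer(s).split() if s else []


def compute_exact(gold: str, pred: str) -> int:
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(gold) == normalize_answer(pred))


def compute_f1(gold: str, pred: str) -> float:
    """Token-level F1 between a gold answer and a predicted answer."""
    gold_toks = get_tokens(gold)
    pred_toks = get_tokens(pred)
    # If either side has no tokens, F1 is 1.0 only when both are empty.
    if not gold_toks or not pred_toks:
        return float(gold_toks == pred_toks)
    # Multiset intersection counts each shared token at most min(count) times.
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

An evaluation function would typically call these per question and average the scores over the test set; how this PR aggregates them is not shown here.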
You should move the dictionary type declarations into the types folder, where the other types are declared. Otherwise looks good 👍. You also need to merge validate into dev to get the types folder.
Functions to evaluate a BERT model
How to test
Automatic linter checks with Ruff; no unit tests.