lfoppiano / grobid-quantities

GROBID extension for identifying and normalizing physical quantities.
https://grobid-quantities.readthedocs.io
Apache License 2.0

Create holdout set #145

Closed lfoppiano closed 1 year ago

lfoppiano commented 1 year ago

This PR selects a few papers to form a holdout set. For now, since the dataset is small, we will still use all the documents to train the final models, but we will keep a fixed holdout set for a stricter and more precise evaluation. The exception is Units, where the evaluation set was borrowed from a different source.

The holdout set was created using an automatic script and then re-balanced according to the distribution of entities across the training and holdout sets.

The Python scripts to reproduce the holdout dataset are located under scripts.
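For readers who want a feel for the idea, here is a minimal sketch of one way such a selection could work. It is not the script shipped under scripts: the function names, the `fraction`, `trials`, and `seed` parameters, and the input format (a mapping from document name to per-label entity counts) are all assumptions made for illustration. It tries several random splits and keeps the one whose holdout label distribution is closest to the training one.

```python
import random
from collections import Counter


def label_distribution(docs, names):
    """Relative frequency of each entity label over the given documents."""
    totals = Counter()
    for name in names:
        totals.update(docs[name])
    n = sum(totals.values())
    return {label: count / n for label, count in totals.items()} if n else {}


def distribution_gap(p, q):
    """Sum of absolute differences between two label distributions."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def select_holdout(docs, fraction=0.2, trials=200, seed=42):
    """Hypothetical helper: sample several candidate holdout sets and keep
    the one whose label distribution best matches the remaining training set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    names = sorted(docs)
    size = max(1, round(len(names) * fraction))
    best, best_gap = None, float("inf")
    for _ in range(trials):
        holdout = rng.sample(names, size)
        training = [n for n in names if n not in holdout]
        gap = distribution_gap(
            label_distribution(docs, holdout),
            label_distribution(docs, training),
        )
        if gap < best_gap:
            best, best_gap = sorted(holdout), gap
    return best, best_gap
```

Calling `select_holdout(docs)` with, for example, `{"paper1.xml": {"quantity": 12, "unit": 9}, ...}` returns the chosen document names and the remaining distribution gap. The real script may well use a different balancing criterion.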

The statistics about the training/holdout set can be found in:

coveralls commented 1 year ago


Coverage remained the same at 27.67% when pulling 06c7e11ab71fbff32a22f0e5ef47957945c4109e on feature/holdout-set into 0957bc631017ca3c603bb394d53af4b9643720d3 on master.

lfoppiano commented 1 year ago

I think this is ready to merge