kermitt2 / grobid

A machine learning software for extracting information from scholarly documents
https://grobid.readthedocs.io
Apache License 2.0

Improvement to build using CI #307

Open lfoppiano opened 6 years ago

lfoppiano commented 6 years ago

Some ideas gathered with @de-code:

kermitt2 commented 6 years ago

I am not sure that running the PMC 1942 PDF tests with CI is a good idea: it is a heavy process that can take more than one hour on a good machine. It's really something tied to development and tuning. In addition, there is no mechanism to store and compare metrics for this end-to-end evaluation over time, so the automation could not be exploited at this point.

de-code commented 6 years ago

Personally, I think there is value in having that even in the absence of automatic metric comparison. Once you notice a degradation, you could more easily narrow it down later. Of course, it would be nice to have an automatic comparison like there is for coverage. Perhaps an alternative would be to run over a subset of the PMC 1942 tests, or to run it on a delay rather than on every commit. Well, just an idea anyway.
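
To make the subset and comparison ideas a bit more concrete, here is a minimal sketch in Python. It is not part of GROBID: the function names, the JSON metrics layout, the metric names, and the tolerance value are all assumptions made for illustration. It shows one way to pick a stable subset of document IDs (so runs stay comparable) and to flag metrics that dropped against a stored baseline.

```python
import hashlib
import json
from pathlib import Path


def select_subset(doc_ids, fraction=0.1):
    """Pick a stable subset of documents by hashing each ID.

    The same IDs are selected on every run, so scores from different
    commits are computed over the same documents.
    """
    threshold = int(fraction * 2**32)
    chosen = []
    for doc_id in doc_ids:
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        # Interpret the first 4 bytes as an integer in [0, 2**32).
        if int.from_bytes(digest[:4], "big") < threshold:
            chosen.append(doc_id)
    return sorted(chosen)


def save_metrics(path, metrics):
    """Store a {metric_name: score} mapping as JSON (hypothetical layout)."""
    Path(path).write_text(
        json.dumps(metrics, indent=2, sort_keys=True), encoding="utf-8"
    )


def load_metrics(path):
    """Load a previously stored {metric_name: score} mapping."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_regressions(baseline, current, tolerance=0.01):
    """Return metrics whose score dropped by more than `tolerance`."""
    regressions = {}
    for name, old in baseline.items():
        new = current.get(name)
        if new is not None and old - new > tolerance:
            regressions[name] = (old, new)
    return regressions
```

A delayed (for example nightly) job could call `select_subset` on the list of PMC document IDs, run the evaluation on those, and fail or warn when `find_regressions` returns a non-empty result against the last stored baseline.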