pkp / ots

PKP XML Parsing Service
GNU General Public License v3.0

Switch to Grobid for front matter parsing of Word docs #143

Closed by axfelix 6 years ago

axfelix commented 6 years ago

Recent tests performed by eLife show that Grobid substantially outperforms Cermine for front-matter parsing on our Word document corpus. Our other decisions seem sound right now, and front-matter parsing is less important in the context of OJS integration, but we should make some changes to our Merge module to reflect this.

axfelix commented 6 years ago

Actually, I've tested this, and the performance gain on real-world docs isn't nearly as large as it is on our corpus. I'm going to close this for now but may revisit it.