pkp / ots

PKP XML Parsing Service
GNU General Public License v3.0

Grobid integration #69

Closed: axfelix closed this issue 8 years ago

axfelix commented 8 years ago

This has been taking place offline but is nearly done. The intent is to use Grobid to replace the current front//abstract parsing.

axfelix commented 8 years ago

Done! https://github.com/pkp/xmlps/commit/80a750936728035a038995a215fe7e287fbb4fb6

axfelix commented 8 years ago

Reopening this issue temporarily because Grobid's spin-up time is throwing off our corpus processing when it is not run as a service. I'm going to change the Grobid module to run as a service.
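To illustrate why running as a service helps, here is a minimal sketch of the cost difference between per-document startup (batch style) and a long-lived service. It does not call Grobid: `SlowEngine`, `batch_per_call`, `EngineService`, and `serve_all` are hypothetical stand-ins, and the sleep only simulates an expensive startup such as loading models.

```python
import time
from typing import List


class SlowEngine:
    """Hypothetical stand-in for a tool with an expensive startup."""

    starts = 0  # counts how many times an engine has been initialised

    def __init__(self, startup_seconds: float = 0.01) -> None:
        time.sleep(startup_seconds)  # simulated spin-up cost
        SlowEngine.starts += 1

    def process(self, doc: str) -> str:
        # Placeholder for the real extraction work.
        return f"<abstract>{doc}</abstract>"


def batch_per_call(docs: List[str]) -> List[str]:
    # Batch style: a fresh engine per document pays the startup cost every time.
    return [SlowEngine().process(d) for d in docs]


class EngineService:
    # Service style: start the engine once, then reuse it for every request.
    def __init__(self) -> None:
        self._engine = SlowEngine()

    def handle(self, doc: str) -> str:
        return self._engine.process(doc)


def serve_all(docs: List[str]) -> List[str]:
    # One service instance handles the whole corpus.
    service = EngineService()
    return [service.handle(d) for d in docs]
```

Processing N documents with `batch_per_call` starts N engines, while `serve_all` starts one; over a large corpus that difference dominates run time.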

axfelix commented 8 years ago

@kaschioudi Files like grobid.log.2016-05-30 are being created in the xmlps root directory after the batch mode implementation has been running on the demo instance for a while. Any chance we aren't redirecting some of Grobid's log/temp output properly?

axfelix commented 8 years ago

Reopening because I just noticed this: we should use the Grobid abstract merge for all documents, not just those that came from PDF input:

https://github.com/pkp/xmlps/blob/master/module/MergeXMLOutputs/src/MergeXMLOutputs/Model/Queue/Job/MergeJob.php#L53
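A minimal sketch of an input-format-agnostic abstract merge, written in Python rather than the module's PHP and not taken from MergeJob.php. It assumes Grobid's abstract has already been converted to a JATS `<abstract>` element; the function name `merge_grobid_abstract` is hypothetical. The point is that there is no check on the original input type: every `front//abstract` gets replaced.

```python
import copy
import xml.etree.ElementTree as ET


def merge_grobid_abstract(jats_root: ET.Element, grobid_abstract: ET.Element) -> bool:
    """Replace each front//abstract in a JATS tree with Grobid's abstract.

    Applies to every document regardless of original input format.
    Returns True if at least one abstract was replaced.
    """
    front = jats_root.find("front")
    if front is None:
        return False
    # ElementTree has no parent pointers, so collect (parent, child) pairs
    # first and only then modify the tree.
    targets = [
        (parent, child)
        for parent in front.iter()
        for child in parent
        if child.tag == "abstract"
    ]
    for parent, child in targets:
        index = list(parent).index(child)  # keep the abstract's original position
        parent.remove(child)
        parent.insert(index, copy.deepcopy(grobid_abstract))
    return bool(targets)
```

This sketch only replaces existing abstracts; a fuller version might also insert Grobid's abstract into `front/article-meta` when the source document has none.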

axfelix commented 8 years ago

This will be fixed by merging rearrange-merge.