pkp / ots

PKP XML Parsing Service

GNU General Public License v3.0

Integrate CERMINE #15

Closed (crism closed 9 years ago)

crism commented 9 years ago

Integrate https://github.com/CeON/CERMINE as a module.

axfelix commented 9 years ago

Got it. Let me have a look at your rewrite a little later on and I'll see if anything seems obviously wrong to me ... glad we're so close, though!

crism commented 9 years ago

Based on a small sample so far, documents that go through the reference extraction pathway hang at merge. Documents that fail reference extraction do not.

axfelix commented 9 years ago

Huh! OK, wild guess -- when is ParsCit firing? Immediately after meTypeset? Do we want to try punting that to after merge?

jalperin commented 9 years ago

If we needed to, could we force a fail on the reference extraction to get all the documents through and see the front matter parse results? Not saying we should do that yet; obviously this warrants some exploration of what might turn out to be an easy fix.

crism commented 9 years ago

The queues are now completely linear. ParsCit looks at the NLM XML, and makes its own output. Its success or failure ($job->referenceParsingSuccess) is used as a flag in the queue manager to make path decisions later on. I am pretty sure the problem comes when MergeXMLOutputs tries to find the appropriate NLM XML output—as modified by BibtexreferencesConversion or not—but I would have expected an Exception and failure, if I’d gotten that wrong. I’ll look at this more after dinner.

crism commented 9 years ago

The results, @jalperin, are fine when it succeeds; combining two XML documents in this way is extremely straightforward.

crism commented 9 years ago

Oh. If it’d been a snake, it’d bit me.

        $meTypesetDocument = $job->getStageDocument(JOB_CONVERSION_STAGE_NLMXML);

That should have a conditional for the reference extraction success; that document’s stage is changed when the references are updated, and is no longer accessible by that handle.
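
For readers following along, here is a minimal sketch of the kind of conditional described above. It is not the actual fix that was committed. Only `getStageDocument`, `JOB_CONVERSION_STAGE_NLMXML`, and `$job->referenceParsingSuccess` come from this thread; the `SketchJob` class, the `JOB_CONVERSION_STAGE_REFERENCES_UPDATED` constant, the constant values, and the `findMeTypesetDocument` helper are hypothetical stand-ins so the example runs on its own.

```php
<?php
// Stage handles. Only JOB_CONVERSION_STAGE_NLMXML is named in the thread;
// the second constant and both values are assumptions for illustration.
const JOB_CONVERSION_STAGE_NLMXML = 'nlmxml';
const JOB_CONVERSION_STAGE_REFERENCES_UPDATED = 'references_updated';

// Hypothetical stand-in for the real job class, just enough to run the sketch.
class SketchJob
{
    public $referenceParsingSuccess = false;
    private $documents = [];

    public function setStageDocument(string $stage, string $path): void
    {
        $this->documents[$stage] = $path;
    }

    // Returns null when no document is registered under the given stage.
    public function getStageDocument(string $stage): ?string
    {
        return $this->documents[$stage] ?? null;
    }
}

// Pick the stage handle based on whether reference extraction succeeded,
// and fail loudly instead of continuing with a missing document.
function findMeTypesetDocument(SketchJob $job): string
{
    $stage = $job->referenceParsingSuccess
        ? JOB_CONVERSION_STAGE_REFERENCES_UPDATED
        : JOB_CONVERSION_STAGE_NLMXML;

    $document = $job->getStageDocument($stage);
    if ($document === null) {
        throw new RuntimeException("No document found for stage: $stage");
    }
    return $document;
}
```

The explicit exception is a design choice for the sketch: a missing document surfaces as a failed job rather than a merge step that waits on something that will never arrive.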

crism commented 9 years ago

This seems to be fixed.

axfelix commented 9 years ago

Fantastic. I've been having pretty bad insomnia this week (like right now) but very much looking forward to testing in the morning.