pkp / ots

PKP XML Parsing Service
GNU General Public License v3.0

Citation parsing dies -- CrossRef API output may have changed? #74

Closed axfelix closed 8 years ago

axfelix commented 8 years ago

reproducing from Slack:

any idea what could be causing the following error to be thrown for multiple jobs?

[10:14 AM]
A worker threw an exception in ParsCitConversion\Model\Queue\Job\ParsCitJob: File doesn't exist

[10:14 AM]
it's happening frequently (though not every single time) since I migrated the stack to two new VMs this week:

[10:15 AM]
http://pkp-xml-test.lib.sfu.ca/ and http://pkp-xml-demo.lib.sfu.ca

[10:15 AM]
and I'm trying to do a fresh run of the test framework but hitting this for documents that should be passing

[10:16 AM]
the weird thing is that document.bib appears to be being created successfully in the output directory

[10:17 AM]
so does references/parsCit.txt

[10:25 AM]
document.bib.xml isn't there though

[10:29 AM]
OK, I turned up the log level and got this...

[10:29 AM]
2016/07/05 10:29:25 ERR A worker threw an exception in ParsCitConversion\Model\Queue\Job\ParsCitJob: File doesn't exist
2016/07/05 10:29:25 DEBUG Citation list doesn't contain any entries
2016/07/05 10:29:24 DEBUG Executing ParsCit command: vendor/knmnyn/ParsCit/bin/citeExtract.pl -m 'extract_citations' 'var/documents/2/10/references/parsCit.txt' 2> /dev/null

[10:30 AM]
that file isn't empty, though, which is weird...

[10:31 AM]
though it does look like it's been over-enthusiastically trimmed in places:

27. Zorowich, JP, Sernik, RA, Tornozelo e Pé: In Sernik RA, Cerri, GG: Ultrassonografia Sistema Músculo-Esquelética. 1 ed. São Paulo: Ed. Sarvier, 1ª. Ed., 178-184, 2002.

  1. ohamed O, Cerny K, Jones W, Burnfield JM. The effect of terrain on foot pressures during walking. Foot Ankle Int. 2005; 26 : 859-69.
  2. trauss M B,. The orthopa
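One rough way to spot this kind of trimming across a whole parsCit.txt is to flag entries whose text starts with a lowercase letter right after the numeric label. The following is only a sketch of that heuristic in Python: the function name `find_suspect_entries` and the assumption that every reference sits on one line starting with a label like "27. " are illustrative, not taken from the xmlps code.

```python
import re

# Assumption: each reference is on one line and starts with a numeric label such as "27. ".
ENTRY = re.compile(r"^\s*(\d+)\.\s+(\S.*)$")


def find_suspect_entries(text):
    """Return (label, entry) pairs whose entry text begins with a lowercase letter."""
    suspects = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        # A surname normally starts with a capital, so a lowercase first
        # character hints that the leading letter was cut off.
        if match and match.group(2)[0].islower():
            suspects.append((match.group(1), match.group(2)))
    return suspects


# Sample lines copied from the chat message above.
sample = """27. Zorowich, JP, Sernik, RA, Tornozelo e Pé: In Sernik RA, Cerri, GG: Ultrassonografia Sistema Músculo-Esquelética. 1 ed. São Paulo: Ed. Sarvier, 1ª. Ed., 178-184, 2002.
1. ohamed O, Cerny K, Jones W, Burnfield JM. The effect of terrain on foot pressures during walking. Foot Ankle Int. 2005; 26 : 859-69.
2. trauss M B,. The orthopa"""

for label, entry in find_suspect_entries(sample):
    print(label, entry[:30])
```

This will also flag legitimate entries that begin with a lowercase particle ("van", "de"), so treat the output as a list of lines to eyeball rather than a definitive check.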

[10:32 AM]
weird that I can't duplicate this bug on the old server. all of the package versions here should be the same...

[10:34 AM]
oh god, I hope it's not a perl version issue :)

[10:36 AM]
document.bib looks good, though, and calling that ParsCit command directly has the same problems parsing that txt file, so the issue is more likely in the ReferencesConversion module than in the ParsCit module. I just can't tell what's causing it to happen on this new server...
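The logged command discards stderr with `2> /dev/null`, which hides anything Perl itself reports. Below is a minimal Python sketch for re-running the same command with stderr captured. The script path, the `-m extract_citations` mode, and the input path are copied from the DEBUG log line; `build_command` and `run_command` are hypothetical helper names, and the sketch assumes it is run from the application root.

```python
import subprocess

# Script path copied from the DEBUG log line; relative to the application root (assumption).
CITE_EXTRACT = "vendor/knmnyn/ParsCit/bin/citeExtract.pl"


def build_command(input_path):
    """Build the ParsCit argument list; no shell quoting is needed in list form."""
    return [CITE_EXTRACT, "-m", "extract_citations", input_path]


def run_command(command):
    """Run a command and return (exit code, stdout, stderr) without discarding stderr."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    command = build_command("var/documents/2/10/references/parsCit.txt")
    try:
        code, out, err = run_command(command)
    except OSError as exc:
        # Raised when the script is missing or not executable from this directory.
        print("could not start ParsCit:", exc)
    else:
        print("exit code:", code)
        print("stdout length:", len(out))
        print("stderr:", err or "(empty)")
```

Comparing the stderr text and exit code between the old and new servers might help separate a Perl or environment problem from a bad input file.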

[10:39 AM]
I guess if the crossref API output has changed this could be an issue: https://github.com/pkp/xmlps/blob/master/module/ReferencesConversion/src/ReferencesConversion/Model/Converter/References.php#L196