pkp / ots

PKP XML Parsing Service
GNU General Public License v3.0
32 stars 19 forks source link

Reference matching performance is poorer than expected #142

Open axfelix opened 6 years ago

axfelix commented 6 years ago

I think there may have been a regression to our performance in matching inline references to the bibliography -- also noticing that we're sometimes getting YYYY-Mon style data returned under the <year> element returned from CrossRef, which we may need to strip as it could be a contributing factor.

axfelix commented 6 years ago

Actually, I think part of this is that Texture is dying on line length when we paste the ref-list back into the document as a single string. We probably just need to run the equivalent of xmllint --format on that when we're doing it...

axfelix commented 6 years ago

@kaschioudi, would it make sense to just run an xmllint --format command at some point in the document_final (stage 20) module? This stack always makes CLI calls and updating files directly more confusing than I'd like...

I added a stub in https://github.com/pkp/ots/commit/b18bc544ef7c0353b0f2aa9ae69b504ae1d98cb7, does it look right? I'm not actually making the call yet...

kaschioudi commented 6 years ago

@axfelix : I did not run the code but it looks good to me. However I believe addRetrieve line is not required. https://github.com/pkp/ots/blob/b18bc544ef7c0353b0f2aa9ae69b504ae1d98cb7/module/XmlFinal/src/XmlFinal/Model/Converter/XmlFinal.php#L79

axfelix commented 6 years ago

Really? xmllint normally just outputs to stdout, so I think I need that if I'm running the shell command with the intent of writing directly to a filepath on the disk, but I guess I could just capture stdout instead, I'm just not sure how best to write it over our existing xmlfinal doc at that point.

kaschioudi commented 6 years ago

I may be wrong but here's my understanding of the inner working.

OTS Command class calls PHP's exec function (http://php.net/manual/en/function.exec.php) behind the scene. https://github.com/pkp/ots/blob/ea5377520bf60425c21a1ff3feb0cc17231176af/vendor/xmlps/library/Command/Command.php#L29 and the command output is stored in $this->output which can be obtained by calling Command:: getOutputString()

However sometimes the error output (STDERR) also needs to be redirected to the output error (STDOUT) and I believe it's done like this $command->addRedirect('2>&1 >/dev/null');

axfelix commented 6 years ago

Yeah, I know how to redirect stderr to stdout, I'm mostly not clear on how best to actually write the output (however we're capturing it) over the preexisting XML :)

If I use the stdout redirect, we don't have any $this-output. If we use this function with $this->output, then I don't need to pass it an output filename, I just need to return the output, and write it over what we have. I'll play with it...