pkp / ots

PKP XML Parsing Service
GNU General Public License v3.0

Need to add complete tests for //back//ref, //body//table-wrap, //body//fig, and //body//xref #62

Closed: axfelix closed this issue 8 years ago

axfelix commented 8 years ago

We've been asked to fill out the gaps in the testing stack, which is reasonable. Most of these are handled to some degree in the current implementation -- table-wrap and fig have their caption tests, which theoretically provide a lower-bound result, and we have rough precision scores for xref -- but we should get proper F-scores for these.

For //body//xref, we can probably start comparing the actual element text (e.g. the Lavergne, 1998 from <xref rid="IDef5d9794-a364-4d3c-9d75-7e89adea2ecb" ref-type="bibr" id="IDae042a77-4222-4a51-a6c3-5977ae01851b">Lavergne, 1998</xref>) from the input against the output, as with our other tests. The score might be low, but that's OK; I don't think there are any technical obstacles to making this comparison at this point. Something like the sketch below would do it.
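A rough sketch of that comparison (assuming an lxml-based harness; function names and paths here are illustrative, not the current test code):

```python
# Sketch: compare normalized //body//xref text values between the input JATS
# file and the parser output. Assumes lxml; not the actual test harness.
from collections import Counter
from lxml import etree

def xref_texts(path):
    """Multiset of normalized //body//xref text values, e.g. 'Lavergne, 1998'."""
    tree = etree.parse(path)
    return Counter(
        " ".join("".join(el.itertext()).split())
        for el in tree.xpath("//body//xref")
    )

def xref_match_counts(expected_path, output_path):
    expected = xref_texts(expected_path)
    produced = xref_texts(output_path)
    matched = sum((expected & produced).values())  # multiset intersection
    return matched, sum(expected.values()), sum(produced.values())
```

matched over the expected total gives recall, matched over the produced total gives precision, so an F-score falls out of the same counts.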

For //body//fig, we should probably implement something like what we currently have for //body//xref -- counting the total number of figures detected -- and we should technically be able to assign a precision/recall/F-score, rather than just the current "precision" score that's provided for //body//xref, if we just think of it in terms of type 1 vs. type 2 errors (overdetection / underdetection). I don't think there's any reasonable way to test the figure URI or anything like that, since that's fairly arbitrary.
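Turning those over/under-detection tallies into scores is just the usual formulas (a sketch with hypothetical names; the tallies would come from however the harness matches figures):

```python
def prf(true_positives, false_positives, false_negatives):
    """Precision/recall/F1 from type 1 (over-detection) and type 2 (under-detection) errors."""
    p_denom = true_positives + false_positives
    r_denom = true_positives + false_negatives
    precision = true_positives / p_denom if p_denom else 0.0
    recall = true_positives / r_denom if r_denom else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```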

For //body//table-wrap, I'm not sure how far we want to go -- validating every individual <td> element would be a pain (which is part of why we're currently just using the captions to provide lower-bound results). I'm open to "sampling" ideas here, ideally something that gives us results that aren't dependent on captions; one rough option is sketched below.
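One caption-independent possibility, purely as a sketch (not what the tests do today), would be to compare a cheap structural profile of each table instead of every cell:

```python
# Sketch: caption-independent proxy for //body//table-wrap fidelity.
# Assumes lxml; comparing these row-count lists between input and output
# would flag dropped or mangled tables without validating every <td>.
from lxml import etree

def table_profile(path):
    tree = etree.parse(path)
    return [len(tw.xpath(".//tr")) for tw in tree.xpath("//body//table-wrap")]
```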

For //back//ref ... we could try to compare every single element of a reference, but that would be a huge amount of effort for really low numbers. We could do sampling here instead -- e.g. just match //back//ref//article-title and report it as //back//ref, with an asterisk in the explanation (see the sketch below). Open to suggestions.
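The article-title sampling could be as simple as this (again just a sketch, assuming lxml; real matching would probably need to be fuzzier):

```python
# Sketch: sample //back//ref//article-title as a stand-in for full reference
# comparison. Function names are illustrative.
from lxml import etree

def ref_article_titles(path):
    tree = etree.parse(path)
    return {
        " ".join("".join(t.itertext()).split()).lower()
        for t in tree.xpath("//back//ref//article-title")
    }

def ref_sample_score(expected_path, output_path):
    expected = ref_article_titles(expected_path)
    produced = ref_article_titles(output_path)
    # Reported as //back//ref, with an asterisk noting it only samples titles.
    return len(expected & produced) / len(expected) if expected else 0.0
```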

@jnicolls, @jalperin, thoughts? I can do some of this, but we ideally want it in time for Monday March 7 (or that weekend's cron run) and I'm away Wednesday-Wednesday.

jnicolls commented 8 years ago

That sounds reasonable. I have a ton of schoolwork earlier in the week, but I'll work on this later in the week.

Joseph

axfelix commented 8 years ago

Update: this is now implemented and can be closed -- thanks, @jnicolls! All outstanding issues with testing of figures are just in the stack itself, afaik.