Open ahalterman opened 5 years ago
This should be documented better, since it comes from the Spanish cases and cannot be reproduced in this branch.
On Mon, Oct 29, 2018, 15:49 Andy Halterman notifications@github.com wrote:
(Just making an issue for what @philip-schrodt https://github.com/philip-schrodt has reported elsewhere so we can consolidate discussion):
UniversalPetrarch is enormously overproducing events. Not sure what the source of the false positives is. E.g.:
Records evaluated: 1018
Correct events: 565 44.21%
Uncoded events: 713 55.79%
Extra events: 3212 315.52%
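For anyone reading these numbers: the percentages appear to use two different denominators. This is an inference from the arithmetic, not something confirmed against the validation code. The correct and uncoded shares match a base of correct + uncoded events (1278), while the extra-events figure matches a base of records evaluated (1018). A minimal sketch of that reading:

```python
# Counts copied from the validation summary quoted in this issue.
records_evaluated = 1018
correct_events = 565
uncoded_events = 713
extra_events = 3212

# Assumption (inferred from the numbers, not from the validation code):
# correct and uncoded events are shares of the gold-standard events
# (correct + uncoded), while extra events are reported relative to the
# number of records evaluated.
gold_events = correct_events + uncoded_events


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


correct_pct = pct(correct_events, gold_events)
uncoded_pct = pct(uncoded_events, gold_events)
extra_pct = pct(extra_events, records_evaluated)

print(correct_pct, uncoded_pct, extra_pct)
```

If that reading is right, the extra events outnumber the correct ones by more than five to one (3212 vs. 565), which is why this matters so much for production use.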
Any update on this? This is one of the biggest obstacles to using UniversalPetrarch in production.
In progress. From what the team told me last week, we should be getting an update this coming week.
From talking with @JingL1014, it sounds like she has a couple theories for what could be causing the high false positives:
There's also the possibility there's some larger unknown problem in how it's handling the coding.
I think a few tests could help figure out which are going on:
Is there an update on this? Have you looked into those tests? It sounds from @philip-schrodt like some sentences are still producing many events so something's still going wrong.
This is specific to Spanish and some issues in the validation code for it. Should be getting some more specifics with updates from @JingLu, and Phil and Javier et al on the ES cases.
I added this in validation.py
to
return_dict = petrarch_ud.do_coding(dict)
PETRwriter.write_events(return_dict, "evts.validation.txt")
where write_events() is the routine that petrarch_ud uses to write the final output file, and AFP_SPA_19940921.0205_7.0_0 still produces 51 events in the regular output. I really need to be able to look at the code that Arizona is running, which is producing fewer events, since all I'm doing is making calls to the petrarch_ud code I've downloaded from the -master branch. I downloaded a new copy this morning and confirmed it is still generating this behavior.
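One way to narrow down which records are responsible would be to count coded events per record and list the worst offenders. The sketch below is only illustrative: the (record_id, event) pair shape, the function names, and the sample data are all hypothetical, not the actual petrarch_ud return structure or output format.

```python
from collections import Counter


def events_per_record(events):
    """Count coded events per record ID.

    `events` is an iterable of (record_id, event) pairs. This shape is a
    hypothetical stand-in, not the real petrarch_ud data structure.
    """
    return Counter(record_id for record_id, _ in events)


def overproducing(counts, threshold):
    """Return (record_id, count) pairs above `threshold`, largest first."""
    return [(rid, n) for rid, n in counts.most_common() if n > threshold]


# Synthetic placeholder data, for illustration only.
sample = [("rec-A", ("X", "Y", "010"))] * 5 + [("rec-B", ("X", "Z", "020"))]
flagged = overproducing(events_per_record(sample), threshold=3)
print(flagged)
```

Running the same count on both setups would show whether the difference is spread across all records or concentrated in a few sentences.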