jacquerie closed this issue 7 years ago
CC: @kaplun
I'll look at it ASAP.
Uh, wait a second, the paper actually contains 4 instances of "gravity waves", so invenio-classifier might not be completely wrong here. But shouldn't it output keywords that are present verbatim in the paper? Or does it try to be smart?
There's still a problem if it tries to be smart, because it looks like it's mixing https://en.wikipedia.org/wiki/Gravity_wave and https://en.wikipedia.org/wiki/Gravitational_wave.
It does try to be smart. I think it does some fuzzification. The inner spaghetti code is quite large indeed.
Well, then this issue should probably be moved to invenio-classifier. I was thinking that something more sinister was at play here, like classify_paper being called on the wrong PDF or something like that.
BTW I'd say that the real problem for #2309 is #2413, not this issue or #2414.
This issue was moved to inveniosoftware-contrib/invenio-classifier#31
For example: https://labs.inspirehep.net/holdingpen/648620 has the keyword "gravitational radiation", which doesn't appear anywhere in the PDF, since this is a fluid dynamics paper. They both relate in some way to the wave equation, but that's it... We might want to fix this before declaring https://github.com/inspirehep/inspire-next/issues/2309 as fixed.
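For anyone who wants to reproduce the "is this keyword actually in the PDF?" check by hand, here is a minimal sketch. It assumes you already have the extracted full text as a string and the list of keywords the classifier returned; `keywords_missing_from_text` is a hypothetical helper written for this comment, not something invenio-classifier provides, and it only does a plain verbatim match (no fuzzy matching of any kind).

```python
import re


def _normalize(text):
    """Lowercase and collapse whitespace so PDF line breaks do not hide matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def keywords_missing_from_text(keywords, fulltext):
    """Return the keywords that do NOT occur verbatim (as whole words) in fulltext.

    Hypothetical helper for illustration only; it is not part of invenio-classifier.
    """
    haystack = _normalize(fulltext)
    missing = []
    for keyword in keywords:
        # Whole-word match, so a keyword is not found inside a longer word.
        pattern = r"\b" + re.escape(_normalize(keyword)) + r"\b"
        if not re.search(pattern, haystack):
            missing.append(keyword)
    return missing


# Made-up snippet of text, standing in for the extracted PDF content.
sample_text = "Internal gravity\nwaves in a stratified fluid"
print(keywords_missing_from_text(["gravity waves", "gravitational radiation"], sample_text))
```

Any keyword this reports as missing was presumably produced by the classifier's own matching logic rather than found literally in the paper.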