Closed: jacquerie closed this issue 6 years ago
CC: @kaplun
From @kaplun on June 7, 2017 11:44
I'll look at it ASAP.
Uh, wait a second, the paper actually contains 4 instances of "gravity waves", so invenio-classifier might not be completely wrong here. But shouldn't it output keywords that are present verbatim in the paper? Or does it try to be smart?
There's still a problem if it tries to be smart, because it looks like it's mixing https://en.wikipedia.org/wiki/Gravity_wave and https://en.wikipedia.org/wiki/Gravitational_wave.
From @kaplun on June 7, 2017 14:34
It does try to be smart. I think it does some fuzzification. The inner spaghetti code is quite large indeed.
Well, then this issue should probably be moved to invenio-classifier. I was thinking that something more sinister was at play here, like classify_paper being called on the wrong PDF or something like that.
BTW I'd say that the real problem for #2309 is #2413, not this issue or #2414.
BibClassify has to be smart. Physicists are not nice to us and don't use our standard keywords. Both communities use the phrase 'gravity wave'. It's an acronym in the taxonomy and SHOULD be translated to 'gravitational radiation'. It's not a bug but a feature we can not avoid. Do not change this behavior. Sorry I didn't chime in earlier but I was not aware of this issue.
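To make the behavior described above concrete, here is a minimal sketch of how a taxonomy with alternative labels can emit a keyword that never appears verbatim in the text. This is not the BibClassify or invenio-classifier code: the `TAXONOMY` dict, the `extract_keywords` function, and the naive plural handling are all assumptions made up for illustration.

```python
import re

# Hypothetical mini-taxonomy (an assumption, not the real INSPIRE taxonomy):
# preferred keyword -> alternative labels that should be reported under it.
TAXONOMY = {
    "gravitational radiation": ["gravitational wave", "gravity wave"],
}


def extract_keywords(text, taxonomy=TAXONOMY):
    """Count label matches and report them under the preferred keyword."""
    lowered = text.lower()
    counts = {}
    for preferred, alt_labels in taxonomy.items():
        total = 0
        for label in [preferred] + alt_labels:
            # Match the label as a whole phrase, allowing a trailing plural "s".
            pattern = r"\b" + re.escape(label) + r"s?\b"
            total += len(re.findall(pattern, lowered))
        if total:
            counts[preferred] = total
    return counts


sample = (
    "Internal gravity waves in a stratified fluid. "
    "Gravity waves break near the surface."
)
print(extract_keywords(sample))  # {'gravitational radiation': 2}
```

With a mapping like this, a fluid dynamics paper that only ever says "gravity waves" still gets tagged with "gravitational radiation", which matches what was observed in the Holding Pen record.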
It's not a bug but a feature we can not avoid. Do not change this behavior.
Ok! Then we can close this.
From @jacquerie on June 7, 2017 10:10
For example: https://labs.inspirehep.net/holdingpen/648620 has the keyword "gravitational radiation", which doesn't appear anywhere in the PDF, since this is a fluid dynamics paper. They both relate in some way to the wave equation, but that's it... We might want to fix this before declaring https://github.com/inspirehep/inspire-next/issues/2309 as fixed.
Copied from original issue: inspirehep/inspire-next#2415