kaplun closed this issue 7 years ago
You closed the previous issue about this (https://github.com/inspirehep/inspire-next/issues/2115) in https://github.com/inspirehep/inspire-next/pull/2142, which clearly is not a fix for that issue.
Mmh. I remember with @david-caro, we clearly found that #2142 was the cause for #2115. Looks like it wasn't clear after all.
Let me re-check the tests we wrote in #2142.
Articles that have "relevance_prediction": "decision": "Rejected" and no "classifier_results": "complete_output": "Core keywords" should be rejected automatically.
E.g. https://labs.inspirehep.net/holdingpen/622799 has proposal rejected and no core keywords
Remark: sometimes BibClassify (aka the classifier) fails. In that case the record should not be automatically rejected.
https://labs.inspirehep.net/holdingpen/637506 has no classifier_results at all. Can this mean the classifier failed? Since keywords are not shown in the brief listing it's hard to find good examples.
Status report: https://labs.inspirehep.net/holdingpen/637421, harvested on 2017-05-16, should have been automatically rejected.
As long as this auto-reject doesn't work, the holding pen is not usable for arXiv processing. The display/sorting (issue #2326) would be extremely helpful.
> remark: sometimes BibClassify (aka classifier) fails. In that case the record should not be automatically rejected.
> https://labs.inspirehep.net/holdingpen/637506 has no classifier_results at all. Can this mean the classifier failed?
This is the problem I mentioned in https://github.com/inspirehep/inspire-next/pull/2313#issuecomment-300827544. The answer is: we don't know, but we must know.
> https://labs.inspirehep.net/holdingpen/637421 should be automatically rejected.
> As long as this auto-reject doesn't work the holding pen is not usable for arXiv processing.

and

> The display/sorting (issue #2326) would be extremely helpful.

are instead blocked by #2337, and we're working on its solution.
> E.g. https://labs.inspirehep.net/holdingpen/622799 has proposal rejected and no core keywords

That one has no classifier_results at all, and the ones that have no classifier results should not be autorejected, right?
If this is because the program failed, we need to notice this and deal with the record manually, i.e. don't autoreject. If possible, make it somehow visible that there was a problem. Trivial solution: add 'classifier failed' as a core keyword.
If the classifier ran fine and found no keywords, the record can be autorejected (if "relevance_prediction": "decision": "Rejected").
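
To make the rule concrete, here is a minimal sketch of the decision as described above. It is not the actual inspire-next code: the function name `should_auto_reject` is hypothetical, the key names are taken literally from the comments in this thread, and treating a missing `classifier_results` (or a missing `complete_output`) as a classifier failure is an assumption, since the thread notes that this case cannot currently be told apart.

```python
def should_auto_reject(extra_data):
    """Hypothetical helper: decide whether a holding pen record can be
    rejected without manual intervention.

    Key names follow the wording used in this thread and may differ
    from the real workflow data.
    """
    prediction = extra_data.get("relevance_prediction") or {}
    if prediction.get("decision") != "Rejected":
        # Only records the relevance prediction rejected are candidates.
        return False

    classifier_results = extra_data.get("classifier_results") or {}
    complete_output = classifier_results.get("complete_output")
    if complete_output is None:
        # Assumption: no classifier output means the classifier failed,
        # so the record is left for manual handling.
        return False

    # The classifier ran: reject only if it found no core keywords.
    return not complete_output.get("Core keywords")
```

With this sketch, a record like 622799 (prediction "Rejected" but no classifier results at all) would stay in the holding pen for manual action rather than being autorejected.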
Btw. author keywords extracted from classifier are nice to have and we want them in the end. But they are not crucial. I.e. if they create problems don't add the author keyword to the metadata for now. Sorting and display have a higher priority than author keywords.
The major problem (#2413) has long been gone, the smaller problem (#2414) is now gone, https://github.com/inveniosoftware-contrib/invenio-classifier/issues/31 is rare and quite hard to fix anyway, so I think that this issue can be closed.
https://labs.inspirehep.net/holdingpen/629517 should not be waiting for action; it should already have been rejected.
cc: @ksachs