inspirehep / inspire-next

The INSPIRE repo.
https://inspirehep.net
GNU General Public License v3.0

Records are still not automatically rejected #2309

Closed kaplun closed 7 years ago

kaplun commented 7 years ago

https://labs.inspirehep.net/holdingpen/629517 should not be waiting for action, rather already rejected.

cc: @ksachs

jacquerie commented 7 years ago

You closed the previous issue about this (https://github.com/inspirehep/inspire-next/issues/2115) in https://github.com/inspirehep/inspire-next/pull/2142, which clearly is not a fix for that issue.

kaplun commented 7 years ago

Mmh. I remember with @david-caro, we clearly found that #2142 was the cause for #2115. Looks like it wasn't clear after all.

Let me re-check the tests we wrote in #2142.

ksachs commented 7 years ago

Articles that have "relevance_prediction": "decision": "Rejected" and no "classifier_results": "complete_output": "Core keywords" should be rejected automatically.

E.g. https://labs.inspirehep.net/holdingpen/622799 has proposal rejected and no core keywords

ksachs commented 7 years ago

remark: sometimes BibClassify (aka classifier) fails. In that case the record should not be automatically rejected.

https://labs.inspirehep.net/holdingpen/637506 has no classifier_results at all. Can this mean the classifier failed? Since keywords are not shown in the brief listing it's hard to find good examples.

Status report: Harvested on 2017-05-16 https://labs.inspirehep.net/holdingpen/637421 should be automatically rejected.

As long as this auto-reject doesn't work, the holdingpen is not usable for arXiv processing. The display/sorting (issue #2326) would be extremely helpful.

jacquerie commented 7 years ago

> remark: sometimes BibClassify (aka classifier) fails. In that case the record should not be automatically rejected.
>
> https://labs.inspirehep.net/holdingpen/637506 has no classifier_results at all. Can this mean the classifier failed?

This is the problem I mentioned in https://github.com/inspirehep/inspire-next/pull/2313#issuecomment-300827544. The answer is: we don't know, but we must know.

jacquerie commented 7 years ago

> https://labs.inspirehep.net/holdingpen/637421 should be automatically rejected.
>
> As long as this auto-reject doesn't work the holdingpen is not usable for arXiv processing.

and

> The display/sorting (issue #2326) would be extremely helpful.

are instead blocked by #2337, and we're working on its solution.

david-caro commented 7 years ago

> E.g. https://labs.inspirehep.net/holdingpen/622799 has proposal rejected and no core keywords

That one has no classifier_results at all, and the ones that have no classifier results should not be autorejected, right?

ksachs commented 7 years ago

If this is because the program failed we need to notice this and deal with the record manually, i.e. don't autoreject. If possible make it somehow visible that there was a problem. Trivial solution: add 'classifier failed' as core keyword.

If classifier ran fine and found no keywords the record can be autorejected (if "relevance_prediction": "decision": "Rejected").
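The rule described above could be sketched roughly as follows. This is only an illustration, not the actual inspire-next workflow code: the function name `should_auto_reject` and the `extra_data` argument are hypothetical, and the sketch assumes the keys quoted in this thread (`relevance_prediction`, `decision`, `classifier_results`, `complete_output`, `Core keywords`) are nested dictionaries. It also assumes that a missing `classifier_results` or `complete_output` means the classifier may have failed, which this thread has not confirmed.

```python
def should_auto_reject(extra_data):
    """Return True only if the record can be rejected without curator action.

    Hypothetical helper: `extra_data` is assumed to be a dict holding the
    keys quoted in this thread.
    """
    prediction = extra_data.get("relevance_prediction") or {}
    if prediction.get("decision") != "Rejected":
        # The relevance prediction did not propose rejection.
        return False

    classifier_results = extra_data.get("classifier_results")
    if not classifier_results:
        # Assumption: no classifier results may mean the classifier failed,
        # so leave the record for manual handling.
        return False

    complete_output = classifier_results.get("complete_output")
    if complete_output is None:
        # Same assumption: no output at all is treated as a possible failure.
        return False

    # The classifier ran; reject only if it found no core keywords.
    return not complete_output.get("Core keywords")
```

With this shape, a record whose classifier output is absent stays in the holdingpen for manual action, while a record with a "Rejected" prediction and an empty "Core keywords" entry is rejected.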

Btw. author keywords extracted from classifier are nice to have and we want them in the end. But they are not crucial. I.e. if they create problems don't add the author keyword to the metadata for now. Sorting and display have a higher priority than author keywords.

jacquerie commented 7 years ago

The major problem (#2413) has long been gone, the smaller problem (#2414) is now gone, https://github.com/inveniosoftware-contrib/invenio-classifier/issues/31 is rare and quite hard to fix anyway, so I think that this issue can be closed.