Closed by jacquerie 7 years ago
CC: @kaplun
I don't understand. https://labs.inspirehep.net/api/holdingpen/653398 has classifier_results:
"classifier_results": {
"categories": {
"Boltzmann equation": "HEP",
"Grid computing": "HEP",
"S-matrix": "HEP",
"algebra": "HEP",
"color": "HEP",
"computer": "HEP",
"conservation law": "HEP",
"costs": "HEP",
"critical phenomena": "HEP",
"defect": "HEP",
"distribution function": "HEP",
"engineering": "HEP",
"entropy": "HEP",
"flow": "HEP",
"fragmentation": "HEP",
"kinematics": "HEP",
"kinetic": "HEP",
"nonlinear": "HEP",
"nonlocal": "HEP",
"phase space": "HEP",
"scalar particle": "HEP",
"simplex": "HEP",
"site": "HEP",
"solids": "HEP",
"statistical mechanics": "HEP",
"symmetry breaking": "HEP",
"turbulence": "HEP",
"viscosity": "HEP"
},
"complete_output": {
"acronyms": {},
"author_keywords": [],
"composite_keywords": {
"density, scalar": {
"details": [
7,
6
],
"numbers": 1
},
"dimension, 2": {
"details": [
0,
36
],
"numbers": 3
},
"effect, higher-order": {
"details": [
8,
4
],
"numbers": 1
},
"energy, cascade": {
"details": [
3,
3
],
"numbers": 3
},
"fluid, coupling": {
"details": [
6,
2
],
"numbers": 1
},
"fluid, magnetic": {
"details": [
6,
14
],
"numbers": 1
},
"fluid, velocity": {
"details": [
6,
22
],
"numbers": 2
},
"hydrodynamics, magnetic": {
"details": [
0,
14
],
"numbers": 2
},
"lattice, dependence": {
"details": [
20,
2
],
"numbers": 1
},
"magnetic field, axial": {
"details": [
35,
1
],
"numbers": 1
},
"magnetic field, effect": {
"details": [
35,
8
],
"numbers": 1
},
"magnetic field, low": {
"details": [
35,
0
],
"numbers": 1
},
"moment, higher-order": {
"details": [
20,
4
],
"numbers": 1
},
"radiation, effect": {
"details": [
1,
8
],
"numbers": 1
},
"stability, magnetic": {
"details": [
27,
14
],
"numbers": 1
},
"tensor, energy-momentum": {
"details": [
6,
0
],
"numbers": 1
},
"vortex, model": {
"details": [
11,
12
],
"numbers": 1
}
},
"core_keywords": {
"scalar particle": 2
},
"field_codes": {},
"filtered_core_keywords": {},
"single_keywords": {
"Boltzmann equation": 1,
"S-matrix": 3,
"algebra": 5,
"color": 1,
"computer": 1,
"conservation law": 8,
"costs": 3,
"critical phenomena": 1,
"defect": 1,
"distribution function": 15,
"engineering": 1,
"entropy": 16,
"kinematics": 1,
"kinetic": 7,
"nonlinear": 2,
"nonlocal": 2,
"scalar particle": 2,
"statistical mechanics": 3,
"symmetry breaking": 1,
"viscosity": 4
}
},
"fast_mode": false
Or maybe someone manually fixed this entry in the meantime?
Actually, it's the second part of this code that returns False:
score = relevance_prediction.get('max_score')
decision = relevance_prediction.get('decision')
all_class_results = classification_results.get('complete_output')
core_keywords = all_class_results.get('core_keywords')
return (
    decision.lower() == 'rejected' and
    score > 0 and
    len(core_keywords) == 0
)
For this record the max_score is actually negative. In fact, this is not even a rejected record. :-1: We should find a different example...
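To make the point above concrete, here is a minimal sketch that wraps the quoted expression in a function and evaluates it on two inputs. The function name and all input values are hypothetical (the actual max_score and decision of record 653398 are not shown in this thread); only the body of the check is copied from the snippet above.

```python
def rejected_without_core_keywords(relevance_prediction, classification_results):
    """Hypothetical wrapper around the expression quoted in this thread."""
    score = relevance_prediction.get('max_score')
    decision = relevance_prediction.get('decision')
    all_class_results = classification_results.get('complete_output')
    core_keywords = all_class_results.get('core_keywords')
    return (
        decision.lower() == 'rejected' and
        score > 0 and
        len(core_keywords) == 0
    )


# Made-up record: negative score, not rejected, one core keyword.
not_rejected = rejected_without_core_keywords(
    {'max_score': -1.2, 'decision': 'Non-CORE'},
    {'complete_output': {'core_keywords': {'scalar particle': 2}}},
)

# Made-up record where all three conditions hold.
rejected_no_keywords = rejected_without_core_keywords(
    {'max_score': 0.8, 'decision': 'Rejected'},
    {'complete_output': {'core_keywords': {}}},
)

print(not_rejected, rejected_no_keywords)
```

The check only returns True when the record is rejected, the score is positive, and no core keywords were extracted, so a record with a negative score or a non-rejected decision can never trigger it.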
Uh, I probably made a mistake while copying and pasting. Well, here are a few more examples from the first two pages of halted records:
https://labs.inspirehep.net/api/holdingpen/654418
https://labs.inspirehep.net/api/holdingpen/654413
https://labs.inspirehep.net/api/holdingpen/654282
https://labs.inspirehep.net/api/holdingpen/654276
https://labs.inspirehep.net/api/holdingpen/654275
OK, the first one in the list was caused by the downloaded PDF being compressed, due to the bug fixed in #2411. So it makes sense that these were not extracted. I suspect the same misbehavior affects all PDFs downloaded between the time I refactored the workflow to centralize the PDF download and the time #2411 is deployed.
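One way to check whether a downloaded file is a plain PDF or a compressed payload is to look at its leading bytes. This is only a sketch under the assumption that the compression involved is gzip (the thread does not say which scheme was used); the helper name and the sample data are hypothetical.

```python
import gzip

PDF_MAGIC = b'%PDF-'      # every PDF file starts with this header
GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of any gzip stream


def sniff_payload(data):
    """Classify raw bytes as 'pdf', 'gzip' or 'unknown' from their prefix."""
    if data.startswith(PDF_MAGIC):
        return 'pdf'
    if data.startswith(GZIP_MAGIC):
        return 'gzip'
    return 'unknown'


# Dummy stand-in for a downloaded document, not a valid PDF body.
fake_pdf = b'%PDF-1.4\n% dummy content\n'
compressed = gzip.compress(fake_pdf)

print(sniff_payload(fake_pdf))
print(sniff_payload(compressed))
print(sniff_payload(gzip.decompress(compressed)))
```

A text extractor handed the compressed bytes would not see the PDF header at all, which would be consistent with no keywords being extracted.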
Per user request, the logic to automatically reject papers defers the decision to the curator when invenio-classifier did not fire: https://github.com/inspirehep/inspire-next/blob/9eed5e3d61ea7e04ff9d7192480d9e0ff1c62a46/inspirehep/modules/workflows/tasks/actions.py#L133-L134
The problem is that invenio-classifier almost never fires, so too much stuff is put in front of curators. For example, https://labs.inspirehep.net/api/holdingpen/653398 went through classify_paper but no classifier_results was added to it.
This needs to be fixed before we declare https://github.com/inspirehep/inspire-next/issues/2309 to be fixed.
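When triaging halted records, a small helper can tell whether classifier_results is present anywhere in a record's JSON. The sketch below searches nested dicts and lists for a key. The helper name, the '_extra_data' wrapper key and both sample records are assumptions for illustration, since the thread does not show where the key sits in the holdingpen payload.

```python
def find_key(obj, key):
    """Return the first non-None value stored under `key` at any depth, else None."""
    if isinstance(obj, dict):
        if obj.get(key) is not None:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Made-up records; '_extra_data' is an assumed wrapper key.
with_results = {
    '_extra_data': {
        'classifier_results': {
            'complete_output': {'core_keywords': {'scalar particle': 2}},
        },
    },
}
without_results = {'_extra_data': {'relevance_prediction': {'max_score': -1.2}}}

print(find_key(with_results, 'classifier_results') is not None)
print(find_key(without_results, 'classifier_results') is not None)
```

Searching by key rather than by a fixed path keeps the check usable even if the exact nesting differs from what is assumed here.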