kaplun closed this 7 years ago
Or we just disable indexing of this part of the record, since nobody needs this information to be searchable... The same actually goes for all the output of ML algorithms in extra_data.
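For illustration, disabling indexing for that subtree could look roughly like the mapping fragment below. This is only a sketch: the field path is taken from the Holding pen query quoted in this thread, and the surrounding mapping layout is an assumption, not the actual inspire-next mapping. It relies on the Elasticsearch `enabled` mapping parameter, which makes ES keep an object in `_source` without parsing or indexing its contents.

```python
import json

# Hypothetical mapping fragment. The path _extra_data.classifier_results
# comes from the query URL in this thread; everything else is assumed.
holdingpen_mapping = {
    "properties": {
        "_extra_data": {
            "properties": {
                "classifier_results": {
                    "type": "object",
                    # Keep the object in _source, but skip parsing and indexing,
                    # so no per-keyword fields are ever created.
                    "enabled": False,
                }
            }
        }
    }
}

# Serialized form, as it would be sent when creating the index.
mapping_json = json.dumps(holdingpen_mapping, indent=2)
```

With this in place the classifier output would still be stored and returned with the record, but queries against it would no longer match anything.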
Note that this is afforded to us by the slogan "data in the DB, stuff to search on in ES", the thing we were discussing during standup.
I think this could be a good interim solution. However @ksachs mentioned in https://github.com/inspirehep/inspire-next/issues/2328#issuecomment-300735429 that she is searching for keywords in the Holding pen.
@ksachs what is the use case? Why do you actually need to search using keywords in the holding pen at all?
I just find it weird to have metadata stored in a way that is not (really) searchable. I have no idea yet how we will use the new HP and what we will want to search for.
Ah, we didn't relay this back to you, but searching for keywords has already improved. Now you can do https://labs.inspirehep.net/holdingpen/list/?page=1&size=10&q=_extra_data.classifier_results.complete_output.single_keywords.keyword:%22unified%20field%20theory%22 instead of what you said you were doing in the linked issue.
Current Behavior
invenio-classifier outputs a lot of information in a Pythonic way. Unfortunately, the structure it uses has the keyword as the key in a dictionary. This is passed as-is to ES, which then tries to guess a mapping for each keyword.
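To make the problem concrete, here is a small sketch of why keyword-keyed dictionaries are a poor fit for dynamic mapping. The sample data and the `field_paths` helper are hypothetical and are not the real invenio-classifier output. The point is only that every distinct keyword becomes its own field path.

```python
def field_paths(value, prefix=""):
    """Yield the dotted leaf paths that a dynamic mapping would need fields for."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            yield from field_paths(child, child_prefix)
    else:
        yield prefix


# Hypothetical classifier output: keywords used as dictionary keys.
keyword_keyed = {
    "single_keywords": {
        "unified field theory": 3,
        "supersymmetry": 5,
    }
}

# One path per distinct keyword, so the mapping grows with the vocabulary.
paths = sorted(field_paths(keyword_keyed))
```

Each new record with previously unseen keywords would add more entries of this kind to the index mapping.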
Expected Behavior
Keywords should be passed as a list of tuples, possibly sorted by importance.
Note: this was partially addressed already in: https://github.com/inveniosoftware-contrib/invenio-classifier/pull/25
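The expected behavior could be sketched as follows. The assumptions here are that "importance" means the occurrence count, that ties are broken alphabetically, and that the field name `number` is acceptable; none of this is taken from the invenio-classifier code. The `keyword` field name matches the query path quoted earlier in this thread.

```python
def keywords_as_sorted_pairs(keyword_counts):
    """Turn {keyword: count} into [(keyword, count), ...], most frequent first."""
    return sorted(keyword_counts.items(), key=lambda pair: (-pair[1], pair[0]))


def pairs_as_documents(pairs):
    """JSON has no tuples, so emit objects with fixed keys for indexing."""
    # "number" is a hypothetical field name chosen for this sketch.
    return [{"keyword": keyword, "number": count} for keyword, count in pairs]


# Hypothetical sample data.
pairs = keywords_as_sorted_pairs(
    {"unified field theory": 3, "supersymmetry": 5, "gauge theory": 3}
)
documents = pairs_as_documents(pairs)
```

With fixed keys, the mapping needs only two fields no matter how many distinct keywords appear.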