inspirehep / inspire-next

The INSPIRE repo.
https://inspirehep.net
GNU General Public License v3.0

Unexpected chars in keyword suggestion keys break ES #2345

Closed david-caro closed 7 years ago

david-caro commented 7 years ago

While debugging one of the records on labs, we got a guessed keyword under 'Author keywords' that has a '.' in it, and that breaks the ES mapping.

632059 -> this one fails because we cannot index the record: one of the author-provided keyword keys contains a '.':

wfl.extra_data['classifier_results']['complete_output']['Author keywords']
{u'Dark matter': [u'dark matter'], u'grand unification': [u'grand unified theory'], u'supersymmetry.': [u'supersymmetry']}

When trying to save it, it complains with:

[2017-05-17 17:57:34,037] ERROR in api: TransportError(400, u'mapper_parsing_exception', u"Field name [supersymmetry.] cannot contain '.'")
Traceback (most recent call last):
  File "/opt/inspire/lib/python2.7/site-packages/invenio_workflows_ui/api.py", line 68, in wrapper
    self_or_cls.indexer.index(result)
  File "/opt/inspire/lib/python2.7/site-packages/invenio_workflows_ui/indexer.py", line 66, in index
    body=self._prepare_record(record, index, doc_type),
  File "/opt/inspire/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/opt/inspire/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 279, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/opt/inspire/lib/python2.7/site-packages/elasticsearch/transport.py", line 327, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/opt/inspire/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py", line 84, in perform_request
    self._raise_error(response.status_code, raw_data)
  File "/opt/inspire/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 114, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
RequestError: TransportError(400, u'mapper_parsing_exception', u"Field name [supersymmetry.] cannot contain '.'")
[2017-05-17 17:57:34,038] ERROR in api: Problem while indexing workflow object <property object at 0x4786788>
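
For illustration only, here is one way the offending keys could be cleaned up before the object is handed to the indexer. This is a rough sketch and not what inspire-next or invenio-classifier actually does: the helper names `sanitize_key` and `sanitize_keys` are made up, and it assumes all dict keys are strings, as they are in the classifier output above.

```python
def sanitize_key(key, replacement="_"):
    """Return `key` without any '.' characters.

    Leading and trailing dots are dropped; inner dots become `replacement`.
    Hypothetical helper, not part of any Invenio package.
    """
    return key.strip(".").replace(".", replacement)


def sanitize_keys(value, replacement="_"):
    """Recursively rebuild dicts and lists so that no dict key contains '.'.

    Assumes every dict key is a string. If two keys collapse to the same
    sanitized name, the later one silently wins.
    """
    if isinstance(value, dict):
        return {
            sanitize_key(key, replacement): sanitize_keys(item, replacement)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitize_keys(item, replacement) for item in value]
    return value


# Same shape as the classifier output quoted in this issue.
author_keywords = {
    "Dark matter": ["dark matter"],
    "grand unification": ["grand unified theory"],
    "supersymmetry.": ["supersymmetry"],
}
cleaned = sanitize_keys({"Author keywords": author_keywords})
```

With this, the key 'supersymmetry.' would be stored as 'supersymmetry', so ES would no longer see a field name containing a '.'. A key made up only of dots would end up empty, which would still need separate handling.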

david-caro commented 7 years ago

The issue might be on the beard side, though.

jacquerie commented 7 years ago

This is solved by invenio-classifier==1.1.0, but we cannot yet upgrade to that (and in fact, I've pinned to the previous version) because it breaks the build: https://github.com/inspirehep/inspire-next/issues/2328#issuecomment-302101126.

jacquerie commented 7 years ago

Ah, wait a second, this is in fact a duplicate of #2328!