inspirehep / inspire-next

The INSPIRE repo.
https://inspirehep.net
GNU General Public License v3.0

Reindexing with changes in the structure of the document is not possible #2444

Open jmartinm opened 7 years ago

jmartinm commented 7 years ago

The issue might be better tackled in https://github.com/inspirehep/es-cli

For example, if the current mapping contains:

"persistent_identifiers": {
    "type": "string"
}

And the new mapping contains:

"persistent_identifiers": {
    "properties": {
        "schema": {
            "type": "string"
        },
        "value": {
            "type": "string"
        }
    }
}

Using the ES reindex API will throw an error:

{u'reason': u'object mapping for [references.reference.persistent_identifiers] tried to parse field [null] as object, but found a concrete value', u'type': u'mapper_parsing_exception'}, u'_index': u'remapping_tmp_records-hep'}}])

Expected Behavior

A transformation should happen at reindex time so that the destination field is populated in the new structure.

One option could be to use the script functionality - see https://www.elastic.co/guide/en/elasticsearch/reference/master/docs-reindex.html
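
To make the kind of transformation concrete, here is a minimal sketch in plain Python of what would need to happen to each document, whether it ends up in a reindex script or in a step that runs before indexing. The function names and the `default_schema` placeholder are hypothetical and not part of inspire-next; the nesting under `references` / `reference` is taken from the field path in the error message above, and how to pick the real `schema` for an old string value is left open.

```python
import copy


def wrap_identifier(value, default_schema="unknown"):
    """Turn a bare string into the {"schema", "value"} object of the new mapping.

    `default_schema` is a placeholder (assumption): the old string value
    carries no schema information of its own.
    """
    if isinstance(value, dict):
        return value  # already in the new shape, leave untouched
    return {"schema": default_schema, "value": value}


def upgrade_record(source, default_schema="unknown"):
    """Return a copy of `source` whose persistent_identifiers are objects."""
    record = copy.deepcopy(source)  # never mutate the caller's document
    for ref in record.get("references", []):
        reference = ref.get("reference", {})
        pids = reference.get("persistent_identifiers")
        if pids is None:
            continue  # nothing to upgrade for this reference
        if isinstance(pids, list):
            reference["persistent_identifiers"] = [
                wrap_identifier(p, default_schema) for p in pids
            ]
        else:
            reference["persistent_identifiers"] = wrap_identifier(
                pids, default_schema
            )
    return record
```

A function like this could be applied to each document read from the old index before it is written to the new one; the same logic would have to be expressed in the scripting language of the ES version in use if it is passed to the reindex API directly.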

Current Behavior

The reindex fails with the error above, so we have no working mechanism for reindexing when the structure changes. Records need to be recreated from scratch.

kaplun commented 7 years ago

Mmh. This underlines the fact that the underlying schema has changed in an incompatible way. Records should at all times be expressed in the latest version of the schema, but indeed we don't yet have a way to upgrade existing records. If we upgrade the records in the DB first, we can then reindex them in ES (rather than upgrading them in ES and then having to upgrade them in the DB as well). We shall sprint on it. https://docs.google.com/document/d/13zcbSNdCvaeHLdKWeovdHkE54rQR6pjieLqEBnMCYk0/edit