inveniosoftware / invenio-cli

CLI module for Invenio
https://invenio-cli.readthedocs.io
MIT License

cli: prototype the custom field group of commands #302

Closed zzacharo closed 2 years ago

zzacharo commented 2 years ago

The CLI command should be able to do the following

Prototype goal

ppanero commented 2 years ago

Elasticsearch side tests

:warning: We are currently indexing extensions fields. These need to be removed.

Add a field

This adds a field inside metadata, which simulates adding a field inside a custom parent field. The field path must be specified using dot separation (e.g. metadata.new_text).

PUT localhost:9200/rdmrecords-records/_mapping

{
    "properties": {
        "metadata.new_text": {
            "type": "text"
        }
    }
}

Then we check the index:

GET localhost:9200/rdmrecords-records-record-v5.0.0

{
    "metadata": {
        "new_text": {
            "type": "text"
        }
    }
}
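The request body used above can also be assembled in code. This is a minimal sketch, assuming a hypothetical helper `build_field_mapping` (it is not part of invenio-cli or InvenioRDM) that takes the dot-separated path and the field type and returns the body for the PUT `_mapping` request:

```python
import json


def build_field_mapping(field_path, field_type):
    """Return a put-mapping body for a single new field.

    `field_path` uses dot separation, e.g. "metadata.new_text".
    Hypothetical helper, for illustration only.
    """
    # Reject empty paths and empty segments such as "metadata..x".
    if not field_path or any(not part for part in field_path.split(".")):
        raise ValueError(f"invalid field path: {field_path!r}")
    return {"properties": {field_path: {"type": field_type}}}


body = build_field_mapping("metadata.new_text", "text")
print(json.dumps(body, indent=4))
```

The helper only builds the dictionary; sending it to Elasticsearch is left to whichever client is in use.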

However, since rdmrecords-records is an alias, the field gets added to all the indices behind it. Adding write_index_only=true to the PUT request would restrict the update to the alias' write index, but we do not currently define a write index for the alias. There are two options:

a) If we want to allow writing to old indices, we need to do this manually and keep track of it somehow, e.g. by programmatically checking RDMRecord.Index.name.
b) If we do not want that, we can simply define a write index. Via curl this is done as follows (it can also be done programmatically, by modifying the index init command).


# note: it needs the full index name including the timestamp.
# cannot be done against the v5.0.0 alias.

PUT localhost:9200/rdmrecords-records-record-v5.0.0-1656937828/_alias/rdmrecords-records

{
    "is_write_index": true
}
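The two requests involved in option b) can be described as plain data before being sent. The sketch below is an illustration only: the function names are hypothetical, and the requests are returned as dictionaries rather than executed, so it makes no assumption about the HTTP client. It relies on the `is_write_index` alias option shown above and the `write_index_only` query parameter of the put-mapping API.

```python
ALIAS = "rdmrecords-records"


def mark_write_index(concrete_index, alias=ALIAS):
    """Describe the request that flags `concrete_index` as the alias' write index.

    `concrete_index` must be the full index name, including the timestamp.
    """
    return {
        "method": "PUT",
        "path": f"/{concrete_index}/_alias/{alias}",
        "body": {"is_write_index": True},
    }


def put_mapping_on_write_index(alias, properties):
    """Describe a mapping update that only touches the alias' write index."""
    return {
        "method": "PUT",
        "path": f"/{alias}/_mapping",
        "params": {"write_index_only": "true"},
        "body": {"properties": properties},
    }


alias_request = mark_write_index("rdmrecords-records-record-v5.0.0-1656937828")
mapping_request = put_mapping_on_write_index(
    ALIAS, {"metadata.new_text": {"type": "text"}}
)
```

The alias request has to run first; until a write index is defined, there is nothing for `write_index_only` to select.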

Add a field property (e.g. add keyword to a text)

Adding multi-fields to an existing field is also possible.

PUT localhost:9200/rdmrecords-records/_mapping

{
    "properties": {
        "metadata.only_last": {
            "type": "text",
            "fields": {
                "raw": {
                    "type": "keyword"
                }
            }
        }
    }
}

Then we check the index:

GET localhost:9200/rdmrecords-records-record-v5.0.0

{
    "metadata": {
        "only_last": {
            "type": "text",
            "fields": {
                "raw": {
                    "type": "keyword"
                }
            }
        }
    }
}
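A multi-field can be added to an existing field definition without touching the rest of it. The following is a small sketch under the assumption that the current field mapping is available as a dictionary; `with_keyword_subfield` is a hypothetical name, not an InvenioRDM function.

```python
import copy


def with_keyword_subfield(field_mapping, subfield="raw"):
    """Return a copy of `field_mapping` with a keyword multi-field added.

    The input is left untouched so the caller can still compare old and new.
    """
    updated = copy.deepcopy(field_mapping)
    updated.setdefault("fields", {})[subfield] = {"type": "keyword"}
    return updated


original = {"type": "text"}
updated = with_keyword_subfield(original)
body = {"properties": {"metadata.only_last": updated}}
```

Here `body` has the same shape as the PUT request above, with `raw` as the keyword sub-field of `metadata.only_last`.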

Add a property

:warning: NOT SUPPORTED by InvenioRDM

Adding properties to a complex object (e.g. adding department to a creator) is also possible but out of the scope of what we want to achieve.

Edit a type

:warning: NOT SUPPORTED by ES

From the ES docs for the update mapping API:

you can’t change the mapping or field type of an existing field. Changing an existing field could invalidate data that’s already indexed.

On the other hand, some mapping parameters (e.g. ignore_above, which limits the length of the strings that get indexed) can be changed.

The same applies to renaming a field: the docs suggest creating a new index (with the renamed field in the mapping) and triggering a reindex.
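The new-index route boils down to two requests: create the index with the desired mapping, then copy the documents across with the `_reindex` API. The sketch below only describes those requests as data. `plan_reindex` and the target index name are hypothetical, and a rename would also need the documents themselves transformed during the copy, which is not covered here.

```python
def plan_reindex(old_index, new_index, new_properties):
    """Describe the requests for moving documents to an index with a new mapping."""
    create = {
        "method": "PUT",
        "path": f"/{new_index}",
        "body": {"mappings": {"properties": new_properties}},
    }
    reindex = {
        "method": "POST",
        "path": "/_reindex",
        "body": {"source": {"index": old_index}, "dest": {"index": new_index}},
    }
    # Order matters: the destination mapping must exist before documents arrive.
    return [create, reindex]


steps = plan_reindex(
    "rdmrecords-records-record-v5.0.0-1656937828",
    "rdmrecords-records-record-v5.0.0-new",  # hypothetical target index name
    {"metadata": {"properties": {"new_text": {"type": "keyword"}}}},
)
```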

ppanero commented 2 years ago

Programmatically updating the mapping:

from invenio_rdm_records.records.api import RDMRecord

# No write index to choose: this updates only the latest mapping/alias.
index = RDMRecord.index
properties = {"properties": {"metadata.programatic_too": {"type": "text"}}}
index.put_mapping(body=properties)

Considerations: from the CLI we might have to find a way to link e.g. records to the RDMRecord class, or hardcode that knowledge, since the update needs to be done in both records and drafts.
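One way to keep records and drafts in sync is to apply the same body to a list of index objects. This is a sketch, not the implementation in the prototype branch: `put_custom_field` is a hypothetical name, and `FakeIndex` is a stand-in that only records calls, so the example runs without Elasticsearch.

```python
def put_custom_field(indices, field_path, field_type):
    """Apply the same mapping update to every index object in `indices`.

    Each object only needs a `put_mapping(body=...)` method, as used above.
    """
    body = {"properties": {field_path: {"type": field_type}}}
    for index in indices:
        index.put_mapping(body=body)
    return body


class FakeIndex:
    """Stand-in for an index object; it records calls instead of sending them."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def put_mapping(self, **kwargs):
        self.calls.append(kwargs)


records, drafts = FakeIndex("records"), FakeIndex("drafts")
put_custom_field([records, drafts], "metadata.programatic_too", "text")
```

In a running instance the list would presumably hold `RDMRecord.index` and the equivalent attribute of the draft class. That is an assumption about the draft API, not something verified here.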

Moreover, for administrative fields it might be interesting to check the meta fields of the mapping:

meta – A mapping type can have custom meta data associated with it. These are not used at all by Elasticsearch, but can be used to store application-specific metadata.
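In a mapping body this metadata lives under the `_meta` key, next to `properties`. A minimal sketch of how a custom field could be flagged there follows; the `custom_fields` key and the helper name are hypothetical and would be chosen by the application, since Elasticsearch ignores the contents.

```python
def mapping_with_meta(properties, meta):
    """Return a mapping body carrying application-specific metadata in `_meta`."""
    return {"_meta": dict(meta), "properties": dict(properties)}


body = mapping_with_meta(
    {"metadata.new_text": {"type": "text"}},
    {"custom_fields": ["metadata.new_text"]},  # hypothetical bookkeeping key
)
```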

ppanero commented 2 years ago

No validation has been done; it was deemed unnecessary given the current config status.

Closing this issue about prototyping. The CLI is available in the rdm-records/custom-fields branch.