Index Mapping Updates Through OSIS Pipeline Configuration YAML

bircpark commented 1 month ago

Is your feature request related to a problem? Please describe. Currently, an OSIS pipeline seems to require either manual intervention or downtime to update the mappings of an index. This applies both to adding subfields to an existing field and to adding an entirely new field.

Existing Configuration For Mapping

"field_name": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword"
        }
    }
}

New Configuration For Mapping

"field_name": {
    "type": "text",
    "fields": {
        "english": {
            "type": "text",
            "analyzer": "english"
        },
        "keyword": {
            "type": "keyword"
        }
    }
}

The english subfield does not appear in the cluster after the configuration change, and using it requires downtime or manual changes.
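To make the difference between the two configurations concrete, here is a minimal Python sketch that lists the subfields present only in the new mapping. The helper name `added_subfields` is hypothetical and is not part of Data Prepper or OSIS; it only illustrates the kind of comparison a pipeline would need before it could update an index.

```python
def added_subfields(old_props, new_props):
    """Return {field: {subfield: definition}} for subfields found only in new_props.

    Hypothetical helper for illustration; not a Data Prepper or OSIS API.
    A field missing from old_props has all of its subfields reported as added.
    """
    added = {}
    for field, new_def in new_props.items():
        old_fields = old_props.get(field, {}).get("fields", {})
        new_fields = new_def.get("fields", {})
        extra = {name: spec for name, spec in new_fields.items()
                 if name not in old_fields}
        if extra:
            added[field] = extra
    return added


# The two mappings from this issue, written as Python dicts.
existing = {
    "field_name": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword"}},
    }
}
updated = {
    "field_name": {
        "type": "text",
        "fields": {
            "english": {"type": "text", "analyzer": "english"},
            "keyword": {"type": "keyword"},
        },
    }
}

diff = added_subfields(existing, updated)
print(diff)
# {'field_name': {'english': {'type': 'text', 'analyzer': 'english'}}}
```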

Describe the solution you'd like It would be nice for OSIS pipelines to be able to update index mappings when the mappings change in the pipeline configuration. Once the mapping is updated, the pipeline could issue an update_by_query call, or something similar, to populate the new fields.

Describe alternatives you've considered (Optional)

a) A manual change to the mapping with an invocation of the update_by_query API to backfill records
b) Take some downtime to stop the pipeline, delete the index, then restart the pipeline to re-sync data
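For reference, alternative (a) could look roughly like the Python sketch below. It only builds the two HTTP requests (a `PUT <index>/_mapping` followed by a `POST <index>/_update_by_query`) and does not send them. The endpoint `http://localhost:9200`, the index name `my-index`, and both helper names are assumptions for illustration. Authentication (for example SigV4 signing on Amazon OpenSearch Service) is deliberately left out.

```python
import json
import urllib.request


def build_mapping_update(base_url, index, properties):
    """Build (but do not send) a PUT <index>/_mapping request.

    Hypothetical helper; base_url and index are placeholders.
    """
    body = json.dumps({"properties": properties}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_mapping",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_backfill(base_url, index):
    """Build (but do not send) a POST <index>/_update_by_query request.

    With no query body, every document is re-indexed in place, so existing
    documents pick up the new subfield. conflicts=proceed keeps going on
    version conflicts instead of aborting.
    """
    return urllib.request.Request(
        url=f"{base_url}/{index}/_update_by_query?conflicts=proceed",
        method="POST",
    )


# New subfield from this issue; endpoint and index name are assumed.
new_properties = {
    "field_name": {
        "type": "text",
        "fields": {
            "english": {"type": "text", "analyzer": "english"},
            "keyword": {"type": "keyword"},
        },
    }
}

mapping_req = build_mapping_update("http://localhost:9200", "my-index", new_properties)
backfill_req = build_backfill("http://localhost:9200", "my-index")

print(mapping_req.get_method(), mapping_req.full_url)
print(backfill_req.get_method(), backfill_req.full_url)
```

Sending the requests would be a matter of passing each one to `urllib.request.urlopen` once the endpoint and credentials are real. The mapping update has to succeed before the backfill runs, otherwise the re-indexed documents will not get the new subfield.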

Additional context The suggested solution is mainly concerned with updating subfields, since update_by_query will only populate subfields of already existing fields and won't work for brand new fields introduced to the mapping. For entirely new fields, you would need to run something else (maybe a Glue job) to update the documents reliably.

dlvenable commented 1 month ago

This partially depends on #973. But, it would also need an ability to update the OpenSearch index.

dlvenable commented 1 month ago

@bircpark, what source are you using in this case?

bircpark commented 1 month ago

@dlvenable, my source is DynamoDB using the Zero-ETL pipeline integration.

dlvenable commented 1 month ago

Based on my understanding, the ask here is for Data Prepper to make a call to PUT <index>/_mapping to update the actual mappings file based on the user-defined input. This will allow modifications to an existing index as new fields are added.

bircpark commented 1 month ago

Yes, that is correct.

opensearch-project / data-prepper

Index Mapping Updates Through OSIS Pipeline Configuration YAML #5038