terascope / teraslice

Scalable data processing pipelines in JavaScript
https://terascope.github.io/teraslice/
Apache License 2.0

Opensearch2 state storage mappings are created with dynamic set to true #3809

Open busma13 opened 4 weeks ago

busma13 commented 4 weeks ago

When indices for the teraslice state storage are created, we pass in mappings that include dynamic: false. For example, the job mapping:

{
    settings: {
        'index.number_of_shards': 5,
        'index.number_of_replicas': 1
    },
    mappings: {
        job: {
            _all: {
                enabled: false
            },
            dynamic: false,
            properties: {
                active: {
                    type: 'boolean'
                },
                job_id: {
                    type: 'keyword'
                },
                _context: {
                    type: 'keyword'
                },
                _created: {
                    type: 'date'
                },
                _updated: {
                    type: 'date'
                },
                _deleted: {
                    type: 'boolean'
                },
                _deleted_on: {
                    type: 'date'
                }
            }
        }
    }
};

In opensearch2, the resulting indices do not have dynamic set to false. Also notice that the job key is replaced by _doc.

opensearch1 result:

2024-10-24 07:08:33 [2024-10-24T14:08:33,931][DEBUG][o.o.i.m.MapperService    ] [dd1513408a50] [teracluster__jobs] [[teracluster__jobs/QOaTZIHUTmSC4NLsiFiEJg]] added mapping [job],
source [
  {
    "job": {
      "dynamic": "false",
      "properties": {
        "_context": {
          "type": "keyword"
        },
        "_created": {
          "type": "date"
        },
        "_deleted": {
          "type": "boolean"
        },
        "_deleted_on": {
          "type": "date"
        },
        "_updated": {
          "type": "date"
        },
        "active": {
          "type": "boolean"
        },
        "job_id": {
          "type": "keyword"
        }
      }
    }
  }
]

opensearch2 result:

2024-10-24 07:12:27 [2024-10-24T14:12:27,476][DEBUG][o.o.i.m.MapperService    ] [085e91d382e3] [teracluster__jobs] [[teracluster__jobs/5FDflr_JQUKdS3clnCK6Gw]] added mapping [_doc],
source [
  {
    "_doc": {
      "properties": {
        "_context": {
          "type": "keyword"
        },
        "_created": {
          "type": "date"
        },
        "_deleted": {
          "type": "boolean"
        },
        "_deleted_on": {
          "type": "date"
        },
        "_updated": {
          "type": "date"
        },
        "active": {
          "type": "boolean"
        },
        "job_id": {
          "type": "keyword"
        }
      }
    }
  }
]

busma13 commented 4 weeks ago

I made a test repo that uses the "@opensearch-project/opensearch": "2.12.0" JS client to create an index in opensearch2 with the same mapping as in teraslice, and dynamic: false is applied properly.

I suspect a setting on the client we create is overriding the dynamic setting.

busma13 commented 4 weeks ago

I've finally found the code that modifies the mapping and removes the dynamic property. In packages/elasticsearch-store/src/elasticsearch-client/method-helpers/helper-utils.ts, ensureNoTypeInMapping() removes the type but fails to copy over any fields besides properties and _meta.
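
To make the failure mode concrete, here is a minimal sketch of a type-stripping helper that behaves the way the comment above describes. It is not the actual teraslice source: the function name stripTypeDroppingExtras and the Mapping type are hypothetical, and the logic is only an assumption modeled on "copies properties and _meta, nothing else".

```typescript
// Hypothetical, simplified stand-in for the behavior described above.
// This is NOT the real ensureNoTypeInMapping() implementation.
type Mapping = Record<string, any>;

function stripTypeDroppingExtras(mappings: Mapping): Mapping {
    const result: Mapping = {};
    for (const key of Object.keys(mappings)) {
        const value = mappings[key];
        if (key === 'properties' || key === '_meta') {
            // The mapping is already typeless at this level: keep the key.
            result[key] = value;
        } else if (value !== null && typeof value === 'object') {
            // Treat the key as a type name (e.g. "job") and unwrap it,
            // but copy only properties and _meta from inside it.
            if (value.properties !== undefined) result.properties = value.properties;
            if (value._meta !== undefined) result._meta = value._meta;
            // "dynamic" is never copied, so it is silently lost here.
        }
    }
    return result;
}

// A typed mapping shaped like the job mapping in this issue.
const typedMapping: Mapping = {
    job: {
        dynamic: false,
        properties: { job_id: { type: 'keyword' } }
    }
};

const stripped = stripTypeDroppingExtras(typedMapping);
console.log(JSON.stringify(stripped));
// prints {"properties":{"job_id":{"type":"keyword"}}}
```

With dynamic missing from the request body, the cluster falls back to its default dynamic behavior, which matches the opensearch2 log output shown earlier.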

busma13 commented 4 weeks ago

To modify this function properly, we need to know every key that can appear in a mapping and is not a type name, because the function copies only an explicit list of keys. So far I have properties, _meta, and dynamic. _all seems to be removed before this function is called, and it is not valid in OS2 or ES8 (the two cases where this function is called), so we wouldn't want to copy it even if it were there. Does anyone know of any other possibilities? @godber @lesleydreyer @jsnoble
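
One way to picture the proposed change is an explicit allow list of typeless mapping keys. The sketch below is only an illustration under assumptions: the names TYPELESS_MAPPING_KEYS and stripTypeKeepingKnownKeys are hypothetical, the list contains just the three keys named in this thread, and the real function may be structured differently.

```typescript
type Mapping = Record<string, any>;

// Assumed allow list: only the keys named in this thread so far.
const TYPELESS_MAPPING_KEYS = ['properties', '_meta', 'dynamic'] as const;

// Hypothetical helper, not the actual teraslice implementation.
function stripTypeKeepingKnownKeys(mappings: Mapping): Mapping {
    // If any allow-listed key is present at the top level,
    // the mapping is assumed to be typeless already.
    const isTypeless = TYPELESS_MAPPING_KEYS.some((key) => key in mappings);

    // Otherwise unwrap the first object value, assumed to be the type (e.g. "job").
    const source: Mapping = isTypeless
        ? mappings
        : (Object.keys(mappings)
            .map((key) => mappings[key])
            .find((value) => value !== null && typeof value === 'object') ?? {});

    // Copy only allow-listed keys, so invalid ones such as _all are left behind.
    const result: Mapping = {};
    for (const key of TYPELESS_MAPPING_KEYS) {
        if (source[key] !== undefined) result[key] = source[key];
    }
    return result;
}

// A typed mapping shaped like the job mapping in this issue.
const jobMappings: Mapping = {
    job: {
        _all: { enabled: false },
        dynamic: false,
        properties: { job_id: { type: 'keyword' } }
    }
};

const fixed = stripTypeKeepingKnownKeys(jobMappings);
console.log(JSON.stringify(fixed));
```

The trade-off of an allow list is that any valid mapping key missing from it is dropped silently, which is why the question about other possible keys matters.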

lesleydreyer commented 4 weeks ago

I can't think of anything other than the ones you found - properties/_meta/dynamic