opensearch-project / index-management

🗃 Automate periodic data operations, such as deleting indices at a certain age or performing a rollover at a certain size
https://opensearch.org/docs/latest/im-plugin/index/
Apache License 2.0

[BUG] Index Transform creation failure when source indices are missing mappings for grouped fields. #1270

Open smuthukaruppannp opened 2 weeks ago

smuthukaruppannp commented 2 weeks ago

What is the bug? Index Transform creation failure when some source indices are missing mappings for grouped fields.

When a transform targets multiple source indices and any one of them has no mapping for a group-by field, transform creation fails with the following error:

Cannot find field [status] that can be grouped as [terms] in [transformindex2].

When mappings are created dynamically, an index has no mapping for a field that was never indexed into it, for example an error status field that is generated only occasionally. The index transform itself does work in this case, with a null value for the missing field. A user might also decide to filter out documents that lack the necessary fields in the data_selection_query. It might make sense to report a failed mapping validation as a warning rather than an error.
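To make the suggestion concrete, here is a minimal sketch of what "warning rather than error" could look like. It is not the plugin's actual validation code (the plugin is not written in Python); the function name `validate_group_fields` and the simplified `index_fields` input are hypothetical and exist only for illustration. The idea is to fail only when no source index maps the group-by field, and to warn when some indices do and some do not.

```python
from typing import Dict, List, Set, Tuple


def validate_group_fields(
    index_fields: Dict[str, Set[str]],
    group_fields: List[str],
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a transform's group-by fields.

    index_fields is a hypothetical, simplified input: it maps each source
    index name to the set of field names present in that index's mapping.
    """
    errors: List[str] = []
    warnings: List[str] = []
    for field in group_fields:
        # Indices whose mapping does not contain this group-by field.
        missing = sorted(
            name for name, fields in index_fields.items() if field not in fields
        )
        if len(missing) == len(index_fields):
            # No source index maps the field, so grouping cannot work anywhere.
            errors.append(f"Cannot find field [{field}] in any source index.")
        elif missing:
            # Some indices map the field: report, but do not block creation.
            warnings.append(
                f"Field [{field}] is not mapped in: {', '.join(missing)}."
            )
    return errors, warnings


# Field sets mirroring the two indices from this report.
index_fields = {
    "transformindex1": {"category", "price", "status", "order_date"},
    "transformindex2": {"category", "price", "order_date"},
}
errors, warnings = validate_group_fields(
    index_fields, ["category", "order_date", "status"]
)
print("errors:", errors)
print("warnings:", warnings)
```

With the data from this report, the sketch produces no errors and a single warning naming `transformindex2`, so creation would be allowed to proceed.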

How can one reproduce the bug? Steps to reproduce the behavior:

POST _index_template/transformindex
{
  "index_patterns": [
    "transformindex*"
  ],
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}

POST _bulk
{"index":{"_index":"transformindex1"}}
{"category":"shirts","price":10,"status":"1","order_date":"2024-10-01T01:00:00"}
{"index":{"_index":"transformindex2"}}
{"category":"trousers","price":20,"order_date":"2024-10-01T01:00:00"}

PUT _plugins/_transform/transformtest
{
  "transform": {
    "schedule": {
      "interval": {
        "start_time": 1727924825,
        "period": 1,
        "unit": "Minutes"
      }
    },
    "enabled": true,
    "description": "transformtest",
    "source_index": "transformindex*",
    "data_selection_query": {
      "match_all": {
        "boost": 1
      }
    },
    "target_index": "transformresults",
    "page_size": 1000,
    "groups": [
      {
        "terms": {
          "source_field": "category",
          "target_field": "category_terms"
        }
      },
      {
        "date_histogram": {
          "calendar_interval": "1h",
          "source_field": "order_date",
          "target_field": "order_date _date_histogram_1_h_calendar",
          "timezone": "UTC",
          "format": null
        }
      },
      {
        "terms": {
          "source_field": "status",
          "target_field": "status_terms"
        }
      }
    ],
    "aggregations": {
      "avg_price": {
        "avg": {
          "field": "price"
        }
      }
    },
    "continuous": true
  }
}
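To confirm which source indices trigger the error, it can help to inspect the result of `GET transformindex*/_mapping`. The helper below is a rough sketch with a hypothetical name, `indices_missing_field`. It assumes the usual get-mapping response shape (index name, then `mappings`, then `properties`) and only looks at top-level fields. The sample dictionary is hand-written to resemble what the steps above would likely produce; it is not captured output.

```python
from typing import Any, Dict, List


def indices_missing_field(mapping_response: Dict[str, Any], field: str) -> List[str]:
    """List indices whose top-level mapping properties lack `field`."""
    missing = []
    for index_name, body in mapping_response.items():
        # An index with no mapped fields may have no "properties" key at all.
        properties = body.get("mappings", {}).get("properties", {})
        if field not in properties:
            missing.append(index_name)
    return sorted(missing)


# Hand-written sample in the shape of a get-mapping response (assumed types).
sample_response = {
    "transformindex1": {
        "mappings": {
            "properties": {
                "category": {"type": "keyword"},
                "price": {"type": "long"},
                "status": {"type": "keyword"},
                "order_date": {"type": "date"},
            }
        }
    },
    "transformindex2": {
        "mappings": {
            "properties": {
                "category": {"type": "keyword"},
                "price": {"type": "long"},
                "order_date": {"type": "date"},
            }
        }
    },
}

missing_status = indices_missing_field(sample_response, "status")
print(missing_status)
```

For this sample, only `transformindex2` is reported, which matches the index named in the error message.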

What is the expected behavior? Index transform should be allowed to be created.

MahendraAkkina commented 1 week ago

+1

bharath-techie commented 1 week ago

[Triage attendees - 1 2 3 4]

Thanks for raising the issue. Would you like to open a PR for this?