icgc-argo / workflow-relay

Collecting information from workflows to report
GNU Affero General Public License v3.0

🐛 Dynamic portions of ES Mapping should be Keyword #120

Closed: andricDu closed this issue 3 years ago

andricDu commented 3 years ago

Describe the bug

User- and system-provided workflow parameters, which are part of the parameters JSON object, are dynamically mapped and indexed in Elasticsearch because dynamic is set to true: https://github.com/icgc-argo/workflow-relay/blob/develop/src/main/resources/run_log_mapping.json#L3

While it is desirable to index workflow parameters, there is an edge case: Elasticsearch performs automatic date and numeric detection, so the first value indexed for a parameter fixes that field's type. This can cause indexing to fail when parameter FOO is a date in one workflow but a text comment in another.

Steps To Reproduce

Run one workflow with a param whose value is 2021-07-15, then run another workflow with the same param but the value 2021-07-15a. Elasticsearch will complain:

2021-07-15 13:20:19.647 ERROR 1 --- [container-0-C-1] o.i.workflow.relay.service.IndexService  : Out of order, already have newer version for run wes-09827a9ca80f4e60ad44ce635ffbaba0, exception: Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [parameters._batch_id] of type [date] in document with id 'wes-09827a9ca80f4e60ad44ce635ffbaba0'. Preview of field's value: '2021-07-09a']
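
To make the failure mode concrete, here is a toy model in plain Python. It is not Elasticsearch code: `ToyDynamicIndex` and `looks_like_date` are hypothetical names, and the date check only approximates real date detection by accepting `yyyy-MM-dd`. It shows how the first value seen for a field pins its type, so a later non-date value for the same field is rejected.

```python
from datetime import datetime


def looks_like_date(value: str) -> bool:
    """Rough stand-in for date detection: accepts yyyy-MM-dd strings only."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


class ToyDynamicIndex:
    """Toy model of dynamic mapping: the first value fixes the field type."""

    def __init__(self):
        self.field_types = {}

    def index(self, field: str, value: str) -> None:
        inferred = "date" if looks_like_date(value) else "text"
        # Keep the existing type if the field was already mapped.
        mapped = self.field_types.setdefault(field, inferred)
        if mapped == "date" and not looks_like_date(value):
            raise ValueError(
                f"failed to parse field [{field}] of type [date], value: {value!r}"
            )


if __name__ == "__main__":
    idx = ToyDynamicIndex()
    idx.index("parameters._batch_id", "2021-07-15")  # field becomes a date
    try:
        idx.index("parameters._batch_id", "2021-07-15a")  # rejected
    except ValueError as err:
        print(err)
```

In this model the order matters: if the text value had arrived first, the field would have been mapped as text and the date string would have been accepted as well.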

Expected behaviour

If we force every dynamically mapped parameter to keyword, so that all values are normalised to a single type, we should no longer run into indexing failures caused by type conflicts.
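
One way this could look, as a sketch rather than the change that was actually made: the helper below builds a mapping body that turns off date detection and uses a dynamic template to map string values under `parameters.*` to `keyword`. The function name `build_run_log_mapping` and the template name `parameters_as_keyword` are made up for illustration, and the real `run_log_mapping.json` contains other fields not shown here. Note that `match_mapping_type: "string"` only covers values that arrive as JSON strings; parameters sent as JSON numbers or booleans would need additional handling.

```python
import json


def build_run_log_mapping() -> dict:
    """Return a hypothetical mapping body; not the project's actual file."""
    return {
        "dynamic": True,
        # Stop Elasticsearch from turning date-like strings into date fields.
        "date_detection": False,
        "dynamic_templates": [
            {
                # Hypothetical template name.
                "parameters_as_keyword": {
                    "path_match": "parameters.*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_run_log_mapping(), indent=2))
```

With a template like this, both 2021-07-15 and 2021-07-15a would be stored as keyword values under the same field, so neither value can conflict with the type chosen for the other.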

andricDu commented 3 years ago

Finished by @jaserud : #122

andricDu commented 3 years ago

released to prod