Closed jsnoble closed 11 months ago
Can you link to the test or tests are failing and the test runs (assuming they're in Github Actions) that have failed?
I think this might be an example of the failure:
https://github.com/terascope/teraslice/actions/runs/5600622766/jobs/10243288784#step:9:150
Comparing two active Opensearch state clusters I can see that the schema on the Opensearch 1 has the following for _slicer_stats
"_slicer_stats": {
"type": "object"
},
And the Opensearch 2 cluster has:
"_slicer_stats": {
"properties": {
"failed": {
"type": "long"
},
"job_duration": {
"type": "long"
},
"processed": {
"type": "long"
},
"queued": {
"type": "long"
},
"queuing_complete": {
"type": "date"
},
...
}
},
So it is as if Opensearch 2 infers types in nested objects wheras Opensearch 1 did not. I think you were on to this already Jared.
Wouldn't you still have this issue if you added a mapping? Since the mapping you add would be type
date, and the code is still (presumably) trying to write a queuing_complete = ''
to the field, I would expect that it fails, right?
Part of the solution is to set that to undefined, as there is no good reason to instantiate that field to an empty string. I am trying to check if there are any ramifications to this.
We have tested a typical recovery scenario on OpenSearch 2.8 and it works correctly.
I don't really see a difference in behavior between OpenSearch 1.3.6 and OpenSearch 2.8.0. My test was as follows:
GET /
PUT /testindex1
{
"settings": {
"index": {
"number_of_shards": 2,
"number_of_replicas": 1
}
},
"mappings": {
"properties": {
"age": {
"type": "integer"
},
"patient": {
"type": "object"
}
}
}
}
GET testindex1/_mapping
#{
# "testindex1" : {
# "mappings" : {
# "properties" : {
# "age" : {
# "type" : "integer"
# },
# "patient" : {
# "type" : "object"
# }
# }
# }
# }
#}
PUT testindex1/_doc/1
{
"age": 5,
"office": "Mesa",
"patient": {
"name" : "John Doe",
"id" : "123456"
}
}
GET testindex1/_mapping
# {
# "testindex1" : {
# "mappings" : {
# "properties" : {
# "age" : {
# "type" : "integer"
# },
# "office" : {
# "type" : "text",
# "fields" : {
# "keyword" : {
# "type" : "keyword",
# "ignore_above" : 256
# }
# }
# },
# "patient" : {
# "properties" : {
# "id" : {
# "type" : "text",
# "fields" : {
# "keyword" : {
# "type" : "keyword",
# "ignore_above" : 256
# }
# }
# },
# "name" : {
# "type" : "text",
# "fields" : {
# "keyword" : {
# "type" : "keyword",
# "ignore_above" : 256
# }
# }
# }
# }
# }
# }
# }
# }
# }
PUT testindex1/_doc/2
{
"age": 50,
"office": "Mesa",
"patient": {
"name" : "Jane Doe",
"id" : "123456",
"birthdate": "1973-07-06"
}
}
GET testindex1
PUT testindex1/_doc/3
{
"age": 50,
"office": "Mesa",
"patient": {
"name" : "Julie Doe",
"id" : "123456",
"birthdate": ""
}
}
GET testindex1
DELETE testindex1
In both cases when I try to insert record 3 OpenSearch returns an error:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "failed to parse field [patient.birthdate] of type [date] in document with id '3'. Preview of field's value: ''"
}
],
"type": "mapper_parsing_exception",
"reason": "failed to parse field [patient.birthdate] of type [date] in document with id '3'. Preview of field's value: ''",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "cannot parse empty date"
}
},
"status": 400
}
Though inserting a birthdate
of null
does work:
PUT testindex1/_doc/3
{
"age": 50,
"office": "Mesa",
"patient": {
"name" : "Julie Doe",
"id" : "123456",
"birthdate": null
}
}
Which we would expect given the docs discussion of null handling:
https://opensearch.org/docs/2.8/field-types/supported-field-types/date/
Oh, I've just realized the difference between my test above ... and the actual ex
indices ... I forgot that we set "dynamic": "false"
on our indices. So on an opensearc 1.3.6 state cluster the ex
index is:
{
"teraslice-1__ex": {
"mappings": {
"dynamic": "false",
"properties": {
"_context": {
"type": "keyword"
},
"_created": {
"type": "date"
},
"_has_errors": {
"type": "keyword"
},
"_slicer_stats": {
"type": "object"
},
"_status": {
"type": "keyword"
},
"_updated": {
"type": "date"
},
"active": {
"type": "boolean"
},
"ex_id": {
"type": "keyword"
},
"job_id": {
"type": "keyword"
},
"metadata": {
"type": "object",
"enabled": false
},
"recovered_execution": {
"type": "keyword"
},
"recovered_slice_type": {
"type": "keyword"
},
"slicer_hostname": {
"type": "keyword"
},
"slicer_port": {
"type": "keyword"
}
}
}
}
}
Ok, I've confirmed on Opensearch 1.3.6 that when the dynamic: false
setting is set at the top level of the mapping it still applies to fields on nested objects. Of course in the scenario above, that allows testindex1/_doc/3
to be created with an empty string for a birthdate since there is no type stored at that point.
I think the conclusion here is that the mapping for _slicer_stats
(here https://github.com/terascope/teraslice/pull/3376/files#diff-7c4a98523fd62bcda7fdd1fe4080de6892ea62def582b4428f48e10752fbb7efR46-R90) should be removed if these values are not searched by anything (including the recovery process). That being said should we also exclude the:
"_slicer_stats": {
"type": "object"
}
Mapping?
Brien points out that at least if we declare _slicer_stats
to be an object in the mapping but don't include it's subfields ... it may not index anything for a _slicer_stats
but it will probably do type checking. Yeah, thats what it does, if you have type object then try and write an integer into the field you get the following error:
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "object mapping for [patient] tried to parse field [patient] as object, but found a concrete value"
}
],
"type" : "mapper_parsing_exception",
"reason" : "object mapping for [patient] tried to parse field [patient] as object, but found a concrete value"
},
"status" : 400
}
I noted that this PR closes this issue:
https://github.com/terascope/teraslice/pull/3376#issuecomment-1644352038
closing.
I am coming across a race condition that an execution ran under recovery has a mapping collusion with its slicer_stats object and fails to be marked complete when its done. Currently this seems to only manifest with opensearch v2.
From the error it seems the value
queuing_complete
is instantiated with an empty string, but throws when a date is given at the end.I checked the mapping for ex store and _slicer_stats is set to type object, but currently there are no schema values for all the keys
These are the initial values
I think it would be wise to add the fields to the schema, just don't know if it will have any additional effects to anything old
The alternative would be to cull any empty values when they are saved to the execution context so it only updates when it has a real value