Closed IanMenendez closed 1 month ago
Left a few review comments in https://github.com/opensearch-project/neural-search/pull/907
Can we change the field name to "skip_if_absent" or something of this sort? The problem with "ignore" is that it is ambiguous: it does not specify what will happen when the text is empty.
+1 to @martin-gaievski
@martin-gaievski @vibrantvarun I do not think the field name "skip_if_absent" makes sense.
Many OpenSearch ingest processors already use the ignore_missing field name.
Examples: https://opensearch.org/docs/latest/ingest-pipelines/processors/split/#configuration-parameters https://opensearch.org/docs/latest/ingest-pipelines/processors/lowercase/#configuration-parameters https://opensearch.org/docs/latest/ingest-pipelines/processors/dissect/#configuration-parameters
I prefer ignore_missing, to stay consistent with the other ingest processors.
If other processors have a field with similar functionality then I agree, this name makes sense, although semantically it's not the best. Thanks for checking the config of other processors.
Closing this issue as the PR has been merged. Thanks for your contribution @IanMenendez !
What solution would you like?
Currently, if a document is ingested by a text chunking processor and the input field is null, the text chunking processor outputs an empty list. There is no way to skip the text chunking processor when the field does not exist.
The proposed solution is to add an ignore_missing field to text chunking processors.
If ignore_missing == true, fields that should be chunked but do not exist are skipped instead of producing an empty list.
example:
Processor:
Input:
Output:
If ignore_missing == false, the processor continues to work as it currently does: fields that do not exist produce an empty list as output.
Processor:
Input:
Output:
The default value would be ignore_missing = false
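The two behaviours described above can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not the processor's actual implementation: the function names, the toy whitespace chunker, and the field names text_field and chunk_field are all hypothetical.

```python
from typing import Any, Dict, List


def chunk_text(text: str, token_limit: int = 3) -> List[str]:
    """Toy whitespace chunker standing in for the real chunking algorithm."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + token_limit])
        for i in range(0, len(tokens), token_limit)
    ]


def apply_text_chunking(
    doc: Dict[str, Any],
    field_map: Dict[str, str],
    ignore_missing: bool = False,  # proposed default
) -> Dict[str, Any]:
    """Hypothetical model of the processor: maps source fields to chunked output fields."""
    result = dict(doc)
    for source, target in field_map.items():
        value = doc.get(source)
        if value is None:
            if ignore_missing:
                # Proposed behaviour: skip the field, write nothing.
                continue
            # Current behaviour: a missing field yields an empty list.
            result[target] = []
        else:
            result[target] = chunk_text(value)
    return result


field_map = {"text_field": "chunk_field"}  # hypothetical field names
doc = {"title": "document without the text field"}

skipped = apply_text_chunking(doc, field_map, ignore_missing=True)
legacy = apply_text_chunking(doc, field_map, ignore_missing=False)
```

Under these assumptions, skipped contains no chunk_field key at all, while legacy contains chunk_field set to an empty list. Documents that do have the source field are chunked identically in both modes.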
What alternatives have you considered?
To my knowledge, there is no alternative to this.