opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, such as k-means or linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

[ENHANCEMENT] Support optional field map in ml inference processors #3211

Status: Open. mingshl opened this issue 2 days ago.

mingshl commented 2 days ago

Is your feature request related to a problem? Currently, in the ml inference processors, every field configured in input_map is mandatory: it must be present in the ingest document, the search response, or the search query it is mapped from. When a mapped field is missing, the processor does not proceed with the prediction.

What solution would you like? We can enhance this by allowing a default value to be given in model_config. Then, even when a field is not present in the query body, the processor can still proceed with the prediction as long as model_config provides a default value for that field.

For example:

    "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run ml inference during search request",
        "model_id": "ZuOjE5MBVOtxcB9MLrYL",
        "query_template": "{\"size\": 2,\"query\": {\"knn\": {\"review_embedding\": {\"vector\": ${image_embedding},\"k\": 3}}}}",
        "function_name": "REMOTE",
        "input_map": [
            {
                "text": "query.term.review.value",              ## this can also be null
                "image": "query.term.review_embedding.value"    ## this can also be null
            }
        ],
        "output_map": [
            {
                "image_embedding": "response"
            }
        ],
        "model_config": {
            "text": "",     ## this is the default value
            "image": ""     ## this is the default value
        },
        "ignore_missing": false,
        "ignore_failure": false
    }
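A minimal sketch of the proposed lookup order, assuming input_map entries are simple dotted paths into the search request (a simplification; the real processor's path syntax may be richer). The helpers lookup and build_model_input are hypothetical and are not the ml-commons implementation. A value found in the request wins; otherwise the model_config value is used as the default; if neither exists, the processor stops before calling the model, as it does today.

```python
from typing import Any, Dict

# Sentinel so an explicit null in the request is not confused with an absent field.
_MISSING = object()


def lookup(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as "query.term.review.value" through nested dicts."""
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def build_model_input(request: Dict[str, Any],
                      input_map: Dict[str, str],
                      model_config: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve each input_map entry, falling back to model_config when the path is absent."""
    params: Dict[str, Any] = {}
    for model_field, path in input_map.items():
        value = lookup(request, path)
        if value is not _MISSING:
            params[model_field] = value
        elif model_field in model_config:
            params[model_field] = model_config[model_field]  # proposed default
        else:
            # No default available: stop before calling the model, as happens today.
            raise KeyError(f"missing field {path!r} for model input {model_field!r}")
    return params


request = {"query": {"term": {"review": {"value": "great product"}}}}
input_map = {"text": "query.term.review.value",
             "image": "query.term.review_embedding.value"}
print(build_model_input(request, input_map, {"text": "", "image": ""}))
# {'text': 'great product', 'image': ''}
```

Here the term query only carries review, so image falls back to the empty-string default from model_config instead of aborting the prediction.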

What alternatives have you considered? TBD

Do you have any additional context?

ylwu-amzn commented 1 day ago
"text": "query.term.review.value" ## This can also be null 
"image": "query.term.review_embedding.value" ## This can also be null 

My understanding: if any of these parameter values is null and ignore_missing is true, we should pass a null value in the model input. A default value in model_config then seems unnecessary for models that accept optional text or image input, so there should be no need to define null defaults in model_config like this:

"model_config": {
    "text": null,
    "image": null
},
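A minimal sketch of the behavior described in this comment, assuming dotted-path lookups into the search request. The helpers are hypothetical, and this is not necessarily how ml-commons treats ignore_missing today: with ignore_missing set to true, an absent field becomes None, which serializes to JSON null in the model payload, so no null defaults are needed in model_config.

```python
import json
from typing import Any, Dict

_MISSING = object()  # distinguishes "field absent" from an explicit null


def lookup(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dicts; return _MISSING if any key is absent."""
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def build_model_input(request: Dict[str, Any],
                      input_map: Dict[str, str],
                      ignore_missing: bool) -> Dict[str, Any]:
    """Pass None (JSON null) for absent fields when ignore_missing is true."""
    params: Dict[str, Any] = {}
    for model_field, path in input_map.items():
        value = lookup(request, path)
        if value is _MISSING:
            if not ignore_missing:
                raise KeyError(f"missing field {path!r} for model input {model_field!r}")
            value = None  # becomes null when the payload is serialized
        params[model_field] = value
    return params


request = {"query": {"term": {"review": {"value": "great product"}}}}
input_map = {"text": "query.term.review.value",
             "image": "query.term.review_embedding.value"}
payload = build_model_input(request, input_map, ignore_missing=True)
print(json.dumps(payload))  # {"text": "great product", "image": null}
```

Whether sending null is acceptable depends on the model: one that treats image as optional can take this payload directly, while one that rejects null still needs a real default.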

Some models may prefer a specific default value instead. I recall this should already be supported today, for example:

"input_map": [
    {
        "input_type": "query.term.review.value"
    }
],
"model_config": {
    "input_type": "image"
},

If "query.term.review.value" doesn't exist, do we use "image" as the default value of input_type today?
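One way the answer could be yes is if the processor builds the model parameters by starting from model_config and only overwriting a key when its input_map path resolves. The sketch below shows that merge order under a dotted-path assumption; merge_parameters is a hypothetical helper, and whether ml-commons actually merges in this order is exactly what the question asks.

```python
from typing import Any, Dict

_MISSING = object()  # marks a path that could not be resolved


def lookup(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dicts; return _MISSING if any key is absent."""
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def merge_parameters(request: Dict[str, Any],
                     input_map: Dict[str, str],
                     model_config: Dict[str, Any]) -> Dict[str, Any]:
    """Start from model_config, then overwrite keys whose input_map path resolves."""
    params = dict(model_config)  # static values act as defaults; model_config is not mutated
    for model_field, path in input_map.items():
        value = lookup(request, path)
        if value is not _MISSING:
            params[model_field] = value  # a value found in the request wins
    return params


input_map = {"input_type": "query.term.review.value"}
model_config = {"input_type": "image"}
print(merge_parameters({"query": {"match_all": {}}}, input_map, model_config))
# {'input_type': 'image'}
print(merge_parameters({"query": {"term": {"review": {"value": "text"}}}}, input_map, model_config))
# {'input_type': 'text'}
```

With this ordering, model_config doubles as a set of defaults without any new syntax; the trade-off is that a missing field with no model_config entry is silently dropped rather than reported.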

mingshl commented 1 day ago

Maybe we can follow the approach in this blueprint: when a key is not found, fall back to the default given after ":-", as in the setting inputText:-null.

    "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run ml inference during search request",
        "model_id": "ZuOjE5MBVOtxcB9MLrYL",
        "query_template": "{\"size\": 2,\"query\": {\"knn\": {\"review_embedding\": {\"vector\": ${image_embedding},\"k\": 3}}}}",
        "function_name": "REMOTE",
        "input_map": [
            {
                "text": "query.term.review.value:-null",              ## this can also be null
                "image": "query.term.review_embedding.value:-null"    ## this can also be null
            }
        ],
        "output_map": [
            {
                "image_embedding": "response"
            }
        ],
        "model_config": {},
        "ignore_missing": false,
        "ignore_failure": false
    }
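A minimal sketch of how an input_map value such as "query.term.review.value:-null" could be interpreted, borrowing the shell-style "${var:-default}" idea: everything before the first ":-" is the path, everything after it is the fallback. The helper names are hypothetical, the dotted-path lookup is a simplification, and reading the default as a JSON literal (so "null" becomes None while "image" stays a string) is an assumed design choice, not confirmed ml-commons behavior.

```python
import json
from typing import Any, Dict, Optional, Tuple

_MISSING = object()  # marks a path that could not be resolved


def lookup(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dicts; return _MISSING if any key is absent."""
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def split_default(mapping: str) -> Tuple[str, Optional[str]]:
    """Split "a.b:-null" into ("a.b", "null"); a mapping without ":-" has no default."""
    path, sep, default = mapping.partition(":-")
    return path, (default if sep else None)


def parse_default(raw: str) -> Any:
    """Read the default as a JSON literal (null, numbers, booleans); otherwise keep the text."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def build_model_input(request: Dict[str, Any], input_map: Dict[str, str]) -> Dict[str, Any]:
    params: Dict[str, Any] = {}
    for model_field, mapping in input_map.items():
        path, raw_default = split_default(mapping)
        value = lookup(request, path)
        if value is _MISSING:
            if raw_default is None:
                # No inline default: keep today's behavior and stop before prediction.
                raise KeyError(f"missing field {path!r} for model input {model_field!r}")
            value = parse_default(raw_default)
        params[model_field] = value
    return params


request = {"query": {"term": {"review": {"value": "great product"}}}}
input_map = {"text": "query.term.review.value:-null",
             "image": "query.term.review_embedding.value:-null"}
print(build_model_input(request, input_map))  # {'text': 'great product', 'image': None}
```

Keeping the default next to the path means model_config can stay empty, and a mapping without ":-" keeps its current mandatory semantics.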