opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] Cannot Use _source With Execute Painless Script API #13687

Open git-blame opened 2 months ago

git-blame commented 2 months ago

Describe the bug

According to the documentation, with the filter context and an in-memory document supplied:

The filter context runs the script as if the script were inside a script query. You must provide a test document in the context. The _source, stored fields, and _doc variables will be available to the script.

If I supply a "_source" attribute, OpenSearch complains about an unknown field.

{
   "context_setup":
     "_source": 
...
}
{"error":{"root_cause":[{"type":"x_content_parse_exception","reason":"[1:239] [execute_script_context] unknown field [_source]"}],"type":"x_content_parse_exception","reason":"[1:250] [painless_execute_request] failed to parse field [context_setup]","caused_by":{"type":"x_content_parse_exception","reason":"[1:239] [execute_script_context] unknown field [_source]"}},"status":400}

If I include it in the document, I get a metadata error:

{
   "context_setup":
    "document": {
      "_source": 
...
}
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Field [_source] is a metadata field and cannot be added inside a document. Use the index API request parameters."}],"type":"mapper_parsing_exception","reason":"failed to parse field [_source] of type [_source] in document with id '_id'. Preview of field's value: '{function=ecc, name=sec-t571k1, type=implementation}'","caused_by":{"type":"mapper_parsing_exception","reason":"Field [_source] is a metadata field and cannot be added inside a document. Use the index API request parameters."}},"status":400}

It is not clear how an in-memory document supplied as part of the request can include a _source, as the documentation says it can. I am testing a script that accesses _source.

Related component

Other

To Reproduce

Issue a REST call to /_scripts/painless/_execute with a document containing a _source attribute.

{
  "script": {
    "source": "doc['grad'].value == true && doc['gpa'].value >= params.min_honors_gpa",
    "params": {
      "min_honors_gpa": 3.5
    }
  },
  "context": "filter",
  "context_setup": {
    "index": "testindex1",
    "document": {
      "_source": { ... },
      "grad": true,
      "gpa": 3.79
    }
  }
}

Expected behavior

Script can access the _source attribute per the documentation.

The filter context runs the script as if the script were inside a script query. You must provide a test document in the context. The _source, stored fields, and _doc variables will be available to the script.

Additional Details

No response

sandeshkr419 commented 1 month ago

Hi @git-blame

Thanks for bringing this up.

Actually, the issue is with the documentation. The field is source, not _source, as you can see in the later example requests. [1]

Here is the reproduction:

I used "Copy as cURL" on the commands from the documentation:

Create index:

curl -XPUT "http://localhost:9200/testindex1" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "grad": {
        "type": "boolean"
      },
      "gpa": {
        "type": "float"
      }
    }
  }
}'

Run a script to determine if a student is eligible to graduate with honors:

Now the copied curl command gives an error because the ' characters in the script are not processed correctly (another mistake in the documentation). [2]

curl -XPOST "http://localhost:9200/_scripts/painless/_execute" -H 'Content-Type: application/json' -d'
{
  "script": {
    "source": "doc['grad'].value == true && doc['gpa'].value >= params.min_honors_gpa",
    "params": {
      "min_honors_gpa": 3.5
    }
  },
  "context": "filter",
  "context_setup": {
    "index": "testindex1",
    "document": {
      "grad": true,
      "gpa": 3.79
    }
  }
}'
{"error":{"root_cause":[{"type":"script_exception","reason":"compile error","script_stack":["doc[grad].value == true && do ...","    ^---- HERE"],"script":"doc[grad].value == true && doc[gpa].value >= params.min_honors_gpa","lang":"painless","position":{"offset":4,"start":0,"end":29}}],"type":"script_exception","reason":"compile error","script_stack":["doc[grad].value == true && do ...","    ^---- HERE"],"script":"doc[grad].value == true && doc[gpa].value >= params.min_honors_gpa","lang":"painless","position":{"offset":4,"start":0,"end":29},"caused_by":{"type":"illegal_argument_exception","reason":"cannot resolve symbol [grad]"}},"status":400}
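
The error shows the script arriving as doc[grad] rather than doc['grad'], which is consistent with the shell consuming the inner single quotes before curl ever sees them. A minimal sketch of that quoting behavior, with no cluster involved (the variable names payload and fixed are made up for illustration):

```shell
# Inside a single-quoted string, each inner ' closes and reopens the quoting,
# so the quote characters themselves are dropped from the resulting argument.
payload='{"source": "doc['grad'].value == true"}'
printf '%s\n' "$payload"
# prints: {"source": "doc[grad].value == true"}

# Escaped double quotes are ordinary characters inside single quotes,
# so they reach the receiving program unchanged.
fixed='{"source": "doc[\"grad\"].value == true"}'
printf '%s\n' "$fixed"
# prints: {"source": "doc[\"grad\"].value == true"}
```

The first form is what the copied command effectively sends as the request body, which would explain the "cannot resolve symbol [grad]" message.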

Using escaped double quotes (\") instead of ' as below fixes the script:

curl -X POST "http://localhost:9200/_scripts/painless/_execute" -H 'Content-Type: application/json' -d'
{
  "script": {
    "source": "doc.containsKey(\"grad\") && doc[\"grad\"].value == true && doc.containsKey(\"gpa\") && doc[\"gpa\"].value >= params.min_honors_gpa",
    "params": {
      "min_honors_gpa": 3.5
    }
  },
  "context": "filter",
  "context_setup": {
    "index": "testindex1",
    "document": {
      "grad": true,
      "gpa": 3.79
    }
  }
}'

{"result":true}
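
Another way to avoid the quoting problem, sketched here as an assumption rather than something taken from the documentation, is to write the request body to a file through a quoted heredoc and let curl read it from there. The body_file name is hypothetical, and the curl line is left as a comment because it needs a running cluster:

```shell
# Hypothetical temporary file that holds the request body.
body_file="$(mktemp)"

# Quoting the heredoc delimiter ('EOF') turns off shell expansion and quote
# processing, so the single quotes in the script are written verbatim.
cat > "$body_file" <<'EOF'
{
  "script": {
    "source": "doc['grad'].value == true && doc['gpa'].value >= params.min_honors_gpa",
    "params": { "min_honors_gpa": 3.5 }
  },
  "context": "filter",
  "context_setup": {
    "index": "testindex1",
    "document": { "grad": true, "gpa": 3.79 }
  }
}
EOF

# Check that the single quotes survived; -F matches the text literally.
grep -F "doc['grad']" "$body_file"

# To send the body (not executed here, requires a cluster on localhost:9200):
# curl -XPOST "http://localhost:9200/_scripts/painless/_execute" \
#   -H 'Content-Type: application/json' --data-binary @"$body_file"
```

This keeps the script text identical to the documentation example, at the cost of an extra file.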

Additionally, I think the documentation needs to be corrected for _doc as well.

Basically, everything in the below statement is misleading:

The _source, stored fields, and _doc variables will be available to the script.

Let me try to fix the documentation here. I think this issue fits better in the documentation repo.

git-blame commented 1 month ago

Thanks @sandeshkr419 but my original problem is that the "old" documentation states that:

The _source, stored fields, and _doc variables will be available to the script.

But I cannot create, in the context_setup section, a test document where the _source section is populated. So if my script contains code that accesses doc.xxx, that's OK. If my script contains code that accesses _source, the back end complains that this is an unknown field.

Outside of this REST API, I can write Painless scripts that access _source (e.g., updateByQuery). Perhaps this REST API does not allow this. In that case, I simply need the documentation to clarify that, for example, only scripts that reference doc can be executed.