maggienj / ActiveData

Provide high speed filtering and aggregation over data
Mozilla Public License 2.0

Fix TestSetOps.test_left #29

Closed klahnakoski closed 7 years ago

klahnakoski commented 7 years ago

ESv5+ uses a new scripting language (Painless); it also seems to limit the number of dynamic scripts via the Elasticsearch config file.

To fix this test you must change the `expression.py` file. It contains all the code required to write ES scripts. See the `to_ruby()` calls. For each `to_ruby()` method, make a `to_painless()` method. Alter the `to_painless()` method for `LeftOp` to emit a Painless script that calculates the left part of a string.

https://www.elastic.co/guide/en/elasticsearch/painless/master/painless-getting-started.html
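
A rough sketch of what a `to_painless()` method for `LeftOp` could look like. Only the names `LeftOp` and `to_painless()` come from this issue; the `Variable` and `Literal` classes, the `missing()` helper, and the constructor signatures are hypothetical stand-ins, not the actual code in `expression.py`. It assumes Painless exposes Java's `Math.max`/`Math.min`, and the emitted script has not been checked against a live cluster.

```python
import json


class Literal:
    """Hypothetical constant expression."""

    def __init__(self, value):
        self.value = value

    def to_painless(self):
        # json.dumps gives a Java-style literal (quoted string, bare number)
        return json.dumps(self.value)


class Variable:
    """Hypothetical reference to a document field."""

    def __init__(self, name):
        self.name = name

    def to_painless(self):
        return "doc[" + json.dumps(self.name) + "].value"

    def missing(self):
        # script fragment that is true when the field has no value
        return "doc[" + json.dumps(self.name) + "].empty"


class LeftOp:
    """Left part of a string: first `length` characters of `value`."""

    def __init__(self, value, length):
        self.value = value
        self.length = length

    def to_painless(self):
        v = self.value.to_painless()
        n = self.length.to_painless()
        # clamp the end index to [0, len]; cast because substring needs an int
        end = "(int)Math.max(0, Math.min(" + v + ".length(), " + n + "))"
        return (
            "(" + self.value.missing() + ") ? null : ("
            + v + ".substring(0, " + end + "))"
        )


script = LeftOp(Variable("v"), Literal(2)).to_painless()
print(script)
```

Because `length` is itself an expression object, anything that has a `to_painless()` method can be used there, not only a constant.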

maggienj commented 7 years ago

I guess the way the substring is fetched has to change. Will look up the ES5 command for substring.

caused by
    ERROR: Problem with search (path=/testing_000_c/test_result/_search):
    {
        "from": 0,
        "query": {"bool": {"filter": {"match_all": {}}}},
        "script_fields": {"v": {"script": "((doc[\"v\"].empty) || (false)) ? null : (doc[\"v\"].value.substring(0, max(0, min(doc[\"v\"].value.length(), 2)).intValue()))"}},
        "size": 10,
        "stored_fields": []
    }
    File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\env\elasticsearch.py", line 1097, in search
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es09\util.py", line 40, in post
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es14\setop.py", line 194, in extract_rows
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es14\setop.py", line 64, in es_setop
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\jx_usingES.py", line 160, in query
    File "C:\Users\user\PycharmProjects\ActiveData\jx_python\jx.py", line 71, in run
    File "C:\Users\user\PycharmProjects\ActiveData\active_data\actions\jx.py", line 62, in jx_query
    File "C:\Users\user\PycharmProjects\ActiveData\active_data\__init__.py", line 54, in outpu
caused by
    ERROR: Problem with call to http://localhost:9200/testing_000_c/test_result/_search
{"query": {"bool": {"filter": {"match_all": {}}}}, "stored_fields": [], "from": 0, "script_fields": {"v": {"script": "((doc[\"v\"].empty) || (false)) ? null : (doc[\"v\"].value.substring(0, max(0, min(doc[\"v\"].value.length(), 2)).intValue()))"}}, "size": 10}
    File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\env\elasticsearch.py", line 777, in post
    File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\env\elasticsearch.py", line 1090, in search
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es09\util.py", line 40, in post
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es14\setop.py", line 194, in extract_rows
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es14\setop.py", line 64, in es_setop
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\jx_usingES.py", line 160, in query
    File "C:\Users\user\PycharmProjects\ActiveData\jx_python\jx.py", line 71, in run
    File "C:\Users\user\PycharmProjects\ActiveData\active_data\actions\jx.py", line 62, in jx_query
    File "C:\Users\user\PycharmProjects\ActiveData\active_data\__init__.py", line 54, in output
caused by
    ERROR: Internal Server Error: {"error":{"root_cause":[{"type":"script_exception","reason":"compile error","script_stack":["... [\"v\"].value.substring(0, max(0, min(doc[\"v\"].value ...","                             ^---- HERE"],"script":"((doc[\"v\"].empty) || (false)) ? null : (doc[\"v\"].value.substring(0, max(0, min(doc[\"v\"].value.length(), 2)).intValue()))","lang":"painless"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"testing_000_c20170713_002918","node":"OX5gNnXQRtW6gThYBtxJ4w","reason":{"type":"script_exception","reason":"compile error","caused_by":{"type":"illegal_argument_exception","reason":"Unknown call [max] with [2] arguments."},"script_stack":["... [\"v\"].value.substring(0, max(0, min(doc[\"v\"].value ...","                             ^---- HERE"],"script":"((doc[\"v\"].empty) || (false)) ? null : (doc[\"v\"].value.substring(0, max(0, min(doc[\"v\"].value.length(), 2)).intValue()))","lang":"painless"}}],"caused_by":{"type":"script_exception","reason":"compile error","caused_by":{"type":"illegal_argument_exception","reason":"Unknown call [max] with [2] arguments."},"script_stack":["... [\"v\"].value.substring(0, max(0, min(doc[\"v\"].value ...","                             ^---- HERE"],"script":"((doc[\"v\"].empty) || (false)) ? null : (doc[\"v\"].value.substring(0, max(0, min(doc[\"v\"].value.length(), 2)).intValue()))","lang":"painless"}},"status":500}
    File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\env\elasticsearch.py", line 755, in post
    File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\env\elasticsearch.py", line 1090, in search
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es09\util.py", line 40, in post
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es14\setop.py", line 194, in extract_rows
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es14\setop.py", line 64, in es_setop
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\jx_usingES.py", line 160, in query
    File "C:\Users\user\PycharmProjects\ActiveData\jx_python\jx.py", line 71, in run
    File "C:\Users\user\PycharmProjects\ActiveData\active_data\actions\jx.py", line 62, in jx_query
    File "C:\Users\user\PycharmProjects\ActiveData\active_data\__init__.py", line 54, in output
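
The useful part of the response above is buried in the nested `caused_by` chain: the innermost reason is `Unknown call [max] with [2] arguments.`, meaning the script calls a bare `max(...)` that Painless does not know. A small sketch for pulling that message out of an error body; the function name `innermost_reason` is made up for illustration, and the sample body is a trimmed version of the response above.

```python
import json


def innermost_reason(error_body):
    """Follow the nested caused_by chain and return the deepest reason."""
    node = json.loads(error_body)["error"]
    while isinstance(node.get("caused_by"), dict):
        node = node["caused_by"]
    return node.get("reason")


# trimmed version of the Internal Server Error body shown above
sample = json.dumps({
    "error": {
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "caused_by": {
            "type": "script_exception",
            "reason": "compile error",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "Unknown call [max] with [2] arguments.",
            },
        },
    },
    "status": 500,
})

print(innermost_reason(sample))
```
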
maggienj commented 7 years ago

Here is one way of fetching a substring from a string. The length is given as a separate entry under `params`, specifying how many characters to fetch; in this example it is 4.

{
    "query" : {
        "match_all": {}
    },
    "script_fields" : {
        "left_field" : {
            "script" : {
                "inline": "doc.field.value.substring(0, length)",
                "params": {
                    "length": 4
                }
            }
        }
    }
}

https://stackoverflow.com/questions/35644126/elasticsearch-get-for-a-substring-in-the-value-of-a-document-field
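
For comparison, a sketch of how the same request body could be built from Python for Painless. The assumptions here: Painless reads script parameters through `params.<name>` rather than as bare variables, the script body goes under the `inline` key as in the example above, and the field name is a simple string. `build_left_query` is a hypothetical helper, not part of ActiveData.

```python
import json


def build_left_query(field, length):
    """Build a search body with a script field returning the left part of `field`."""
    value = "doc[" + json.dumps(field) + "].value"
    # clamp to the string length so short values do not raise an index error
    inline = (
        value + ".substring(0, (int)Math.min(params.length, "
        + value + ".length()))"
    )
    return {
        "query": {"match_all": {}},
        "script_fields": {
            "left_field": {
                "script": {
                    "lang": "painless",
                    "inline": inline,
                    "params": {"length": length},
                }
            }
        },
    }


body = build_left_query("v", 4)
print(json.dumps(body, indent=4))
```

Building the body as a dict and serializing it with `json.dumps` avoids hand-written JSON mistakes such as a missing comma between keys.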

maggienj commented 7 years ago

Modified the code to add params for length; the currently generated query is shown below. It still raises an error.


    {
        "from": 0,
        "query": {"bool": {"filter": {"match_all": {}}}},
        "script_fields": {"v": {"script": "((doc[\"v\"].empty) || (false)) ? null : (doc[\"v\"].value.substring(0, max(0, min(doc[\"v\"].value.length()) \"params\":{\"length\":2}).intValue())"}},
        "size": 10,
        "stored_fields": []
    }

committed and pushed.

maggienj commented 7 years ago

Trying to create a pull request, and it says es5 hasn't been merged yet. Tried to merge and pull again; still the same message.

klahnakoski commented 7 years ago

You do not need to use `script.params`; the expression code generation can handle it, and more. For example, length can also be an expression.

klahnakoski commented 7 years ago

be sure to pull from upstream es5

maggienj commented 7 years ago

will pull upstream es5 and create a new branch for this.

maggienj commented 7 years ago

pulled upstream es5 and created a new branch issue29-c-test-left.

caused by
    ERROR: Problem with call to http://localhost:9200/testing_000_l/test_result/_search
{"query": {"bool": {"filter": {"match_all": {}}}}, "stored_fields": [], "from": 0, "script_fields": {"v": {"script": "((doc[\"v\"].empty) || (false)) ? null : (doc[\"v\"].value.substring(0, max(0, min(doc[\"v\"].value.length(), 2)).intValue()))"}}, "size": 10}
    File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\env\elasticsearch.py", line 777, in post
maggienj commented 7 years ago

No changes in the new branch, so the pull request doesn't go through.

The unit test raises this error:


caused by
    ERROR: Internal Server Error: {"error":{"root_cause":[{"type":"script_exception","reason":"compile error","script_stack":["... [\"v\"].value.substring(0, max(0, min(doc[\"v\"].value ...","                             ^---- HERE"],"script":"((doc[\"v\"].empty) || (false)) ? null : (doc[\"v\"].value.substring(0, max(0, min(doc[\"v\"].value.length(), 2)).intValue()))","lang":"painless"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"testing_000_l20170713_021741","node":"OX5gNnXQRtW6gThYBtxJ4w","reason":{"type":"script_exception","reason":"compile error","caused_by":{"type":"illegal_argument_exception","reason":"Unknown call [max] with [2] arguments."},"script_stack":["... [\"v\"].value.substring(0, max(0, min(doc[\"v\"].value ...","                             ^---- HERE"],"script":"((doc[\"v\"].empty) || (false)) ? null : (doc[\"v\"].value.substring(0, max(0, min(doc[\"v\"].value.length(), 2)).intValue()))","lang":"painless"}}],"caused_by":{"type":"script_exception","reason":"compile error","caused_by":{"type":"illegal_argument_exception","reason":"Unknown call [max] with [2] arguments."},"script_stack":["... [\"v\"].value.substring(0, max(0, min(doc[\"v\"].value ...","                             ^---- HERE"],"script":"((doc[\"v\"].empty) || (false)) ? null : (doc[\"v\"].value.substring(0, max(0, min(doc[\"v\"].value.length(), 2)).intValue()))","lang":"painless"}},"status":500}
    File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\env\elasticsearch.py", line 755, in post
    File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\env\elasticsearch.py", line 1090, in search
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es09\util.py", line 40, in post
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es14\setop.py", line 194, in extract_rows
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es14\setop.py", line 64, in es_setop
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\jx_usingES.py", line 160, in query
    File "C:\Users\user\PycharmProjects\ActiveData\jx_python\jx.py", line 71, in run
    File "C:\Users\user\PycharmProjects\ActiveData\active_data\actions\jx.py", line 62, in jx_query
    File "C:\Users\user\PycharmProjects\ActiveData\active_data\__init__.py", line 54, in output
maggienj commented 7 years ago

Test passed. But a different test, "test_number", in the same test group passes in the es5 branch and fails in this branch.

Bringing this up here because whatever change was made to fix this test seems to have affected the "test_number" unit test.

klahnakoski commented 7 years ago

If a new test is failing, then fix it too as part of this branch. It is good that you noticed.

maggienj commented 7 years ago

Extraneous conditional statement."},"script_stack":["(!((false)

"script_stack":[
"(!((false)?(doc["a"].values. ...",
"   ^---- HERE"],"script":"(!((false)?(doc[\"a\"].values.size()==0):((doc[\"a\"].values).contains(\"e\")))) ? (1) : (0)","lang":"painless"}}],

"caused_by":{"type":"script_exception","reason":"compile error","caused_by":{"type":"illegal_argument_exception",
"reason":"Extraneous conditional statement."},

"script_stack":[
"(!((false)?(doc[\"a\"].values. ...",
"   ^---- HERE"],"script":"(!((false)?(
doc[\"a\"].values.size()==0):
          ((doc[\"a\"].values).contains(\"e\")))) ? (1) : 
           (0)","lang":"painless"}},"status":500}
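
The `Extraneous conditional statement.` error appears to be Painless rejecting a ternary whose condition is a constant, here the literal `(false)`. One way around it is to fold constants while generating the script, so the constant branch never reaches Elasticsearch. A minimal sketch of that idea: the helpers `when`, `or_` and `not_` are hypothetical, conditions are either a Python `bool` (known at generation time) or a `str` of Painless source, and this is not the project's actual `partial_eval` code.

```python
def when(cond, then, otherwise):
    """Emit `cond ? then : otherwise`, dropping the ternary if cond is constant."""
    if cond is True:
        return then
    if cond is False:
        return otherwise
    return "(" + cond + ") ? (" + then + ") : (" + otherwise + ")"


def or_(*terms):
    """Emit a || chain, removing constant-false terms."""
    if any(t is True for t in terms):
        return True
    rest = [t for t in terms if t is not False]
    if not rest:
        return False
    if len(rest) == 1:
        return rest[0]
    return " || ".join("(" + t + ")" for t in rest)


def not_(term):
    """Emit a negation, folding constants."""
    if term is True:
        return False
    if term is False:
        return True
    return "!(" + term + ")"


# the failing script above, rebuilt with folding
inner = when(False, 'doc["a"].values.size()==0', 'doc["a"].values.contains("e")')
folded = when(not_(inner), "1", "0")
print(folded)

# the `(doc["v"].empty) || (false)` guard from the substring script
print(or_('doc["v"].empty', False))
```
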
maggienj commented 7 years ago

When test_number was run individually, it passed. It is strange that it passes in the group on es5 but not in this branch, though it does pass when run on its own.

maggienj commented 7 years ago

pull request completed. merged. closing this.

maggienj commented 7 years ago

This used to pass earlier. When run today, it raises an error. Is it because something got reverted during refactoring? (Was this reverted intentionally to use partial_eval, or was it an accidental slip during refactoring? Not sure why this has appeared again.)

The query that is generated currently includes ((false)), which was originally removed as part of fixing issue_29.

It is showing up again now, so re-opening this issue.

caused by
    ERROR: Problem with search (path=/testing_000_r/test_result/_search):
    {
        "from": 0,
        "query": {"bool": {"filter": {"match_all": {}}}},
        "script_fields": {"v": {"script": "((doc[\"v\"].empty) || (false)) ? null : (doc[\"v\"].value.substring(0, max(0, min(doc[\"v\"].value.length(), 2)).intValue()))"}},
        "size": 10,
        "stored_fields": []
    }
klahnakoski commented 7 years ago

my version b03a29e88601a9bb8e8 passes

maggienj commented 7 years ago

I'm in the new branch issue55-or-and-bool (for which a pull request has also been opened). Since it replaces or, and, not with bool, and these replacements are done, it has been marked as "ready to merge".

maggienj commented 7 years ago

The fix we applied at the time of the merge made this unit test pass.

I think something happened during the code refactoring. I guess es52/expressions.py wasn't created from the latest version when it was first created.

These were the fixes that were merged then; the file changes show them: https://github.com/klahnakoski/ActiveData/pull/37/files

maggienj commented 7 years ago

no changes. pull request is complete. merged. closing this. https://github.com/klahnakoski/ActiveData/pull/37