maggienj / ActiveData

Provide high speed filtering and aggregation over data
Mozilla Public License 2.0

fix - tests.test_jx.test_filters.TestFilters.test_empty_in #48

Closed maggienj closed 7 years ago

maggienj commented 7 years ago

fix unit test tests.test_jx.test_filters.TestFilters.test_empty_in

maggienj commented 7 years ago

Not sure where this "filter" is getting added during query building.

caused by
    ERROR: Problem with call to http://localhost:9200/testing_000_s/test_result/_search
{"query": {"bool": {"filter": {"must_not": {"match_all": {}}}}}, "stored_fields": ["a"], "from": 0, "size": 10}
    File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\env\elasticsearch.py", line 777, in post
    File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\env\elasticsearch.py", line 1090, in search
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es09\util.py", line 40, in post
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es14\setop.py", line 194, in extract_rows
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es14\setop.py", line 64, in es_setop
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\jx_usingES.py", line 160, in query
    File "C:\Users\user\PycharmProjects\ActiveData\jx_python\jx.py", line 71, in run
    File "C:\Users\user\PycharmProjects\ActiveData\active_data\actions\jx.py", line 62, in jx_query
    File "C:\Users\user\PycharmProjects\ActiveData\active_data\__init__.py", line 54, in output
    File "C:\Python27\lib\site-packages\flask\app.py", line 1598, in dispatch_request
    File "C:\Python27\lib\site-packages\flask\app.py", line 1612, in full_dispatch_request
    File "C:\Python27\lib\site-packages\flask\app.py", line 1982, in wsgi_app
    File "C:\Python27\lib\site-packages\flask\app.py", line 1994, in __call__
    File "C:\Python27\lib\site-packages\werkzeug\serving.py", line 197, in execute
    File "C:\Python27\lib\site-packages\werkzeug\serving.py", line 209, in run_wsgi
    File "C:\Python27\lib\site-packages\werkzeug\serving.py", line 267, in handle_one_request
    File "C:\Python27\lib\BaseHTTPServer.py", line 340, in handle
    File "C:\Python27\lib\site-packages\werkzeug\serving.py", line 232, in handle
    File "C:\Python27\lib\SocketServer.py", line 652, in __init__
    File "C:\Python27\lib\SocketServer.py", line 331, in finish_request
    File "C:\Python27\lib\SocketServer.py", line 596, in process_request_thread
    File "C:\Python27\lib\threading.py", line 754, in run
    File "C:\Python27\lib\threading.py", line 801, in __bootstrap_inner
    File "C:\Python27\lib\threading.py", line 774, in __bootstrap
caused by
    ERROR: Bad Request: {"error":{"root_cause":[{"type":"parsing_exception","reason":"no [query] registered for [must_not]","line":1,"col":44}],"type":"parsing_exception","reason":"no [query] registered for [must_not]","line":1,"col":44},"status":400}
    File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\env\elasticsearch.py", line 755, in post
    File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\env\elasticsearch.py", line 1090, in search
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es09\util.py", line 40, in post
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es14\setop.py", line 194, in extract_rows
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\es14\setop.py", line 64, in es_setop
    File "C:\Users\user\PycharmProjects\ActiveData\jx_elasticsearch\jx_usingES.py", line 160, in query
    File "C:\Users\user\PycharmProjects\ActiveData\jx_python\jx.py", line 71, in run
    File "C:\Users\user\PycharmProjects\ActiveData\active_data\actions\jx.py", line 62, in jx_query
    File "C:\Users\user\PycharmProjects\ActiveData\active_data\__init__.py", line 54, in output
    File "C:\Python27\lib\site-packages\flask\app.py", line 1598, in dispatch_request
    File "C:\Python27\lib\site-packages\flask\app.py", line 1612, in full_dispatch_request
    File "C:\Python27\lib\site-packages\flask\app.py", line 1982, in wsgi_app
    File "C:\Python27\lib\site-packages\flask\app.py", line 1994, in __call__
    File "C:\Python27\lib\site-packages\werkzeug\serving.py", line 197, in execute
    File "C:\Python27\lib\site-packages\werkzeug\serving.py", line 209, in run_wsgi
    File "C:\Python27\lib\site-packages\werkzeug\serving.py", line 267, in handle_one_request
    File "C:\Python27\lib\BaseHTTPServer.py", line 340, in handle
    File "C:\Python27\lib\site-packages\werkzeug\serving.py", line 232, in handle
    File "C:\Python27\lib\SocketServer.py", line 652, in __init__
    File "C:\Python27\lib\SocketServer.py", line 331, in finish_request
    File "C:\Python27\lib\SocketServer.py", line 596, in process_request_thread
    File "C:\Python27\lib\threading.py", line 754, in run
    File "C:\Python27\lib\threading.py", line 801, in __bootstrap_inner
    File "C:\Python27\lib\threading.py", line 774, in __bootstrap
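
The parsing error above is consistent with `must_not` sitting directly under `filter`: `must_not` is a clause of a `bool` query, not a query type of its own. A minimal sketch of the repair, assuming a hypothetical helper `wrap_bare_clauses` (not part of ActiveData) that wraps bare bool clauses in a `bool` query:

```python
# Clause names that belong inside a "bool" query; they are not query types themselves.
BOOL_CLAUSES = {"must", "must_not", "should", "filter"}


def wrap_bare_clauses(where):
    """Hypothetical helper: wrap a dict of bare bool clauses in a "bool" query."""
    if where and set(where) <= BOOL_CLAUSES:
        return {"bool": where}
    # Anything else (e.g. {"match_all": {}} or an existing {"bool": ...}) is left alone.
    return where


# The shape Elasticsearch rejected in the request above.
rejected = {"must_not": {"match_all": {}}}

# The same clause wrapped in a bool, which is a registered query type.
accepted = wrap_bare_clauses(rejected)

query = {"bool": {"filter": accepted}}
print(query)
```

This only illustrates the shape of the fix; where the unwrapped clause is actually produced in the query builder is not shown in this thread.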
maggienj commented 7 years ago

created a pull request to show file diffs.

maggienj commented 7 years ago

Will check whether the Python dicts or Data objects are adding "filter" during query building.

maggienj commented 7 years ago

Test passed. Created a pull request; ready to merge.

maggienj commented 7 years ago

A few questions I am trying to figure out. What is the logical difference between the two queries shown below? The first one has two bools and a filter; the second one has just one bool and no filter; both have must_not match_all.

We know "filter" = where clause. If so, how do we frame the logical purpose of these two queries? Are they one and the same?

{
  "from": 0,
  "query": {"bool": {"filter": {"bool": {"must_not": {"match_all": {}}}}}},
  "size": 10,
  "stored_fields": ["a"]
}

and this query

{
  "from": 0,
  "query": {
    "bool": {
      "must_not": {
        "match_all": {}
      }
    }
  },
  "size": 10,
  "stored_fields": [
    "a"
  ]
}
klahnakoski commented 7 years ago

Elasticsearch was designed for text search, and it uses "scoring" to rank the results it returns. bool.filter does not score, while bool.must does track the score.

For the sake of clarity, I will now refer to JSON structures that conform to legitimate ES filters as "where expressions". It is already confusing enough with a filter clause and a filter aggregation, both of which accept where expressions.

Since bool.filter requires a where expression, we can give it bool.must_not.match_all. The query also accepts a where expression, so bool.must_not.match_all can go there too. The two queries return the same records; the only difference is that the first will lose all the scoring and the second will not.

https://www.elastic.co/guide/en/elasticsearch/reference/5.4/query-filter-context.html
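
To make the comparison concrete, here is a small sketch that builds both request bodies from the same where expression. The helper name `build_search` and the variable names are assumptions for illustration only, not ActiveData code:

```python
import json

# The where expression shared by both queries: a bool that excludes every document.
where = {"bool": {"must_not": {"match_all": {}}}}


def build_search(query, fields=("a",), size=10):
    """Hypothetical helper: assemble a search body in the shape used in this thread."""
    return {"from": 0, "query": query, "size": size, "stored_fields": list(fields)}


# First form: the where expression wrapped in bool.filter (no scoring).
filtered = build_search({"bool": {"filter": where}})

# Second form: the where expression used directly as the query.
direct = build_search(where)

print(json.dumps(filtered, sort_keys=True))
print(json.dumps(direct, sort_keys=True))
```

The only structural difference is the extra `bool.filter` wrapper in the first body; the inner where expression is identical in both.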

maggienj commented 7 years ago

Since we are using the first query, it will not have any scoring.
So, if I understand correctly, we prefer not to have scoring in this unit test.

Not sure why we prefer not to have scoring in this unit test. Is it because scoring is resource-intensive for ES?

klahnakoski commented 7 years ago

We would like to stay consistent, and use the second query: we should not be seeing queries like the first.

Scoring is computationally expensive, so we would like to avoid it. But we decided on 'bool.must', and we will stick with that for now because we know it works. We can experiment with bool.filter once all the tests are passing.

maggienj commented 7 years ago

merged. closing this. https://github.com/klahnakoski/ActiveData/pull/34