maggienj closed this issue 7 years ago.
Error: "Aggregator [_match] of type [value_count] cannot accept sub-aggregations", triggered by the sub-aggregation "aggs": { "v": { "extended_stats": { "field": "v" } } }
Since we already have "value_count", do we still need this "extended_stats"?
{
"aggs": {
"_match": {
"aggs": {
"v": {
"extended_stats": {
"field": "v"
}
}
},
"value_count": {
"field": "a"
}
},
"_missing": {
"aggs": {
"v": {
"extended_stats": {
"field": "v"
}
}
},
"filter": {
"or": [
{
"missing": {
"field": "a"
}
},
{
"not": {
"terms": {
"a": [
"x",
"y"
]
}
}
}
]
}
}
}
}
{
"error": {
"root_cause": [
{
"type": "aggregation_initialization_exception",
"reason": "Aggregator [_match] of type [value_count] cannot accept sub-aggregations"
}
],
"type": "aggregation_initialization_exception",
"reason": "Aggregator [_match] of type [value_count] cannot accept sub-aggregations"
},
"status": 500
}
If the "extended_stats" sub-aggregation is removed, the sub-aggregation error is no longer raised:
{
"aggs": {
"_match": {
"value_count": {
"field": "a"
}
},
"_missing": {
"aggs": {
"v": {
"extended_stats": {
"field": "v"
}
}
},
"filter": {
"or": [
{
"missing": {
"field": "a"
}
},
{
"not": {
"terms": {
"a": [
"x",
"y"
]
}
}
}
]
}
}
}
}
The query above does not raise the sub-aggregation error in ES Head (it raises a different error), so I'm not sure. Should we add an if/else condition where the code adds the "extended_stats" aggregation?
It may work, but I am concerned that the response of Elasticsearch is not what aggs_iterator is expecting: aggs_iterator expects an inner _match for each edge in the ActiveData query, plus an aggs for each select. Removing one of those inner objects causes a mismatch.
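To make that concern concrete, here is a toy walker. It is hypothetical and much simpler than ActiveData's real aggs_iterator; it only assumes the shape described above: each remaining edge adds one nested "_match" object with "buckets", and the select is a metric under its own name with a "value". If one of those nested "_match" objects is dropped, the walk fails instead of producing rows.

```python
from typing import Any, Dict, Iterator, Tuple


def walk_buckets(agg: Dict[str, Any], depth: int, select: str,
                 path: Tuple[Any, ...] = ()) -> Iterator[Tuple[Tuple[Any, ...], Any]]:
    """Yield (edge-key tuple, select value) pairs from a nested ES response.

    Hypothetical shape: one "_match" level per edge, and at the innermost
    level a single-value metric such as {"v": {"value": ...}}.
    """
    if depth == 0:
        # Innermost level: read the metric named after the select.
        yield path, agg[select]["value"]
        return
    for bucket in agg["_match"]["buckets"]:
        yield from walk_buckets(bucket, depth - 1, select, path + (bucket["key"],))


# Two edges ("a" and "t"), so two nested "_match" levels are expected.
response = {"_match": {"buckets": [
    {"key": "x", "_match": {"buckets": [{"key": 1497398400, "v": {"value": 3.0}}]}},
    {"key": "y", "_match": {"buckets": []}},
]}}
rows = list(walk_buckets(response, 2, "v"))
```

With the inner "_match" removed from a bucket, the same call raises KeyError, which is the kind of mismatch described above.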
Maybe value_count is wrong: try the filter aggregation; it allows sub-aggregations.
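A minimal sketch of that suggestion, assuming the "_match" bucket should cover documents where "a" is present. The helper name match_bucket and the exists-based filter are illustrative assumptions. A "filter" bucket aggregation may contain "aggs"; a "value_count" metric aggregation may not, which is what produced "cannot accept sub-aggregations". The bucket's doc_count then plays roughly the role value_count did (they differ for multi-valued fields, since value_count counts values, not documents).

```python
import json
from typing import Any, Dict


def match_bucket(field: str, sub_aggs: Dict[str, Any]) -> Dict[str, Any]:
    """Build a hypothetical "_match" entry as a filter bucket aggregation."""
    return {
        "filter": {"exists": {"field": field}},  # ES 5.x exists query
        "aggs": sub_aggs,                         # allowed on bucket aggs
    }


query = {
    "size": 0,
    "aggs": {
        "_match": match_bucket("a", {"v": {"extended_stats": {"field": "v"}}}),
    },
}
print(json.dumps(query, indent=2))
```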
In aggs.py, the es_aggsop function has a section with a separate if-block for each stats aggregate, and "value_count" is already one of them. But this test (test_time2_variables) uses the "sum" aggregate, and there is no separate if-block for "sum", so the code falls through to the "else" branch.
Should a new if-block be created for "sum" in this section? The existing "count" block is:
for s in many:
    if s.aggregate == "count":
        es_query.aggs[literal_field(canonical_name)].value_count.field = field_name
        s.pull = literal_field(canonical_name) + ".value"
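Rather than adding one if-block per aggregate, one option is a lookup table. This is only a sketch, not how es_aggsop is written; ES_METRICS and add_metric are hypothetical names. It relies on the fact that Elasticsearch's "sum", "min", "max" and "value_count" are all single-value metrics that report their result under "value", so the pull path has the same form for each.

```python
from typing import Any, Dict

# Hypothetical mapping from ActiveData aggregate names to ES metric aggregations.
# Every entry is a single-value metric whose result is under "value".
ES_METRICS = {
    "count": "value_count",
    "sum": "sum",
    "min": "min",
    "max": "max",
}


def add_metric(aggs: Dict[str, Any], name: str, aggregate: str, field: str) -> str:
    """Add one metric aggregation to `aggs` and return the path to pull its result."""
    es_type = ES_METRICS.get(aggregate)
    if es_type is None:
        raise NotImplementedError(f"no single-value ES metric for {aggregate!r}")
    aggs[name] = {es_type: {"field": field}}
    return name + ".value"


es_aggs: Dict[str, Any] = {}
pull = add_metric(es_aggs, "v", "sum", "v")
```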
Posted below is the query it generates now. What should the correct query look like for this test query?
{
"aggs": {
"_match": {
"aggs": {
"_match": {
"aggs": {"v": {"sum": {"field": "v"}}},
"terms": {
"field": "a",
"include": ["x", "y"]
}
},
"_missing": {
"aggs": {"v": {"sum": {"field": "v"}}},
"filter": {"or": [
{"missing": {"field": "a"}},
{"not": {"terms": {"a": ["x", "y"]}}}
]}
}
},
"range": {
"field": "t",
"ranges": [
{
"from": 1497225600,
"to": 1497312000
},
{
"from": 1497312000,
"to": 1497398400
},
{
"from": 1497398400,
"to": 1497484800
},
{
"from": 1497484800,
"to": 1497571200
},
{
"from": 1497571200,
"to": 1497657600
},
{
"from": 1497657600,
"to": 1497744000
},
{
"from": 1497744000,
"to": 1497830400
}
]
}
},
"_missing": {
"aggs": {
"_match": {
"aggs": {"v": {"sum": {"field": "v"}}},
"terms": {
"field": "a",
"include": ["x", "y"]
}
},
"_missing": {
"aggs": {"v": {"sum": {"field": "v"}}},
"filter": {"or": [
{"missing": {"field": "a"}},
{"not": {"terms": {"a": ["x", "y"]}}}
]}
}
},
"filter": {"or": [
{"or": [
{"range": {"t": {"lt": 1497225600}}},
{"range": {"t": {"gte": 1497830400}}}
]},
{"missing": {"field": "t"}}
]}
}
},
"size": 0
}
The ES 1.7 "or" and "not" filters have been changed to the ES 5.x bool query's "should" and "must_not" clauses. Also, the "missing" filter was changed to "must_not" + "exists". These changes were applied to one block for testing, and the query now looks like the one shown below.
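The same translation can be sketched as a small recursive rewrite. to_es5 is a hypothetical helper, not ActiveData code. It covers only "or", "and", the short form of "not", and "missing"; anything else passes through unchanged. A bool query with only "should" clauses matches when at least one clause matches, which preserves the old "or" meaning.

```python
from typing import Any


def to_es5(f: Any) -> Any:
    """Rewrite ES 1.x "or"/"and"/"not"/"missing" filters as ES 5.x bool queries.

    The ES 1.x long form {"not": {"filter": ...}} is not handled here.
    """
    if not isinstance(f, dict) or len(f) != 1:
        return f
    op, arg = next(iter(f.items()))
    if op == "or":
        return {"bool": {"should": [to_es5(t) for t in arg]}}
    if op == "and":
        return {"bool": {"filter": [to_es5(t) for t in arg]}}
    if op == "not":
        return {"bool": {"must_not": to_es5(arg)}}
    if op == "missing":
        # ES 5.x removed the "missing" query; negate "exists" instead.
        return {"bool": {"must_not": {"exists": {"field": arg["field"]}}}}
    return f  # e.g. "terms" or "range": already valid in ES 5.x


old_filter = {"or": [
    {"missing": {"field": "a"}},
    {"not": {"terms": {"a": ["x", "y"]}}},
]}
new_filter = to_es5(old_filter)
```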
{"aggs": {
"_match": {
"aggs": {"v": {"sum": {"field": "v"}}},
"filter": {"match_all": {}}
},
"_missing": {
"aggs": {"v": {"sum": {"field": "v"}}},
"filter": {"bool": {"should": [
{"bool": {"must_not": {"exists": {"field": "a"}}}},
{"bool": {"must_not": {"terms": {"a": ["x", "y"]}}}}
]}}
}
}}
{
"aggs": {
"_match": {
"aggs": {
"_match": {
"aggs": {"v": {"sum": {"field": "v"}}},
"filter": {"match_all": {}}
},
"_missing": {
"aggs": {"v": {"sum": {"field": "v"}}},
"filter": {"bool": {"should": [
{"bool": {"must_not": {"exists": {"field": "a"}}}},
{"bool": {"must_not": {"terms": {"a": ["x", "y"]}}}}
]}}
}
},
"range": {
"field": "t",
"ranges": [
{
"from": 1497312000,
"to": 1497398400
},
{
"from": 1497398400,
"to": 1497484800
},
{
"from": 1497484800,
"to": 1497571200
},
{
"from": 1497571200,
"to": 1497657600
},
{
"from": 1497657600,
"to": 1497744000
},
{
"from": 1497744000,
"to": 1497830400
},
{
"from": 1497830400,
"to": 1497916800
}
]
}
},
"_missing": {
"aggs": {
"_match": {
"aggs": {"v": {"sum": {"field": "v"}}},
"filter": {"match_all": {}}
},
"_missing": {
"aggs": {"v": {"sum": {"field": "v"}}},
"filter": {"bool": {"should": [
{"bool": {"must_not": {"exists": {"field": "a"}}}},
{"bool": {"must_not": {"terms": {"a": ["x", "y"]}}}}
]}}
}
},
"filter": {"bool": {"should": [
{
"default": null,
"lt": [1497312000, "t"]
},
{
"default": null,
"gte": [1497916800, "t"]
},
{"bool": {"must_not": {"exists": {"field": "t"}}}}
]}}
}
},
"size": 0
}
Now there is a different error, "[lt] query malformed":
ERROR: Bad Request: {"error":{"root_cause":[{"type":"parsing_exception","reason":"[lt] query malformed, no start_object after query name","line":1,"col":704}],"type":"parsing_exception","reason":"[lt] query malformed, no start_object after query name","line":1,"col":704},"status":400}
I will check the ES 5.x equivalent of [lt].
Only the bottom-most part of the query above uses "lt", as a sort of range without the "range" keyword; that could be where the problem is. Maybe the "range" keyword is missing somewhere in the query above, or maybe we should use a filtered query with a range filter as its filter element.
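A sketch of the ES 5.x form, assuming each comparison involves one field and one number. The error message shows that ES parsed "lt" as a query name and expected an object after it; in ES 5.x "lt" is only meaningful as a bound inside a "range" query. inequality_to_range is a hypothetical helper.

```python
from typing import Any, Dict

# Bounds that an ES range query accepts.
RANGE_OPS = {"lt", "lte", "gt", "gte"}


def inequality_to_range(op: str, field: str, value: Any) -> Dict[str, Any]:
    """Turn a single comparison such as t < 1497312000 into an ES range query."""
    if op not in RANGE_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    return {"range": {field: {op: value}}}


before_domain = inequality_to_range("lt", "t", 1497312000)
after_domain = inequality_to_range("gte", "t", 1497916800)
```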
if edge.allowNulls:  # TODO: Use Expression.missing().esfilter() TO GET OPTIMIZED FILTER
    missing_filter = set_default(
        {"filter":
            {"bool": {"should": [
                {"range": {InequalityOp("lt", [edge.value, Literal(None, to_float(_min))]),
                           InequalityOp("gte", [edge.value, Literal(None, to_float(_max))]),
                           {"bool": {"must_not": edge.value.exists().to_esfilter()}}}}
            ]}}
        }, es_query)
I added "range" here for the "lt" and "gte" operators, but it appears the code handles "range" differently in the next section, so I may need to find another way to apply the range here.
Now it raises a different error, a dict error, possibly because of the newly added "range" in the code above:
ERROR: unhashable type: 'dict'
caused by
ERROR: unhashable type: 'dict'
File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\queries\es14\decoders.py", line 275, in _range_composer
File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\queries\es14\decoders.py", line 295, in append_query
File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\queries\es14\aggs.py", line 330, in es_aggsop
File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\queries\jx_usingES.py", line 157, in query
File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\queries\jx.py", line 71, in run
File "C:\Users\user\PycharmProjects\ActiveData\active_data\actions\jx.py", line 62, in jx_query
File "C:\Users\user\PycharmProjects\ActiveData\active_data\__init__.py", line 54, in output
The filter path is: filter -> bool -> should -> range.
Currently "range" is nested inside "bool" -> "should". I need to check whether "range" is allowed inside "bool" -> "should".
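For reference, ES 5.x bool clauses such as "should" accept any query, including "range". Below is a minimal sketch of a "_missing" filter for the time edge, assuming the bucket should catch t < min, t >= max, or documents with no t at all. Each bound needs its own range clause: a single range holding both "gte": max and "lt": min would require both conditions at once and so match nothing. time_missing_filter is a hypothetical name.

```python
from typing import Any, Dict


def time_missing_filter(field: str, lo: float, hi: float) -> Dict[str, Any]:
    """Match documents outside [lo, hi) or lacking `field` (ES 5.x syntax)."""
    return {"bool": {"should": [
        {"range": {field: {"lt": lo}}},    # before the domain starts
        {"range": {field: {"gte": hi}}},   # at or after the domain end
        {"bool": {"must_not": {"exists": {"field": field}}}},  # no value at all
    ]}}


missing_t = time_missing_filter("t", 1497398400, 1498003200)
```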
I removed the bool/should wrapper around the range, so the code now looks like this:
if edge.allowNulls:  # TODO: Use Expression.missing().esfilter() TO GET OPTIMIZED FILTER
    missing_filter = set_default(
        {"filter": {
            InequalityOp("lt", [edge.value, Literal(None, to_float(_min))]),
            InequalityOp("gte", [edge.value, Literal(None, to_float(_max))]),
            {"bool": {"must_not": edge.value.exists().to_esfilter()}}}},
        es_query)
Still modifying this code. Now I have changed this section to:
if edge.allowNulls:  # TODO: Use Expression.missing().esfilter() TO GET OPTIMIZED FILTER
    missing_filter = set_default(
        {"filter": {"bool": {"should": {
            InequalityOp("lt", [edge.value, Literal(None, to_float(_min))]),
            InequalityOp("gte", [edge.value, Literal(None, to_float(_max))]),
            {"bool": {"must_not": edge.value.exists().to_esfilter()}}}}
        }},
        es_query)
The error is:
Main Thread - "__init__.py:32" (send_error) - WARNING: Could not process
{"meta": {"testing": true}, "from": "testing_000_g", "select": {"aggregate": "sum", "value": "v"}, "edges": ["a", {"domain": {"max": "today", "interval": "day", "type": "time", "min": "today-week"}, "value": "t"}], "format": "list"}
File "C:\Users\user\PycharmProjects\ActiveData\active_data\actions\__init__.py", line 32, in send_error
File "C:\Users\user\PycharmProjects\ActiveData\active_data\actions\jx.py", line 100, in jx_query
File "C:\Users\user\PycharmProjects\ActiveData\active_data\__init__.py", line 54, in output
ERROR: unhashable type: 'dict'
File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\queries\es14\decoders.py", line 274, in _range_composer
File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\queries\es14\decoders.py", line 302, in append_query
File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\queries\es14\aggs.py", line 330, in es_aggsop
File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\queries\jx_usingES.py", line 157, in query
File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\queries\jx.py", line 71, in run
File "C:\Users\user\PycharmProjects\ActiveData\active_data\actions\jx.py", line 62, in jx_query
File "C:\Users\user\PycharmProjects\ActiveData\active_data\__init__.py", line 54, in output
Look at
File "C:\Users\user\PycharmProjects\ActiveData\pyLibrary\queries\es14\decoders.py", line 274, in _range_composer
Check the code carefully: unhashable type: 'dict' can result from a dictionary (like {"a": 1}) inside a set (Python set literals also use curly braces, like {1, 2}). You get the same error with {{"a": 1}}; you probably have extra curly braces around a dict.
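A short reproduction of that error and the two usual fixes. In the snippets above, the extra braces wrap a list of clauses, so Python builds a set, and a dict cannot be a set element.

```python
def unhashable_message() -> str:
    """Reproduce the error: extra braces turn a dict into a set element."""
    try:
        _ = {{"a": 1}}  # set literal whose element is a dict -> TypeError
    except TypeError as err:
        return str(err)
    return "no error"


print(unhashable_message())  # the message includes "unhashable type: 'dict'"

# Fixes: drop the extra braces, or use a list when several clauses are meant
# (a bool/should clause takes a JSON array, i.e. a Python list).
one_clause = {"a": 1}
many_clauses = [{"a": 1}, {"b": 2}]
```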
It is easier to discuss code if you commit and push your issue branch, and make a pull request. Then you can see your net changes, and discuss specific lines that are causing a problem. As you make more changes, and push, the pull request will be updated.
Plus, I am able to pull your (incomplete) code to get the same error and diagnose the problem.
Agreed. Committed and pushed.
After some modifications, here is the generated query. What should the correct generated query look like? I'm not sure, and I need to know in order to tweak the query generator.
This is what the current query looks like.
Problem: the query selects "v", but the final output list does not display any "v" values.
Committed and pushed.
{
"aggs": {
"_match": {
"aggs": {
"_match": {
"aggs": {"v": {"sum": {"field": "v"}}},
"filter": {"match_all": {}}
},
"_missing": {
"aggs": {"v": {"sum": {"field": "v"}}},
"filter": {"bool": {"should": [
{"bool": {"must_not": {"exists": {"field": "a"}}}},
{"bool": {"must_not": {"terms": {"a": ["x", "y"]}}}}
]}}
}
},
"range": {
"field": "t",
"ranges": [
{
"from": 1497398400,
"to": 1497484800
},
{
"from": 1497484800,
"to": 1497571200
},
{
"from": 1497571200,
"to": 1497657600
},
{
"from": 1497657600,
"to": 1497744000
},
{
"from": 1497744000,
"to": 1497830400
},
{
"from": 1497830400,
"to": 1497916800
},
{
"from": 1497916800,
"to": 1498003200
}
]
}
},
"_missing": {
"aggs": {
"_match": {
"aggs": {"v": {"sum": {"field": "v"}}},
"filter": {"match_all": {}}
},
"_missing": {
"aggs": {"v": {"sum": {"field": "v"}}},
"filter": {"bool": {"should": [
{"bool": {"must_not": {"exists": {"field": "a"}}}},
{"bool": {"must_not": {"terms": {"a": ["x", "y"]}}}}
]}}
}
},
"filter": {"bool": {"should": [
{"range": {"t": {
"gte": 1498003200,
"lt": 1497398400
}}},
{"bool": {"must_not": {"exists": {"field": "t"}}}}
]}}
}
},
"size": 0
}
Continuing from above: it produces the output list below. I'm not sure what happened to the "v" property; it appears to be missing:
"data": [
{
"a": "x",
"t": 1497398400
},
{
"a": "x",
"t": 1497484800
},
instead of this:
"data": [
{
"a": "x",
"t": 1497398400,
"v": null
},
{
"a": "x",
"t": 1497484800,
"v": null
},
Use ES Head to see the result of the query. Once you have confirmed the result is correct, we can review the code that builds up the "data": [] you showed me.
I noticed this change:
- {"filter": {"or": [
- OrOp("or", [
- InequalityOp("lt", [edge.value, Literal(None, to_float(_min))]),
- InequalityOp("gte", [edge.value, Literal(None, to_float(_max))]),
- ]).to_esfilter(),
- edge.value.missing().to_esfilter()
- ]}},
- es_query
- )
+ {"filter": { "bool": { "should": [
+ {"range": { "t": {
+ "lt": to_float(_min),
+ "gte": to_float(_max)}}},
+ {"bool": {"must_not": edge.value.exists().to_esfilter()}}]
+ }
+ }
+ },
+ es_query)
You replaced the expressions (OrOp, InequalityOp) with their ES equivalents. Each *Op can emit an Elasticsearch filter with the same meaning, so we can let these operators write out the correct ES filter for us. But first we must fix them.
Here is some code from OrOp:
def to_esfilter(self):
    return {"or": [t.to_esfilter() for t in self.terms]}
Change this code to use "bool.should", like everywhere else; then you can revert to the code I mentioned at the top of this comment.
Furthermore, everywhere you see {"or": []} you could replace it with OrOp("or", []).to_esfilter(), and the operator will write the "bool.should" code for you.
Trying to bring back the functions and change to bool/should at the function level. Error: "Expecting an expression". Committed and pushed.
def to_esfilter(self):
    return {"bool": {"should": [t.to_esfilter() for t in self.terms]}}
missing_filter = set_default(
    {"filter":
        {OrOp("or", [
            OrOp("or", [
                InequalityOp("lt", [edge.value, Literal(None, to_float(_min))]),
                InequalityOp("gte", [edge.value, Literal(None, to_float(_max))]),
            ]).to_esfilter(),
            {"bool": {"must_not": edge.value.exists().to_esfilter()}}
        ]).to_esfilter()
        }
    },
    es_query
)
Also trying to change "missing" at the function level in expressions.py instead of in decoders.py. MissingOp will now emit {"bool": {"must_not": {"exists": {"field": fieldname}}}}:
def to_esfilter(self):
    if isinstance(self.expr, Variable):
        return {"bool": {"must_not": {"exists": {"field": self.expr.var}}}}
"Expecting an expression" means the expression constructors expect to be given expressions, not esfilters. The .to_esfilter() method should be called only on the topmost operator.
{"bool": {"must_not": edge.value.exists().to_esfilter()}} gets reduced to edge.value.missing(), because it is inside the OrOp.
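A self-contained toy model of those two rules. These are hypothetical classes, not the real expressions.py: signatures are simplified (InequalityOp takes a variable and a number instead of a term list) and TypeError stands in for whatever ActiveData actually raises. Constructors accept only expression objects, each operator emits ES 5.x syntax, and to_esfilter() is called once, on the outermost OrOp.

```python
from typing import Any, Dict, List


class Expression:
    """Base class: every operator can render itself as an ES 5.x filter."""

    def to_esfilter(self) -> Dict[str, Any]:
        raise NotImplementedError


class Variable(Expression):
    def __init__(self, var: str):
        self.var = var

    def missing(self) -> "MissingOp":
        return MissingOp(self)


class MissingOp(Expression):
    def __init__(self, expr: Variable):
        self.expr = expr

    def to_esfilter(self) -> Dict[str, Any]:
        # ES 5.x removed the "missing" query; use must_not + exists instead.
        return {"bool": {"must_not": {"exists": {"field": self.expr.var}}}}


class InequalityOp(Expression):
    # Simplified signature: (op, variable, number) rather than a term list.
    def __init__(self, op: str, var: Variable, value: float):
        self.op, self.var, self.value = op, var, value

    def to_esfilter(self) -> Dict[str, Any]:
        return {"range": {self.var.var: {self.op: self.value}}}


class OrOp(Expression):
    def __init__(self, op: str, terms: List[Expression]):
        for term in terms:
            # Passing an already-rendered esfilter (a dict) is the mistake
            # that produces "Expecting an expression".
            if not isinstance(term, Expression):
                raise TypeError("Expecting an expression")
        self.op, self.terms = op, terms

    def to_esfilter(self) -> Dict[str, Any]:
        return {"bool": {"should": [t.to_esfilter() for t in self.terms]}}


# Build the whole tree from expressions; call to_esfilter() once, at the top.
t = Variable("t")
missing_filter = OrOp("or", [
    InequalityOp("lt", t, 1497398400),
    InequalityOp("gte", t, 1498003200),
    t.missing(),
]).to_esfilter()
```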
Be sure to push your changes from the last session so I can review them.
Looking at the test test_time2_variables, we can see that it does not use limit==0, so all the code manipulation we are doing for limit==0 is necessary so we do not pass size=0 in a terms query, but it is not affecting the output for this test.
The next step is to confirm, or deny, that the tuples coming out of aggs_iterator() are correct.
Please make a pull request for this issue so I can point to code.
This test does not have a limit==0, yet it branches on self.limit == 0. The problem is in aggs.py around line 109; limit=0 should be set to None:
- limit = 0
- output[max_depth].append(AggsDecoder(edge, query, limit))
+ output[max_depth].append(AggsDecoder(edge, query, limit=None))
Remove all the code we added that deals with limit==0, including "_all", and make a pull request.
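A small sketch of why limit=None is the safer default, assuming the decoder maps its limit onto a terms aggregation's "size". ES 5.x no longer accepts "size": 0 as "return all terms" in a terms aggregation, so a limit of 0 should never reach the query; when "size" is omitted, ES uses its default of 10 buckets. terms_agg is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def terms_agg(field: str, limit: Optional[int] = None) -> Dict[str, Any]:
    """Build a terms aggregation; omit "size" when no limit is requested."""
    body: Dict[str, Any] = {"field": field}
    if limit is not None:
        if limit <= 0:
            # ES 5.x rejects size 0 for terms aggregations.
            raise ValueError("terms aggregation size must be greater than 0")
        body["size"] = limit
    return {"terms": body}


default_agg = terms_agg("a")           # no limit: ES default size applies
limited_agg = terms_agg("a", limit=5)  # explicit positive limit
```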
Pull request completed for this issue. Unit test test_time_domain.TestTime.test_time2_variables passed. Closing this issue.
This issue was a follow-up to the meta.py issue, where size 0 was changed.