Open PeriLara opened 1 year ago
This looks like a bug @bjorncs?
Yes, this is clearly a bug.
@PeriLara As a workaround, try specifying floating-point values as limits in the predefined expression:
predefined(quality_spamness, bucket[0.0,1.0>, bucket[1.0,2.0>, bucket[2.0,inf>)
We have now identified the underlying problem.
A predefined bucket operator is given the type of its lower/upper bound values (double/long/raw/string). Each value evaluated from the predefined expression is converted to the bucket type before being assigned a bucket. In the above scenario the bucket bounds are integers, so the quality_spamness attribute values are converted to the integer bucket type: they are rounded to the nearest integer, e.g. 0.9576748013496399 is rounded up to 1.
The YQL query will therefore use the following mapping:
bucket[0, 1> contains values [-0.5, 0.5>
bucket[1, 2> contains values [0.5, 1.5>
bucket[2, inf> contains values [1.5, inf>
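The mapping above can be reproduced with a small Python sketch. It is only an illustration of the rounding effect described in this comment, not Vespa's actual implementation; the function name assign_bucket and the as_long flag are hypothetical, and round-half-up is assumed because it matches the [-0.5, 0.5> style ranges listed above.

```python
import math

def assign_bucket(value, bounds, as_long):
    """Return the [lower, upper> bucket that value falls into, or None."""
    if as_long:
        # Assumed behaviour: the value is rounded to the nearest integer
        # (half up) before it is compared against the bucket bounds.
        value = math.floor(value + 0.5)
    for lower, upper in bounds:
        if lower <= value < upper:
            return (lower, upper)
    return None

bounds = [(0, 1), (1, 2), (2, math.inf)]

# Integer-typed buckets: 0.957... is rounded to 1 and lands in [1, 2>.
print(assign_bucket(0.9576748013496399, bounds, as_long=True))   # (1, 2)

# Double-typed buckets: the value is compared as-is and lands in [0, 1>.
print(assign_bucket(0.9576748013496399, bounds, as_long=False))  # (0, 1)
```

With as_long=True only values in [-0.5, 0.5> reach the first bucket, which is consistent with the report that the [0;1[ group seemed to contain only values equal to 0.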
Using decimal bounds in bucket definitions gives the expected behaviour for quality_spamness.
The grouping query parser does not use type information when parsing the grouping expression to the intermediate AST.
Ideally it should deduce the type of any expression given to predefined and select the correct bucket type based on that.
An easier approach is to handle only expressions that consist of just an attribute.
Yes it seems to work this way, at least for the documents with quality_spamness < 1.0 (strictly)!
But isn't it counterintuitive since bucketing on float is not allowed?
I mean, something like predefined(quality_spamness, bucket[0,0.5>, bucket[0.5,1.0>, bucket[1.0,inf>)
returns a 400 Bad request error with this message:
"message": "Could not create query from YQL: Bucket type mismatch, expected 'LongValue' got 'DoubleValue'.",
@PeriLara Make sure to specify a decimal for all bounds (bucket[0,0.5> => bucket[0.0,0.5>).
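One way to avoid mixing integer and decimal bounds by accident is to generate the expression from a list of bucket edges. The helper below is a hypothetical client-side sketch (the name predefined_clause is not part of any Vespa API); it only builds the string, assuming positive infinity is written as inf as in the workaround above.

```python
import math

def predefined_clause(expression, edges):
    """Build a predefined(...) clause whose bucket bounds are all decimals."""
    def fmt(x):
        # Write every finite bound as a float literal, e.g. 0 -> "0.0".
        return "inf" if math.isinf(x) else repr(float(x))
    buckets = [f"bucket[{fmt(lo)},{fmt(hi)}>" for lo, hi in zip(edges, edges[1:])]
    return f"predefined({expression}, {', '.join(buckets)})"

print(predefined_clause("quality_spamness", [0, 0.5, 1, math.inf]))
# predefined(quality_spamness, bucket[0.0,0.5>, bucket[0.5,1.0>, bucket[1.0,inf>)
```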
Yes it works ! Thanks ! :)
The grouping query documentation on predefined will be improved with https://github.com/vespa-engine/documentation/pull/2420.
Doesn't the documentation update primarily address this? I don't think we can do anything with this in Vespa's behavior.
Hi !
I was trying to get distribution information on a double-type field called quality_spamness, defined as follows:
The Vespa environment I'm working on: OS: Debian; Infra: self-hosted; Vespa version: 7.581.33
But I'm getting some weird results telling me that some hits with quality_spamness values < 1 belong to the bucket between 1 and 2.
Here is the yql query:
With this query I want to get 2 hits per group.
Here is what I get:
As we can see, for the group [1;2[ the quality_spamness values of these hits do not correspond to the group they are associated with. For the group [0;1[ it seems that values in between are not taken into account, only values equal to 0.
If it is not clear, I can provide more information, let me know !
Thanks a lot!