Closed: guyboertje closed this issue 6 years ago
minor correction, the current syntax uses words for the operation:
math { calculate => [ [ "add", "a_field1", "a_field2", "a_target" ] ] }
Because of this, I think I prefer the current syntax to proposal 1): with the operation as the first element, it gives a better sense of what is happening: "ok, this is adding var1 to var2 and placing the result in var3".
my concern with the string expressions (2 and 3) is providing good feedback when the user tries something insane :)
The string should be parsed with a regex during register (or, better still, by a custom validator so it is parsed at config validation). I am ambivalent about the argument order of the multi-element array version.
This sort of regex
/^(?<result>[^ =]+)\s*?=\s*?(?<arg1>\S+)\s*?(?<op>\+|-|\*{1,2}|\/|\%|fdiv)\s*?(?<arg2>\S+)\s*?$/
with results
Match 1
Full match 0-22 `[result1] = [var1] - 6`
Group `result` 0-9 `[result1]`
Group `arg1` 12-18 `[var1]`
Group `op` 19-20 `-`
Group `arg2` 21-22 `6`
Match 2
Full match 23-47 `[result1] = var1 ** var2`
Group `result` 23-32 `[result1]`
Group `arg1` 35-39 `var1`
Group `op` 40-42 `**`
Group `arg2` 43-47 `var2`
Match 3
Full match 48-71 `[result1] = var1 / var2`
Group `result` 48-57 `[result1]`
Group `arg1` 60-64 `var1`
Group `op` 65-66 `/`
Group `arg2` 67-71 `var2`
Match 4
Full match 72-100 `[result1] = [var1] fdiv var2`
Group `result` 72-81 `[result1]`
Group `arg1` 84-90 `[var1]`
Group `op` 91-95 `fdiv`
Group `arg2` 96-100 `var2`
Match 5
Full match 101-128 `[result1] = [var1] % [var2]`
Group `result` 101-110 `[result1]`
Group `arg1` 113-119 `[var1]`
Group `op` 120-121 `%`
Group `arg2` 122-128 `[var2]`
Match 6
Full match 129-151 `result1] = var1 * var2`
Group `result` 129-137 `result1]`
Group `arg1` 140-144 `var1`
Group `op` 145-146 `*`
Group `arg2` 147-151 `var2`
Match 7
Full match 152-171 `foo = [var1] + var2`
Group `result` 152-155 `foo`
Group `arg1` 158-164 `[var1]`
Group `op` 165-166 `+`
Group `arg2` 167-171 `var2`
Match 8
Full match 172-189 `foo = fdiv fdiv 5`
Group `result` 172-175 `foo`
Group `arg1` 178-182 `fdiv`
Group `op` 183-187 `fdiv`
Group `arg2` 188-189 `5`
Match 9
Full match 190-205 `bar = 1 / foo `
Group `result` 190-193 `bar`
Group `arg1` 197-198 `1`
Group `op` 199-200 `/`
Group `arg2` 201-204 `foo`
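As a sketch of the single-regex approach (hypothetical Ruby, not the plugin's actual code), parsing with the named captures above could look like:

```ruby
# Hypothetical sketch: parse a "c = a OP b" expression string with the single
# regex above and return its named captures, or nil when it does not match.
EXPR = /^(?<result>[^ =]+)\s*?=\s*?(?<arg1>\S+)\s*?(?<op>\+|-|\*{1,2}|\/|\%|fdiv)\s*?(?<arg2>\S+)\s*?$/

def parse_expression(string)
  m = EXPR.match(string)
  m && { result: m[:result], arg1: m[:arg1], op: m[:op], arg2: m[:arg2] }
end

parse_expression("[result1] = [var1] - 6")
# => {:result=>"[result1]", :arg1=>"[var1]", :op=>"-", :arg2=>"6"}

# As Match 8 above shows, nonsense still parses -- hence the validation concern:
parse_expression("foo = fdiv fdiv 5")[:op]  # => "fdiv"
```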
The string should be parsed with a regex during register (or, better still, by a custom validator so it is parsed at config validation)
Right, so any mistake made by the user would result in an error saying "invalid format for math operation"; the regex failure will be very opaque. IMO I don't see a clear UX gain in the proposals, but let's collect feedback from more folks 👍
We are discussing implementation details and not the merits of the syntaxes.
Better validation is possible.
We can use successive regexes, left to right, to better mine the errors we present to the user. This would happen once.
For a string of bar = 1 / foo we apply, left to right:

/^(?<result>[^ =]+)/
extract and move on, or fail validation: target field not found, expected format C = a + b; got #{string}

/^\s*+=\s*+/
extract and move on (leaving 1 / foo), or fail validation: no equals sign found, expected format c = a + b; got #{string}

/^(?<arg1>\S+)\s*+(\+|-|\*{1,2}|\/|\%|fdiv)\s*+/
extract and move on, or fail validation: first operand not found, expected format c = A + b; got #{string}

/^\s*+(?<op>\+|-|\*{1,2}|\/|\%|fdiv)\s*+/
extract and move on, or fail validation: operator not found, expected format c = a OPERATOR b; got #{string}
or invalid operator found, got #{operator}; expected one of #{OPERATORS.keys.join(', ')}

/^(?<arg2>\S+)\s*+$/
extract, or fail validation: second operand not found, expected format c = a + B; got #{string}
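That left-to-right mining could be sketched like this (hypothetical Ruby; the method name and the exact messages are illustrative, not the plugin's actual code):

```ruby
# Hypothetical sketch of successive-regex validation with targeted errors.
OP_RE = /\+|-|\*{1,2}|\/|%|fdiv/

def validate_expression(string)
  rest = string.dup
  result = rest.slice!(/\A[^ =]+/) or
    raise "target field not found, expected format C = a + b; got #{string}"
  rest.slice!(/\A\s*=\s*/) or
    raise "no equals sign found, expected format c = a + b; got #{string}"
  arg1 = rest.slice!(/\A\S+(?=\s*#{OP_RE})/) or
    raise "first operand not found, expected format c = A + b; got #{string}"
  op = rest.slice!(/\A\s*(#{OP_RE})\s*/) or
    raise "operator not found, expected format c = a OPERATOR b; got #{string}"
  arg2 = rest.slice!(/\A\S+\s*\z/) or
    raise "second operand not found, expected format c = a + B; got #{string}"
  { result: result, arg1: arg1, op: op.strip, arg2: arg2.strip }
end
```

Each `String#slice!` consumes the matched prefix, so every failure points at the first piece of the expression that went wrong rather than at an opaque whole-string mismatch.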
For me no. 2 is easier to read.
my concern about the implementation maps to a concern of UX; if we support "[result] = [var1] - [var2]"
as a string, I think it creates an expectation that the expression is extremely flexible.
I'd be tempted to try:
filter { math { calculate => [ "[result] = [var1] + [var2] + [var3]" ] } }
filter { math { calculate => [ "[result] = [var1] - [var2] * 23" ] } }
filter { math { calculate => [ "[result] = ([var1] - [var2]) / [var3]" ] } }
filter { math { calculate => [ "[result] = [var1]^2" ] } }
and so on. If the plan is to support more complex arithmetic then we can go forward with this approach, but for the current functionality I still think the current syntax provides a reasonable UX, i.e. I prefer:
math { calculate => [ [ "sub", "[var1]", "[var2]", "[result]" ] ] }
# to:
math { calculate => [ [ "[result]", "[var1]", "-", "[var2]" ] ] }
A different idea: why not use an RPN-compatible notation, so that complex math would be possible and the interpretation would be simpler to implement?
@jsvd
I'd be tempted to try...
However, your expectation of a flexible syntax will have come from seeing an example in, say, discuss or a SO post and not from reading the docs or an Elastic blog post. You would get a validation error and then have to read the docs or seek help on discuss.
But I do get your point. The current syntax reins in the expectation better.
However, this completely sets expectations:
math {
  calculate => [
    {
      operator => "sub"      # or "-"
      lhs_field => "[var1]"  # or left_field
      rhs_field => "[var2]"
      target => "[result]"
    },
    {
      target => "[result]"
      lhs_field => "[var3]"
      operator => "/"        # or "div"
      rhs_literal => 100
    }
  ]
}
@colinsurprenant
😄 Interesting idea (I considered it, no really, I did) but I fear very few users understand RPN.

[btw, the smiley is a ref to stuff we were talking about earlier] I, for one, very much do, having a) fixed HP scientific calculators way back in the Dark Ages and b) learned some form of programming with Forth, which is exclusively an RPN + stack based language. An example, I presume...
filter { math { calculate => [ "[var1] [var2] - [var3] / '[result]' =" ] } }
means:

"-" takes two elements off of the stack, subtracts them and puts the result on the stack
"/" takes two elements off of the stack, divides them and puts the result on the stack
"[result]" puts the field reference on the stack
"=" takes two elements off of the stack and sets the deeper as value to the shallower as field, i.e. event.set(stack[0], stack[-1])
@jsvd & @colinsurprenant For me, the crucial judgement call now is: if we go with a collection based expression now, how hard will it be to migrate code and users to a string based expression (if and) when it's asked for? Co-question: can (should) we support both syntaxes?
@guyboertje there is a slight learning curve, but maybe the usage flexibility and implementation simplicity are worth it? Maybe more like this?
filter { math { target => "[result]", calculate => "[var1] [var2] - [var3] /" } }
so whatever is at the top of the stack is put in target.
A variation using the non-RPN string expression:
filter { math { target => "[result]", calculate => "[var1] - [var2]" } }
A complex RPN example (what is the calculation?):
filter { math { target => "[result]", calculate => "[var1] [var2] add [var3] [var4] subtract multiply" } }
Would we support stack operations like swap?
Getting the reciprocal of a calculated value.
"1 [var1] [var2] add [var3] [var4] subtract multiply divide"
vs
"[var1] [var2] add [var3] [var4] subtract multiply 1 swap divide"
To sum up my opinion:
I think RPN is hard for a human to parse, it was made for computers to evaluate expressions with a stack.
The infix notation w/ parenthesis support is the best if the plugin is going to support nested expressions and more complex math, e.g. "([field1]+[field2])*2-[field3]^2". Humans are used to reading this.
If we only support a single operation then I suggest the current prefix (polish) notation with an array and named operations.
I think RPN is hard for a human to parse, it was made for computers to evaluate expressions with a stack.
I digress, but wanted to note that some of the best calculators, operated by humans, were HPs using RPN 😁
How about this syntax?
filter {
  math {
    add => {
      "result1" => ["%{var1}", "%{var2}"]
      "result2" => ["%{result1}", "%{var3}"]
    }
    subtract => {
      "result3" => ["%{result2}", 99]
    }
  }
}
We could allow only 2 elements in the array, or just define the order of operations left to right. It's basically the original syntax moved around some.
My assumption here is that people are looking to do relatively straightforward math in their logging pipeline. Perhaps adjust a value by a certain factor, or derive a new value based on two fields. Therefore I don't think we need to support arbitrary complexity at all (they can use ruby for that). This is what guides my opinion below.
I'll start with those I'm not sure about:
In any case, I like the idea of having target => "[field_name]" over having the target field name as part of the expression (array or string), whether or not it's interpolated. target is almost always how we define the destination field of a filter. Following this convention is a big plus here, IMO.
With all of this in mind, I think my favourite options are:

- the one that doesn't follow the target convention, but whose notation is relatively compact and straightforward
- add, multiply and subtract supporting arbitrary length arrays (no need to cap those to 2 operands), although there target is a key in the hash, not a filter attribute, which may lead to confusion
- the one with a target attribute: readable and straightforward

@jakelandis
I have been bitten before with using values, e.g. add and result1, as keys, because they can't be defined twice. The simplicity of the current array of arrays is that it allows for left to right sequenced calculation where the later calcs use results from the earlier ones. In your example, doing a sequenced add, divide, add is not possible, however I do understand the calculation might be transformable to suit.
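The duplicate-key limitation is easy to see in plain Ruby (a hypothetical stand-in for the config hash): a repeated key silently replaces the earlier entry, so a sequenced add, divide, add cannot be expressed.

```ruby
# A hash can hold each key only once; the second "add" overwrites the first
# calculation instead of sequencing another add after "divide".
calculations = {
  "add"    => { "result1" => ["%{var1}", "%{var2}"] },
  "divide" => { "result2" => ["%{result1}", 100] },
  "add"    => { "result3" => ["%{result2}", "%{var3}"] },  # clobbers the first "add"
}

calculations.keys  # => ["add", "divide"] -- the first add is gone
```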
@webmat
For the sequencing reasons given above, a single target setting might be problematic. It would mean that we need to introduce the idea of registers that hold temporary results. In my mind, the register is an array the same size as the calculation array, where index 0 stores the result of calculation 0 and so on, with the target field being filled with the value of the last element. Users refer to temporary results in other calculations using (I suggest) $1 -> index 0, etc. Registers allow for a better error message, e.g. when a user refers to a register with a nil value, as opposed to (when using the event as a temporary store) a field with a nil value; any field could have a nil value, so we would need to track the field names used for temporary results in order to give the same kind of error message: "Register $3 is empty, did you mean $0, $1 or $2? These registers have values: 0, -9 and 42".

But this only applies when calculations are not independent. If you want to do several separate calculations on different fields, then a target is needed for each separate calculation.
But both sequenced and separate calculations can be catered for with registers:
filter {
  math {
    calculate => [
      {
        calculation => [
          ["+", "[var1]", "[var2]"], ["-", "[var3]", "[var4]"], ["*", "$1", "$2"]
        ],
        target => "[result1]" # <- register[-1]
      },
      {
        calculation => [
          ["-", "[ts_job_end]", "[ts_job_start]"]
        ],
        target => "[job_duration]" # <- register[-1]
      }
    ]
  }
}
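A minimal sketch of that register idea (hypothetical Ruby; method names are illustrative, and "$1" refers to register index 0 as suggested above):

```ruby
# Hypothetical register-backed evaluation: each sub-calculation writes its
# result to registers[i]; "$N" operands read register N-1; the target field
# would receive registers.last.
BINARY_OPS = { "+" => :+, "-" => :-, "*" => :*, "/" => :/ }

def resolve(operand, fields, registers)
  case operand
  when /\A\$(\d+)\z/ then registers.fetch(Regexp.last_match(1).to_i - 1)
  when /\A\[.+\]\z/  then fields.fetch(operand)  # bracketed field reference
  else Float(operand)                            # numeric literal
  end
end

def run_calculation(calculation, fields)
  registers = Array.new(calculation.size)
  calculation.each_with_index do |(op, lhs, rhs), i|
    a = resolve(lhs, fields, registers)
    b = resolve(rhs, fields, registers)
    registers[i] = a.public_send(BINARY_OPS.fetch(op), b)
  end
  registers.last  # <- the value written to target
end
```

With, say, [var1]=7, [var2]=3, [var3]=10, [var4]=4, the first block above computes (7+3) and (10-4), then "$1" * "$2" gives 60 for [result1].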
OK I have definitely gone off the string expression style.
We could support a layout of prefix | infix | postfix for the user's preferred style of expression - it tells us where the operator is to be found.
Also, what is the consensus on enforcing bracketed field references? Doing so means we can differentiate, without ambiguity, between "[$1]" being a field reference and "$1" being a register reference.
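A small sketch of how unambiguous that makes operand parsing (hypothetical Ruby, assuming the "$N" register notation suggested earlier):

```ruby
# With bracketed field references enforced, a bare "$N" can only ever be a
# register reference; "[$1]" remains a (strange but legal) field name.
REGISTER = /\A\$(\d+)\z/

def register_index(operand)
  m = REGISTER.match(operand)
  m && m[1].to_i
end

register_index("$1")    # => 1
register_index("[$1]")  # => nil, i.e. treat it as a field reference
```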
Once again, I'm all for conventions. I have been confused in the past with fields being referenced sometimes in brackets and sometimes not. I would go for the brackets here.
I also can't see any merit in supporting interpolation in operand strings e.g.
["+", "%{[var1]}", "%{[var2]}"]
I also can't see any merit in supporting interpolation in operand strings e.g. ["+", "%{[var1]}", "%{[var2]}"]
Well, since values can only be numeric, I don't see how using string interpolation would be useful? Are string operands even possible?
OTOH, I just want to point out that, in general, there is no enforcement of bracketed field references (for root fields, obviously), so I believe this would be the only place where they are strictly required? I don't really see a problem with that, but it could be confusing to some users who won't catch the note about it being necessary.
To clarify, requiring brackets for field names is really only needed if we introduce registers.
However, if a naming convention for register addresses can be devised that is intuitive and has a low collision potential with a user's event field names (which are usually descriptive and not cryptic), then we don't need to enforce bracketed field names.
Something like one of these springs to mind: R[1], R(9), R{0}, ]1[, $[1], $(1), ${1}, %[0], &[1], &(0).
Basically, some prefix with an integer surrounded by brackets of some description.
Perhaps a separator based approach works too, e.g. R|0, R\0, R^0, although these are not so easy on the eye.
Another question: are zero-based register index numbers intuitive to users who are not developers?
Conclusion:
I am going to stick with the current array based syntax but add registers.
I am collecting opinions on the syntax of the expression. Current:

filter { math { calculate => [ [ "-", "[var1]", "[var2]", "[result]" ] ] } }

Proposals:

1) filter { math { calculate => [ [ "[result]", "[var1]", "-", "[var2]" ] ] } }
This switches the order to look more like written arithmetic.

2) filter { math { calculate => [ "[result] = [var1] - [var2]" ] } }
This defines the expression as one string which is parsed into arrays as above.

3) filter { math { calculate => [ "[result1] = %{[var1]} - %{[var2]}", "[result1] = %{[result1]} / 100" ] } }
This defines the expression as one string but uses interpolation and is parsed into arrays like the one above.

Currently the operands [var1] and [var2] must be fields, but there is no reason why these can't be numeric values.