Closed sancarn closed 4 years ago
TarVK suggested this approach
TarVK's suggestion is great! Really powerful and extremely simple approach. I rewrote it in VBA tonight
There are some leftovers throughout the script as well, from the previous method I was trying (far more complicated than this).
Will be gradually improving this to become a fully fledged formula parser. Will be making it class based of course also, and eventually will build it into stdCallback. Hoping to also use provide attachable UDFs to extend the formula language more. It's definitely looking bright for the future!
stdLambda is pretty much fully featured now, all operations are implemented. The performance isn't great though, which is the main downside. TarVK and I tested compiling to an intermediate language and this seems to really help performance (to the point our test was faster than Application.Evaluate
at runtime for a very large expression!! The expression we tested on was:
'0+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1
'i.e.
opsPtr = 0
Call addOp(push, , 0)
For i = 0 To 8
Call addOp(push, , 3)
Call addOp(push, , 2)
Call addOp(push, , 5)
Call addOp(binary, opAdd)
Call addOp(binary, opMul)
Call addOp(push, , 5)
Call addOp(push, , 8)
Call addOp(binary, opMul)
Call addOp(push, , 2)
Call addOp(push, , 2)
Call addOp(push, , 1)
Call addOp(binary, opAdd)
Call addOp(binary, opPow)
Call addOp(binary, opDiv)
Call addOp(binary, opAdd)
Call addOp(push, , 26)
Call addOp(binary, opDiv)
Call addOp(push, , 1)
Call addOp(binary, opSub)
Call addOp(binary, opAdd)
Next
Comparitively to stdLambda
which executed in 20 seconds (over 10,000 executes), this expression took 1.14s to evaluate, 20x faster.
We'll look into creating this operations list to speed up performance going forwards as this would be incredibly useful on large datasets (1 minute instead of 20 minutes!)
Edit: Further testing shows varying performance differences on different machines. On mine it's still around 10x faster, but on TarVKs it's 1-2x slower... Ideally more investigation should be done before merge.
List of current supported VBA functions:
eval - evaluates an expression
thisworkbook - evaluates to ThisWorkbook object
application -evaluates to Application object
'Standard VBA functions
abs
int
fix
exp
log
sqr
sgn
rnd
cos
sin
tan
atn
asin
acos
vbcrlf
vbcr
vblf
vbnewline
vbnullchar
vbnullstring
vbobjecterror
vbtab
vbback
vbformfeed
vbverticaltab
array
createobject
getobject
iff
cbool
cbyte
ccur
cdate
csng
cdbl
cint
clng
cstr
cvar
cverr
asc
chr
format
hex
oct
str
val
trim
lcase
ucase
right
left
mid
now
List of current supported operators
... is ...
... mod ...
... and ...
... or ...
... xor ...
not ...
if ... then ... else
if ... then ... else if ... then else ... 'Via composability
funcName
funcName(...)
a.b.c 'Property access
a#b#c 'Method access
... * ...
... / ...
... + ...
... - ...
... = ...
... <> ...
... < ...
... <= ...
... > ...
... >= ...
... & ...
... like ...
Evaluators
Evaluators are something we have wanted for a long time, and this project shouldn't be taken lightly. In an ideal world we would build an AST for each formulae, and evaluate the AST using some data parameter. That said currently we use string manipulation and
Application.Evaluate()
Solutions
1. Use
Application.Evaluate()
This is the simplest version of the evaluator and our current solution. It offers a quick and dirty approach to creating a parser.
Unfortunately it does have some issues:
Application.Evaluate
has a 255 character limit.Application.Evaluate("=""" & Space(256) & """")
won't evaluate and instead will returnError 2015
, even though Excel can happily evaluate formulas this length in cells...2. Install dependency for
stdSettings
. Use range from setting to evaluate formula >255 characters long.Get a dependency on stdSettings (or request a range). Formula longer than 255 characters can be evaluated as follows:
Bs & Cs
Application.Evaluate()
.stdCallback
and thus all other dependencies too.3. Create a naive interpreter
A naive interpreter is one that is built on Recursive Descent Algorithm functioning on a token list. Something of the following nature:
Then we loop over the tokens, and "Recompile" the token array as we go. E.G. for
thus tokens:
then:
Bs & Cs
4. Build a Tokeniser, Parser (to CST) and Evaluator
This is a lot more ambitious. The idea is to build a Tokeniser --> Parser. We'll run this at evaluator creation time, and at evaluation time we'll only run the evaluator which evaluates the CST with the inserted context.
will result in
This AST can be very quickly evaluated when the evaluation step comes around.
Bs & Cs
5. Compile to machine code
This becomes the ultimate form of Evaluator. If we could compile down to machine code for evaluation this would be very ideal. AHK has a similar system called [MCode](https://autohotkey.com/board/topic/89283-mcode-function-onlinegenerator-x86-and-x64/. This would be the late game of the evaluator. It might be architecture specific however and literally involves implementing an assembler...
Some example assemblers which may or may not be useful:
Bs & Cs
6. Don't do this in runtime?
In the short to long term I do want to build a VBA-VBA compiler which can allow for the addition of new rules into the VBA grammar which can be evaluated and compiled down to plane VBA code. This will be undertaken in a seperate project (see VBA-Chevrotain)
One such idea is to add a new syntax to vba
()=>{}
such that we can inject evaluators into code. E.G:which will compile to:
This is a parallel project to the VBA STD Library. That said a runtime will always be more convenient to those who can't install their own developer environment, myself included.
Critical performance concerns
It is critical that these formulae be as performant as possible, especially since they will typically be executed over large volumes of data. Therefore there are a few watchouts which we should ensure: