sancarn commented 4 years ago

Evaluators

Evaluators are something we have wanted for a long time, and this project shouldn't be taken lightly. In an ideal world we would build an AST for each formulae, and evaluate the AST using some data parameter. That said currently we use string manipulation and Application.Evaluate()

Solutions

1. Use `Application.Evaluate()`

This is the simplest version of the evaluator and our current solution. It offers a quick and dirty approach to creating a parser.

Unfortunately it does have some issues:

No support for objects (unless you add serialisation of all objects which is nigh on impossible in vba)
No access to property accessors or method execution.
Cannot extend the language easily (e.g. custom syntax is difficult)
Application.Evaluate has a 255 character limit. Application.Evaluate("=""" & Space(256) & """") won't evaluate and instead will return Error 2015, even though Excel can happily evaluate formulas this length in cells...

2. Install dependency for `stdSettings`. Use range from setting to evaluate formula >255 characters long.

Get a dependency on stdSettings (or request a range). Formula longer than 255 characters can be evaluated as follows:

if len(sFormula)>255 then
  with stdSettings.Create().system("Evaluator")
    .formula = sFormula
    .calculate
    result = .value2
  end with
else
  result = Application.Evaluate(sFormula)
end if

Bs & Cs

No more 255 char limit with Application.Evaluate().
If permanent switch then we can also use UDFs!
Very Slow
Increase dependency tree for stdCallback and thus all other dependencies too.

3. Create a naive interpreter

A naive interpreter is one that is built on Recursive Descent Algorithm functioning on a token list. Something of the following nature:

tokenTypes = Array(
  Array("Literal", """(""""|[^""])*?"),
  Array("Literal", "d+\.\d+"),
  Array("LBracket","\("),
  Array("RBracket","\)"),
  Array("FuncName","[a-zA-Z][a-zA-Z_0-9]+"),
  Array("FuncParam",",")
  Array("PropAcc","."),
  Array("VarAcc","\$\d+"),
  Array("Op1","\*\/"),
  Array("Op2","\+\-"),
  Array("Op3","\=")
)

tokens = tokenise()

Then we loop over the tokens, and "Recompile" the token array as we go. E.G. for

=AND($1=1,$2="Hello")

thus tokens:

[["FuncName","AND"],["LBracket","("],["VarAcc","$1"],["Op3","="],["Literal","1"],["FuncParam",","],["VarAcc","$2"],["Op3","="],["Literal","""Hello"""],["RBracket",")"]

then:

While do
  Find func name
  Segment parens for func name
  Segment expressions for func name
  Evaluate expressions (recurse)
End
While do
  Find Op1s
  Evaluate Op1s
End
While do
  Find Op2s
  Evaluate Op2s
End
While do
  Find Op3s
  Evaluate Op3s
End

Bs & Cs

Faster to run
Faster to build than compiler
Complex/Slow development
Potentially not easy to extend

4. Build a Tokeniser, Parser (to CST) and Evaluator

This is a lot more ambitious. The idea is to build a Tokeniser --> Parser. We'll run this at evaluator creation time, and at evaluation time we'll only run the evaluator which evaluates the CST with the inserted context.

And($1=1,$2="Hello")

will result in

{
  "type":"Func",
  "params": [
    {
      type:"Operator", 
      value: "=", 
      params: [
        {"type":"In", value:1},
        {"type":"Literal",value:1}
      ]
    },{
      type:"Operator",
      value:"=",
      params:[
        {"type":"In", value:2},
        {"type":"Literal",value:"Hello"}
      ]
  ]
}

This AST can be very quickly evaluated when the evaluation step comes around.

Bs & Cs

Very fast
Easier to extend (because we use a grammar)
Complex and very slow to develop

5. Compile to machine code

This becomes the ultimate form of Evaluator. If we could compile down to machine code for evaluation this would be very ideal. AHK has a similar system called [MCode](https://autohotkey.com/board/topic/89283-mcode-function-onlinegenerator-x86-and-x64/. This would be the late game of the evaluator. It might be architecture specific however and literally involves implementing an assembler...

Some example assemblers which may or may not be useful:

Bs & Cs

Fastest option
Potentially easy to extend? Unsure on this one.
Likely ludicrously complex
Might be platform dependent?

6. Don't do this in runtime?

In the short to long term I do want to build a VBA-VBA compiler which can allow for the addition of new rules into the VBA grammar which can be evaluated and compiled down to plane VBA code. This will be undertaken in a seperate project (see VBA-Chevrotain)

One such idea is to add a new syntax to vba ()=>{} such that we can inject evaluators into code. E.G:

Sub Main()
  stdArray.Create(1,2,3).map((int)=>int*2)
End Sub

which will compile to:

Sub Main
  stdArray.Create(1,2,3).map(stdCallback.CreateFromObject(Me,"fcb00001"))
End Sub
Public Function fcb00001(ByVal int)
  fcb00001 = int*2
End Function

This is a parallel project to the VBA STD Library. That said a runtime will always be more convenient to those who can't install their own developer environment, myself included.

Critical performance concerns

It is critical that these formulae be as performant as possible, especially since they will typically be executed over large volumes of data. Therefore there are a few watchouts which we should ensure:

No object usage - Avoid use of objects where possible as property and method access are quite expensive. Use types instead.
Especially no dynamic objects like Dictionary. Use types instead.
Do not recreate arrays of tokens. If possible create an array and loop through it index wise.
Parsing can be a nightmare for backtracking - try to avoid.
Caching is vital. Do as much processing as you can at object creation, leaving as little as possible for evaluation time.

sancarn commented 4 years ago

TarVK suggested this approach

sancarn commented 4 years ago

TarVK's suggestion is great! Really powerful and extremely simple approach. I rewrote it in VBA tonight

There are some leftovers throughout the script as well, from the previous method I was trying (far more complicated than this).

Will be gradually improving this to become a fully fledged formula parser. Will be making it class based of course also, and eventually will build it into stdCallback. Hoping to also use provide attachable UDFs to extend the formula language more. It's definitely looking bright for the future!

sancarn commented 4 years ago

stdLambda is pretty much fully featured now, all operations are implemented. The performance isn't great though, which is the main downside. TarVK and I tested compiling to an intermediate language and this seems to really help performance (to the point our test was faster than Application.Evaluate at runtime for a very large expression!! The expression we tested on was:

'0+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1+(3*(2+5)+5*8/2^(2+1))/26-1
'i.e.
   opsPtr = 0
    Call addOp(push, , 0)
    For i = 0 To 8
        Call addOp(push, , 3)
        Call addOp(push, , 2)
        Call addOp(push, , 5)
        Call addOp(binary, opAdd)
        Call addOp(binary, opMul)
        Call addOp(push, , 5)
        Call addOp(push, , 8)
        Call addOp(binary, opMul)
        Call addOp(push, , 2)
        Call addOp(push, , 2)
        Call addOp(push, , 1)
        Call addOp(binary, opAdd)
        Call addOp(binary, opPow)
        Call addOp(binary, opDiv)
        Call addOp(binary, opAdd)
        Call addOp(push, , 26)
        Call addOp(binary, opDiv)
        Call addOp(push, , 1)
        Call addOp(binary, opSub)
        Call addOp(binary, opAdd)
    Next

Comparitively to stdLambda which executed in 20 seconds (over 10,000 executes), this expression took 1.14s to evaluate, 20x faster.

We'll look into creating this operations list to speed up performance going forwards as this would be incredibly useful on large datasets (1 minute instead of 20 minutes!)

Edit: Further testing shows varying performance differences on different machines. On mine it's still around 10x faster, but on TarVKs it's 1-2x slower... Ideally more investigation should be done before merge.

sancarn commented 4 years ago

List of current supported VBA functions:

eval - evaluates an expression
thisworkbook - evaluates to ThisWorkbook object
application  -evaluates to Application object

'Standard VBA functions
abs
int
fix
exp
log
sqr
sgn
rnd
cos
sin
tan
atn
asin
acos
vbcrlf
vbcr
vblf
vbnewline
vbnullchar
vbnullstring
vbobjecterror
vbtab
vbback
vbformfeed
vbverticaltab
array
createobject
getobject
iff
cbool
cbyte
ccur
cdate
csng
cdbl
cint
clng
cstr
cvar
cverr
asc
chr
format
hex
oct
str
val
trim
lcase
ucase
right
left
mid
now

List of current supported operators

... is ...
... mod ...
... and ...
... or ...
... xor ...
not ...
if ... then ... else
if ... then ... else if ... then else ...  'Via composability
funcName
funcName(...)
a.b.c 'Property access
a#b#c 'Method access
... * ...
... / ...
... + ...
... - ...
... = ...
... <> ...
... < ...
... <= ...
... > ...
... >= ...
... & ...
... like ...

sancarn / stdVBA

stdCallback.CreateEvaluator() #4

Evaluators

Solutions

1. Use `Application.Evaluate()`

2. Install dependency for `stdSettings`. Use range from setting to evaluate formula >255 characters long.

Bs & Cs

3. Create a naive interpreter

Bs & Cs

4. Build a Tokeniser, Parser (to CST) and Evaluator

Bs & Cs

5. Compile to machine code

Bs & Cs

6. Don't do this in runtime?

Critical performance concerns

sancarn / stdVBA

stdCallback.CreateEvaluator() #4

Evaluators

Solutions

1. Use Application.Evaluate()

2. Install dependency for stdSettings. Use range from setting to evaluate formula >255 characters long.

Bs & Cs

3. Create a naive interpreter

Bs & Cs

4. Build a Tokeniser, Parser (to CST) and Evaluator

Bs & Cs

5. Compile to machine code

Bs & Cs

6. Don't do this in runtime?

Critical performance concerns

1. Use `Application.Evaluate()`

2. Install dependency for `stdSettings`. Use range from setting to evaluate formula >255 characters long.