likebike / fasteval

Fast and safe evaluation of algebraic expressions
https://crates.io/crates/fasteval/
MIT License
267 stars · 26 forks

Support Vector results #5

Open zzFluke opened 4 years ago

zzFluke commented 4 years ago

At the moment the supported evaluation result is Result<f64, Error>, and it will be enhanced with Advanced Number Types (Complex Numbers, Rational Numbers, Arbitrary-Precision Numbers, Big Integers + Crypto), but is it even possible to have something like Result<Vec<T>, Error>? I imagine having custom functions that can perform map or iteration operations over some data, not just reductions.

likebike commented 4 years ago

It's an interesting idea. I'll think about it.

I'm afraid of blowing up the complexity of the language... I really don't want argument-list-expansion (like python's "*args" and "**kwargs") or vector slicing because I feel like it pushes the language a bit too far beyond its typical use-case.

likebike commented 4 years ago

I also feel like this kind of capability is already "sort-of supported" because custom functions can do anything. It would be easy to define a custom function "A" that stores vector results in an external storage location and produces NaN (or maybe an ID related to the external storage) as its return value, then custom function "B" would continue to operate on that external location and probably produce the reduction as its return value.

It's quite a hack and doesn't compose well, but it might be "good enough".
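A rough sketch of that hack, using the closure style of custom functions (the double_all / vec_sum names and the ID-as-f64 convention are just made up for illustration, not library features):

```rust
// Sketch of the external-storage trick: expressions pass around an ID as an
// ordinary f64, while the actual vectors live outside the evaluator.
use std::collections::HashMap;

fn main() -> Result<(), fasteval::Error> {
    let mut vecs: HashMap<usize, Vec<f64>> = HashMap::new();
    vecs.insert(0, vec![1.0, 2.0, 3.0, 4.0]);

    let mut ns = |name: &str, args: Vec<f64>| -> Option<f64> {
        match name {
            // Function "A": mutate the stored vector and return its ID.
            "double_all" => {
                let id = args[0] as usize;
                if let Some(v) = vecs.get_mut(&id) {
                    for x in v.iter_mut() {
                        *x *= 2.0;
                    }
                }
                Some(args[0])
            }
            // Function "B": reduce the stored vector to a single f64.
            "vec_sum" => {
                let id = args[0] as usize;
                vecs.get(&id).map(|v| v.iter().sum())
            }
            _ => None,
        }
    };

    // (1+2+3+4) doubled element-wise, then summed = 20.
    let val = fasteval::ez_eval("vec_sum(double_all(0))", &mut ns)?;
    assert_eq!(val, 20.0);
    Ok(())
}
```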

If I see real-life use cases for this, I will probably consider a general implementation, like what you're suggesting.

zzFluke commented 4 years ago

@likebike, thanks a lot for your comments and suggestions. Your idea of returning a reference to the vector makes sense. What I'm going to do is take your lib as a basis and try to implement a higher-level language and evaluation engine inspired by the Excel Formula language, and in particular Apache POI, but without the spreadsheet burden. I have little experience with Rust, though... it may take a while.

likebike commented 4 years ago

I added a unit test which performs some vector operations with some custom functions. You might be able to use it as an example:

https://github.com/likebike/fasteval/blob/4b2b0341b82b5160e4cc14c2aeed7d643a30b7b9/tests/evalns.rs#L112

It might be a bit cryptic. Let me know if you have questions.

zzFluke commented 4 years ago

Great, thanks @likebike !

arnodb commented 4 years ago

Wouldn't the following work?

Add a generic param T: Add + Sub + Mul + Div and let Rust do the rest?

For the Vec case (or whatever complex type you can think about), I suppose a newtype that implements the required traits would allow matrix computations.
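For illustration, a hypothetical newtype along those lines, with element-wise Vec arithmetic (only Add and Mul shown; Sub and Div would follow the same pattern):

```rust
// Hypothetical newtype wrapping Vec<f64> with element-wise arithmetic.
use std::ops::{Add, Mul};

#[derive(Debug, Clone, PartialEq)]
struct VecNum(Vec<f64>);

impl Add for VecNum {
    type Output = VecNum;
    fn add(self, rhs: VecNum) -> VecNum {
        VecNum(self.0.iter().zip(&rhs.0).map(|(a, b)| a + b).collect())
    }
}

impl Mul for VecNum {
    type Output = VecNum;
    fn mul(self, rhs: VecNum) -> VecNum {
        VecNum(self.0.iter().zip(&rhs.0).map(|(a, b)| a * b).collect())
    }
}

fn main() {
    let a = VecNum(vec![1.0, 2.0, 3.0]);
    let b = VecNum(vec![4.0, 5.0, 6.0]);
    // An evaluator generic over T: Add + Sub + Mul + Div could accept
    // VecNum wherever it currently accepts f64.
    assert_eq!(a.clone() + b.clone(), VecNum(vec![5.0, 7.0, 9.0]));
    assert_eq!(a * b, VecNum(vec![4.0, 10.0, 18.0]));
}
```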

likebike commented 4 years ago

@arnodb I thought about doing that, but I assumed it wouldn't be adequate, since I do more than those four arithmetic operations. I also do Exponentiation, Modulo, Comparison, Logarithms, Rounding, Trigonometry, Absolute-Value, Sign, etc etc etc.

Do you know a way that I could achieve all those results just by adding more trait bounds? I guess I could define new traits for all those extra capabilities, and then implement those new traits for all the advanced number types. I guess I just thought it was more direct to avoid that.
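For example, a sketch of that extra-traits route might look something like this (the NumOps trait and its methods are hypothetical, not part of fasteval):

```rust
// Hypothetical trait bundling the capabilities the evaluator needs
// beyond + - * /; advanced number types would implement it themselves.
use std::ops::{Add, Div, Mul, Sub};

trait NumOps:
    Sized + Add<Output = Self> + Sub<Output = Self> + Mul<Output = Self> + Div<Output = Self>
{
    fn pow(self, exp: Self) -> Self;
    fn ln(self) -> Self;
    fn abs(self) -> Self;
}

impl NumOps for f64 {
    fn pow(self, exp: f64) -> f64 { self.powf(exp) }
    fn ln(self) -> f64 { self.ln() }   // resolves to the inherent f64::ln
    fn abs(self) -> f64 { self.abs() } // resolves to the inherent f64::abs
}

// An evaluator generic over T would then bound on NumOps.
fn eval_square_plus_abs<T: NumOps + Copy>(x: T) -> T {
    x * x + x.abs()
}

fn main() {
    assert_eq!(eval_square_plus_abs(-3.0_f64), 12.0);
}
```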

arnodb commented 4 years ago

That gives a broader view of the problem to solve. I confess I integrated fasteval very quickly as a replacement for meval, which works but is rather old and constrained (with regard to the way the context of variables and functions is populated).

First of all, exponentiation, logarithms, and trigonometry are functions that take Ts and return a T. If fasteval provides a very small core (i.e. no preset functions), then they are not a problem. Another layer of the code can provide the user with functions that fit the T they are working with.

I suppose the modulus falls into the same bucket as the +-*/ operations: https://doc.rust-lang.org/std/ops/trait.Rem.html. However, if one wants to work with a T that has a modulus operator and also with another T that has no modulus, I'm not sure it is viable to have different parsers with different generic bounds. The other way around is rather simple, but might give the impression of laziness: everything can be turned into a function. The only issue I can see is that the parser may produce an AST containing functions that do not exist for the type at eval time.

Maybe the first question to ask yourself is at which point you know the type you are working with. I mean, in simple arithmetic, everything is a function (a trait to implement, if you prefer). As in Rust, the point is to check, at the very moment you know the T, that every operation involved in the parsed expression is available for that T, or to fail as early as possible.

Whether this check could be static is another big design question :-). If you want it to be static, then you need to know T before you parse expressions (so that the compiler issues an error for you) and disable parts of the parser based on that knowledge. That can turn out to be very complex to implement, because it is probably impossible to implement "something" for a type "not implementing a given trait".

This comment is too long, hope it gives you ideas.

likebike commented 4 years ago

I integrated fasteval very quickly in replacement of meval

If you have some spare time, can you please tell me if you found any part of the transition from meval to fasteval difficult? Are there things you miss that meval does better than fasteval? Have you been able to notice any performance difference from the switch?

arnodb commented 4 years ago

That was awesomely easy. One single commit:

Thanks to a slightly better design, I removed an unnecessary RefCell and a context allocation.

The only thing that puzzled me was the strongly fixed capacity of the slab. I would have expected something like a vector that can grow until there is nothing left to parse. In my case everything is parsed at startup, but I worked around that by exposing different constructors. The only downside is that it could be hard to determine what capacity is necessary.

likebike commented 4 years ago

I'm happy to see that you were able to migrate such a large project so easily.

A few notes:

The only thing that puzzled me was the strongly fixed capacity of the slab. I would have expected something like a vector that can grow until there is nothing left to parse.

Right, fasteval achieves most of its performance advantage by pre-allocating all of its stuff, rather than performing many small allocations. That's why the size needs to be known ahead of time. It also helps with safety: https://docs.rs/fasteval/0.2.4/fasteval/#safety
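For reference, a rough sketch of the pre-allocated pattern (assuming the Slab::with_capacity constructor; the capacity of 64 is an arbitrary guess):

```rust
use fasteval::Evaler; // brings .eval() into scope
use std::collections::BTreeMap;

fn main() -> Result<(), fasteval::Error> {
    // The capacity must be chosen up front; parsing fails if it is exceeded.
    let mut slab = fasteval::Slab::with_capacity(64);
    let parser = fasteval::Parser::new();

    let expr = parser.parse("x + 1", &mut slab.ps)?.from(&slab.ps);

    let mut ns = BTreeMap::new();
    ns.insert("x".to_string(), 41.0);
    assert_eq!(expr.eval(&slab, &mut ns)?, 42.0);
    Ok(())
}
```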

...But I can understand that this requirement is sometimes a hassle and confusing. I'll consider adding a 'GrowableSlab' or something, which would be slower than a pre-allocated slab but have more intuitive behavior.

Thanks so much for your feedback!

arnodb commented 4 years ago

HashMap vs. BTreeMap: good point. Although, in theory, HashMap is supposed to be faster for some operations at the (rather high) cost of memory consumption whereas BTreeMap is more memory efficient. But I think I need to refresh my knowledge about that.

Edit: https://doc.rust-lang.org/std/collections/index.html shows, as expected, O(1) for almost all operations with HashMap and O(log(n)) for almost all operations with BTreeMap. So choosing the latter, at the cost of operation times, is either a matter of ordering/determinism or of memory footprint. It's not mentioned, but a B-Tree has a lower memory footprint than hashed entries.

parse_noclear: I read the documentation and saw that the other function (the one in the example) always clears the slab. So I looked for a good way to share the slab; it took a few minutes, thanks to IntelliJ.
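Roughly something like this, if I'm reading the API correctly (a simplified sketch, not the actual code from my project):

```rust
use fasteval::Evaler;
use std::collections::BTreeMap;

fn main() -> Result<(), fasteval::Error> {
    let parser = fasteval::Parser::new();
    let mut slab = fasteval::Slab::new();

    // parse() clears the slab first; parse_noclear() lets several
    // expressions share the same slab.
    let e1 = parser.parse_noclear("x + 1", &mut slab.ps)?;
    let e2 = parser.parse_noclear("x * 2", &mut slab.ps)?;

    let mut ns = BTreeMap::new();
    ns.insert("x".to_string(), 10.0);

    assert_eq!(e1.from(&slab.ps).eval(&slab, &mut ns)?, 11.0);
    assert_eq!(e2.from(&slab.ps).eval(&slab, &mut ns)?, 20.0);
    Ok(())
}
```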

ExprNamespace vs. closure: again, it's IntelliJ that, this time, might have made me miss something. I just looked at the signature of the evaluation function and started implementing the trait. I'll have a closer look soon.

None vs. panic: interesting, I need to look at that too.

Is it that much slower to use Vecs that never grow (after some point, if need be) instead of fixed-size arrays?