rhaiscript / rhai

Rhai - An embedded scripting language for Rust.
https://crates.io/crates/rhai
Apache License 2.0

Engine::new, Engine::new_raw bench is slow #142

Closed: GopherJ closed this issue 4 years ago

GopherJ commented 4 years ago

Currently, no matter whether we use Engine::new_raw or Engine::new, creation is always slow, even though I have deactivated many features.

test b_benchmark_creating_engine_instance        ... bench:      43,615 ns/iter (+/- 1,539)
test b_benchmark_creating_raw_engine_instance    ... bench:      19,203 ns/iter (+/- 125)
test b_benchmark_creating_scope_instance         ... bench:           5 ns/iter (+/- 0)

These results are from a laptop with a 10th-gen Intel i7.
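
For reference, the creation benchmarks are roughly of this shape (a minimal sketch using the nightly test::Bencher harness; the exact bench code may differ):

#![feature(test)]
extern crate test;

use rhai::Engine;
use test::Bencher;

#[bench]
fn b_benchmark_creating_engine_instance(bench: &mut Bencher) {
    // Measures the cost of building a full Engine on every iteration.
    bench.iter(|| Engine::new());
}

#[bench]
fn b_benchmark_creating_raw_engine_instance(bench: &mut Bencher) {
    // Same, but for the "raw" engine without the standard built-ins.
    bench.iter(|| Engine::new_raw());
}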

GopherJ commented 4 years ago

Cargo.toml

rhai = { version = "0.12.0", default-features = false, features = ["sync", "only_i64", "no_function", "no_float"] }
GopherJ commented 4 years ago

I believe no_function, only_i64 and no_float already remove many built-in functions, but it still takes a lot of time to create an instance.

GopherJ commented 4 years ago

Also, can we remove the lifetime parameter of Engine? It doesn't seem necessary; maybe we can change FnSpec to stop using a lifetime?

Removing that lifetime would make it easier to share an engine.

schungx commented 4 years ago

Well, I made it such that FnSpec is cheap to clone, as it simply holds a Cow<'e, str> for the function name.

If you take away the lifetime, you'll need to put a full String into FnSpec. That won't be too bad, except that when you look up a HashMap with FnSpec as the key, you need to clone the string just to look up the function + arguments. That's because the HashMap cannot borrow an FnSpec whose function name is a &str.

Alternatively, it might be possible to skip FnSpec altogether and manually compute a hash of the function name + arguments, using that as the key instead. This way, you can remove the lifetime from Engine.
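
A minimal sketch of that idea (the helper name and signature here are hypothetical, not necessarily what will land):

use std::any::TypeId;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash the function name together with its argument types so the
// (name, args) pair collapses into a plain u64 that can key a HashMap
// without borrowing any string.
fn calc_fn_hash(name: &str, arg_types: &[TypeId]) -> u64 {
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    arg_types.hash(&mut hasher);
    hasher.finish()
}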

But why is the lifetime stopping you from sharing the Engine? I suppose that the Engine should exist long enough for all shared access, so technically speaking the Engine lifetime should not be a problem to you...

GopherJ commented 4 years ago

@schungx It's not difficult if we simply use 'static, but with a custom lifetime users need to write the lifetime in every single place.
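
For example (hypothetical structs, just to illustrate how the lifetime propagates):

// Any struct that owns an Engine<'e> must carry the lifetime itself...
struct Evaluator<'e> {
    engine: rhai::Engine<'e>,
}

// ...and so must everything that in turn holds an Evaluator.
struct Service<'e> {
    evaluator: Evaluator<'e>,
}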

schungx commented 4 years ago

Well, looking at it some more, it actually seems beneficial to take the lifetime away, because we'd avoid allocating a new Vec<_> during every single function call. Since Rhai calls a lot of functions (every operation is a function call), maybe this will help performance a bit.

I'll try it a bit to see if it makes any difference...

schungx commented 4 years ago

And I really can see no reason why Engine::new_raw should be slow... it isn't doing much except registering a bunch of essential functions...

schungx commented 4 years ago

Alright, the new version with hashes as function keys (instead of FnSpec) seems to run a bit faster. How much faster, we'll probably need your benchmarks to tell...

Using hashes removes the lifetime from Engine as well.

You'll probably need to wait until I can push my changes, as the Internet connection here is slow...

GopherJ commented 4 years ago

@schungx No problem. Many thanks for taking the lifetime issue into account.

schungx commented 4 years ago

You can try out the latest master at https://github.com/schungx/rhai

Engine no longer has a lifetime (with the caveat that on_print and on_debug now take closures that are 'static). That's OK since, for the callbacks to do anything useful, you need to wrap shared outside objects in Arc<Mutex> or Rc<RefCell> anyway.
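
For example, something along these lines should work (a sketch, assuming on_print hands the printed text to the closure as a &str):

use std::sync::{Arc, Mutex};
use rhai::Engine;

fn main() {
    let log = Arc::new(Mutex::new(Vec::<String>::new()));
    let log_for_print = log.clone();

    let mut engine = Engine::new();
    // The closure is 'static because it captures only the cloned Arc,
    // not a borrow of any outside variable.
    engine.on_print(move |s| log_for_print.lock().unwrap().push(s.to_string()));

    engine.eval::<()>(r#"print("hello")"#).unwrap();
    assert_eq!(log.lock().unwrap().len(), 1);
}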

Please rerun your benchmarks to see if it affects the speed.

GopherJ commented 4 years ago

@schungx I got this error:

error[E0412]: cannot find type `FLOAT` in this scope
   --> /home/cheng/.cargo/git/checkouts/rhai-81167890e2f542e4/3a93ab8/src/token.rs:142:19
    |
142 |     FloatConstant(FLOAT),
    |                   ^^^^^ not found in this scope
    |
help: possible candidate is found in another module, you can import it into scope
    |
3   | use crate::parser::FLOAT;
    |

error: aborting due to previous error

For more information about this error, try `rustc --explain E0412`.
error: could not compile `rhai`.

To learn more, run the command again with --verbose.

Cargo.toml

rhai = { git = "https://github.com/schungx/rhai", default-features = false, features = ["sync", "only_i64", "no_function", "no_float"] }
schungx commented 4 years ago

Sorry, forgot a few config attributes. I've pushed a fix. Try again.

GopherJ commented 4 years ago

@schungx I believe the benchmarks on my side are nearly the same:

 name                                         before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 b_benchmark_abac_model                       4,932           5,241                   309    6.27%   x 0.94 
 b_benchmark_basic_model                      8,008           7,812                  -196   -2.45%   x 1.03 
 b_benchmark_cached_abac_model                223             220                      -3   -1.35%   x 1.01 
 b_benchmark_cached_key_match                 223             228                       5    2.24%   x 0.98 
 b_benchmark_cached_priority_model            217             222                       5    2.30%   x 0.98 
 b_benchmark_cached_rbac_model                237             227                     -10   -4.22%   x 1.04 
 b_benchmark_cached_rbac_model_large          199             220                      21   10.55%   x 0.90 
 b_benchmark_cached_rbac_model_medium         240             220                     -20   -8.33%   x 1.09 
 b_benchmark_cached_rbac_model_small          240             247                       7    2.92%   x 0.97 
 b_benchmark_cached_rbac_model_with_domains   262             277                      15    5.73%   x 0.95 
 b_benchmark_cached_rbac_with_deny            229             252                      23   10.04%   x 0.91 
 b_benchmark_cached_rbac_with_resource_roles  218             256                      38   17.43%   x 0.85 
 b_benchmark_key_match                        48,672          52,735                4,063    8.35%   x 0.92 
 b_benchmark_priority_model                   6,798           6,632                  -166   -2.44%   x 1.03 
 b_benchmark_raw                              5               5                         0    0.00%   x 1.00 
 b_benchmark_rbac_model                       18,596          18,333                 -263   -1.41%   x 1.01 
 b_benchmark_rbac_model_large                 47,775,846      44,447,575       -3,328,271   -6.97%   x 1.07 
 b_benchmark_rbac_model_medium                4,265,486       3,735,238          -530,248  -12.43%   x 1.14 
 b_benchmark_rbac_model_small                 409,947         376,999             -32,948   -8.04%   x 1.09 
 b_benchmark_rbac_model_with_domains          29,043          27,104               -1,939   -6.68%   x 1.07 
 b_benchmark_rbac_with_deny                   24,130          22,972               -1,158   -4.80%   x 1.05 
 b_benchmark_rbac_with_resource_roles         17,292          15,971               -1,321   -7.64%   x 1.08 
 b_benmark_cached_basic_model                 217             219                       2    0.92%   x 0.99 

When there are many evaluations, the benchmarks do improve a little, as you can see:

 b_benchmark_rbac_model_large                 47,775,846      44,447,575       -3,328,271   -6.97%   x 1.07 
 b_benchmark_rbac_model_medium                4,265,486       3,735,238          -530,248  -12.43%   x 1.14 
 b_benchmark_rbac_model_small                 409,947         376,999             -32,948   -8.04%   x 1.09 
schungx commented 4 years ago

This helps primarily when the script is calling a lot of functions (i.e. using a lot of operators other than && and ||). Other code paths are basically unchanged, so they should perform the same.

b_benchmark_key_match is an interesting one... I wonder what is causing the regression.

schungx commented 4 years ago

@GopherJ If you'd be so kind as to rerun your benchmarks with my latest master...

I reduced the size of Dynamic significantly, and it seems to run much faster, in some cases by 30% or more.

GopherJ commented 4 years ago

@schungx I'll do it now!

GopherJ commented 4 years ago

@schungx It seems the benchmarks don't change much on my side ~

 name                                         before.bak ns/iter  after ns/iter  diff ns/iter  diff %  speedup 
 b_benchmark_abac_model                       5,126               5,359                   233   4.55%   x 0.96 
 b_benchmark_basic_model                      8,585               8,345                  -240  -2.80%   x 1.03 
 b_benchmark_cached_abac_model                246                 244                      -2  -0.81%   x 1.01 
 b_benchmark_cached_key_match                 266                 253                     -13  -4.89%   x 1.05 
 b_benchmark_cached_priority_model            267                 248                     -19  -7.12%   x 1.08 
 b_benchmark_cached_rbac_model                263                 244                     -19  -7.22%   x 1.08 
 b_benchmark_cached_rbac_model_large          250                 248                      -2  -0.80%   x 1.01 
 b_benchmark_cached_rbac_model_medium         249                 236                     -13  -5.22%   x 1.06 
 b_benchmark_cached_rbac_model_small          248                 247                      -1  -0.40%   x 1.00 
 b_benchmark_cached_rbac_model_with_domains   275                 273                      -2  -0.73%   x 1.01 
 b_benchmark_cached_rbac_with_deny            248                 245                      -3  -1.21%   x 1.01 
 b_benchmark_cached_rbac_with_resource_roles  249                 245                      -4  -1.61%   x 1.02 
 b_benchmark_key_match                        56,673              55,875                 -798  -1.41%   x 1.01 
 b_benchmark_priority_model                   8,093               7,657                  -436  -5.39%   x 1.06 
 b_benchmark_raw                              6                   6                         0   0.00%   x 1.00 
 b_benchmark_rbac_model                       21,961              21,473                 -488  -2.22%   x 1.02 
 b_benchmark_rbac_model_large                 46,799,182          44,820,064       -1,979,118  -4.23%   x 1.04 
 b_benchmark_rbac_model_medium                4,529,010           4,330,489          -198,521  -4.38%   x 1.05 
 b_benchmark_rbac_model_small                 459,244             428,492             -30,752  -6.70%   x 1.07 
 b_benchmark_rbac_model_with_domains          32,884              31,937                 -947  -2.88%   x 1.03 
 b_benchmark_rbac_with_deny                   26,824              26,413                 -411  -1.53%   x 1.02 
 b_benchmark_rbac_with_resource_roles         20,028              18,913               -1,115  -5.57%   x 1.06 
 b_benmark_cached_basic_model                 253                 245                      -8  -3.16%   x 1.03 

Maybe it's related to the way we use rhai; we only use a few functions and basic evaluations like these (a rough usage sketch follows below):

g(r.sub, p.sub) && r.obj == p.obj && r.act == p.act || r.obj in ["data2", "data3"]
r.sub == p.sub && keyMatch2(r.obj, p.obj) && regexMatch(r.act, p.act)
...
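
Roughly, the wiring looks like this (a simplified sketch with made-up values; keyMatch2 here is just a stand-in for the real matcher, and the rhai 0.12-era RegisterFn trait is assumed):

use rhai::{Engine, RegisterFn, Scope};

// Stand-in for the real casbin matcher, just for illustration.
fn key_match2(key1: String, key2: String) -> bool {
    key1 == key2
}

fn main() {
    let mut engine = Engine::new();
    engine.register_fn("keyMatch2", key_match2);

    let mut scope = Scope::new();
    scope.push("r_obj", "data1".to_string());
    scope.push("p_obj", "data1".to_string());

    let allowed = engine
        .eval_with_scope::<bool>(&mut scope, "keyMatch2(r_obj, p_obj)")
        .unwrap();
    assert!(allowed);
}
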
GopherJ commented 4 years ago

But I think most benchmarks have improved according to the statistics. Really, thanks for your efforts!

schungx commented 4 years ago

I suggest you try it with only_i32 instead of only_i64 unless you really need 64-bit integers.
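
That would just be a matter of swapping the feature in the earlier Cargo.toml, e.g.:

rhai = { git = "https://github.com/schungx/rhai", default-features = false, features = ["sync", "only_i32", "no_function", "no_float"] }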

i32 makes Dynamic only 8 bytes in size, while i64 makes it 16 bytes, double the size.

So in general a 4-5% speed improvement... not bad, but nothing to brag about I guess!

GopherJ commented 4 years ago

@schungx Thanks, I'll try it! Also, maybe you can profile rhai with a flamegraph so you know exactly what's consuming the time.

Here's an example that I used: https://github.com/casbin-rs/casbin-flamegraph. In rhai you can do something similar: write some benchmarks and make a simple binary that runs a specific bench's code just once.

Then generate a flamegraph.
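
For example, with the cargo-flamegraph tool (assuming a small binary, hypothetically named profile_once here, that runs the bench body a single time):

cargo install flamegraph
cargo flamegraph --bin profile_once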

schungx commented 4 years ago

Unfortunately I'm on Windows and dtrace doesn't seem to work here yet... so no flamegraph...

schungx commented 4 years ago

Found the culprit. Removing engine.register_core_lib() from Engine::new_raw makes it run really fast.

But we need the core lib... otherwise you don't have arithmetic and logical operators, etc.

One way is to use lazy_static and do it only once, but then I'll have to pull in the lazy_static crate. Not sure if it'll be worth it...
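
A rough sketch of that option (the table type here is only a placeholder, not the real internal type):

use lazy_static::lazy_static;
use std::collections::HashMap;

// Placeholder for whatever the real shared function table would be.
type CoreLib = HashMap<u64, &'static str>;

lazy_static! {
    // Built once, on first access, and shared by every Engine afterwards.
    static ref CORE_LIB: CoreLib = {
        let mut lib = CoreLib::new();
        lib.insert(0, "+"); // ...the 60+ core functions would go here
        lib
    };
}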

schungx commented 4 years ago

There are at minimum 60+ essential functions registered as the "core lib" during every creation of an Engine. Since these never change, it is duplicated work on every Engine creation.

I have an idea to split the libraries into "packages", which can be created once and then passed into Engine::new_raw. Since these packages never change, they can be immutable and shared around globally. Then we avoid the creation cost of an Engine.

schungx commented 4 years ago

@GopherJ I have a new branch that splits all the built-in libraries into "packages" that you can mix-and-match at will. The packages are basically immutable (they are behind an Rc or Arc), and you can simply pass them around to new engines.

Engine::new_raw() now takes almost no time at all, as it doesn't register any functions. You load additional packages with the load_package method.

Package loading also takes no time at all. It is the package creation that takes time, and you only do it once and keep it around.

If you're interested, surf to https://github.com/schungx/rhai/tree/packages and try it out.
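
Usage on that branch looks roughly like this (a sketch; the package type and trait names follow the eventual packages API and may differ slightly on the branch):

use rhai::Engine;
use rhai::packages::{CorePackage, Package};

fn main() {
    // Create the package once -- this is the only expensive step.
    let core = CorePackage::new();

    // Cheaply share it with as many engines as you like.
    let mut engine = Engine::new_raw();
    engine.load_package(core.get());

    let result = engine.eval::<i64>("21 + 21").unwrap();
    assert_eq!(result, 42);
}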

GopherJ commented 4 years ago

@schungx It seems cool and it's cleaner now! Thanks for your great work.

schungx commented 4 years ago

I can leave it at "package" level, or I can further divide things up into individual functions. For example, in addition to registering a package, you can also register a single function within it. Say you only need + and - etc. without multiplication.

Do you think such fine-grained control is beneficial? Because you can always just copy the code and do a register_fn yourself...

schungx commented 4 years ago

I'll close this now since PR https://github.com/jonathandturner/rhai/pull/144 adds the packages API and now Engine::new_raw only takes nanoseconds.