Closed: GopherJ closed this issue 4 years ago.
Cargo.toml:

```toml
rhai = { version = "0.12.0", default-features = false, features = ["sync", "only_i64", "no_function", "no_float"] }
```

I believe `no_function`, `only_i64`, and `no_float` have already removed many built-in functions, but it still took a lot of time to create an instance.
Also, can we remove the lifetime bound on `Engine`? It doesn't seem all that necessary; maybe we can change `FnSpec` to stop using a lifetime? Removing that lifetime would make it easier to share an engine.
Well, I made it such that `FnSpec` is cheap to clone, as it simply holds a `Cow<'e, str>` for the function name.

If you take away the lifetime, you'll need to put a full `String` into `FnSpec`. That wouldn't be too bad, except that when you `get` from a `HashMap` with `FnSpec` as the key, you need to clone the string just to look up the function + arguments. That's because the `HashMap` cannot borrow an `FnSpec` with a `&str` function name.

Alternatively, it might be possible to skip `FnSpec` altogether and manually compute a hash of the function name + arguments, then use that as the key instead. This way, the lifetime can be removed from `Engine`.
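The hashed-key idea above can be sketched in plain `std` Rust (this is only an illustration of the technique, not rhai's actual internals; `calc_fn_hash` is a hypothetical name): hash the function name together with the argument `TypeId`s into a `u64`, so the registry key owns no string and carries no lifetime.

```rust
use std::any::TypeId;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Combine the function name and argument types into a single u64 key,
// so the registry needs no borrowed key struct (and thus no lifetime).
fn calc_fn_hash(name: &str, args: &[TypeId]) -> u64 {
    let mut s = DefaultHasher::new();
    name.hash(&mut s);
    args.hash(&mut s);
    s.finish()
}

fn main() {
    let mut registry: HashMap<u64, fn(i64, i64) -> i64> = HashMap::new();

    // Register "+" for (i64, i64).
    let key = calc_fn_hash("+", &[TypeId::of::<i64>(), TypeId::of::<i64>()]);
    registry.insert(key, |a, b| a + b);

    // Look up using only borrowed data -- no String clone needed.
    let lookup = calc_fn_hash("+", &[TypeId::of::<i64>(), TypeId::of::<i64>()]);
    let f = registry[&lookup];
    assert_eq!(f(2, 3), 5);
    println!("2 + 3 = {}", f(2, 3));
}
```

The trade-off is that a hash key cannot be reverse-mapped to a readable function name for error messages without keeping a side table.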
But why is the lifetime stopping you from sharing the `Engine`? I suppose the `Engine` should exist long enough for all shared access, so technically speaking the `Engine`'s lifetime should not be a problem for you...
@schungx It's not difficult when we simply put `'static`, but with a customized lifetime users will need to write the lifetime in every single place.
Well, looking at it some more, it actually seems beneficial to take the lifetime away, because we then avoid allocating a new `Vec<_>` during every single function call. Since Rhai calls a lot of functions (every behavior is a function), maybe this will help performance a bit.
I'll try it a bit to see if it makes any difference...
And I really can see no reason why `Engine::new_raw` should be slow... it really isn't doing much except registering a bunch of essential functions...
Alright, the new version with hashes as function keys (instead of `FnSpec`) seems to run a bit faster. How much, we'd probably need your benchmarks to tell...

Using hashes removes the lifetime from `Engine` as well.
You'll probably need to wait until I can push my changes, as the Internet connection here is slow...
@schungx No problem, many thanks for taking the lifetime issue into account!
You can try out the latest master at https://github.com/schungx/rhai
`Engine` no longer has a lifetime (with the caveat that `on_print` and `on_debug` now take closures that are `'static`). That's OK since, for the callbacks to do anything useful, you need to wrap the outside shared objects in an `Arc<Mutex>` or `Rc<RefCell>` anyway.
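The `Rc<RefCell>` pattern mentioned above can be sketched as follows (note that `MiniEngine` and its `on_print` field are illustrative stand-ins, not rhai's actual API): the closure owns its own clone of the `Rc`, so it satisfies a `'static` bound yet can still write to state the caller also holds.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for an engine holding a boxed 'static callback.
struct MiniEngine {
    on_print: Box<dyn Fn(&str) + 'static>,
}

fn main() {
    // Shared log that both the caller and the 'static closure can access.
    let log = Rc::new(RefCell::new(Vec::<String>::new()));

    let log_clone = log.clone();
    let engine = MiniEngine {
        // The closure owns its own Rc, so no borrowed lifetime is involved.
        on_print: Box::new(move |s| log_clone.borrow_mut().push(s.to_string())),
    };

    (engine.on_print)("hello");
    (engine.on_print)("world");
    assert_eq!(log.borrow().len(), 2);
    println!("{:?}", log.borrow());
}
```

With the `sync` feature (as in the Cargo.toml above), the same pattern would use `Arc<Mutex<_>>` instead of `Rc<RefCell<_>>`.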
Please rerun your benchmarks to see if it affects the speed.
@schungx I got this error:

```
error[E0412]: cannot find type `FLOAT` in this scope
   --> /home/cheng/.cargo/git/checkouts/rhai-81167890e2f542e4/3a93ab8/src/token.rs:142:19
    |
142 |     FloatConstant(FLOAT),
    |                   ^^^^^ not found in this scope
    |
help: possible candidate is found in another module, you can import it into scope
    |
3   | use crate::parser::FLOAT;
    |

error: aborting due to previous error

For more information about this error, try `rustc --explain E0412`.
error: could not compile `rhai`.

To learn more, run the command again with --verbose.
```
Cargo.toml:

```toml
rhai = { git = "https://github.com/schungx/rhai", default-features = false, features = ["sync", "only_i64", "no_function", "no_float"] }
```
Sorry, forgot a few config attributes. I've pushed a fix. Try again.
@schungx I believe the benchmarks on my side are nearly the same:

```
name                                         before ns/iter  after ns/iter  diff ns/iter  diff %  speedup
b_benchmark_abac_model                                4,932          5,241           309   6.27%   x 0.94
b_benchmark_basic_model                               8,008          7,812          -196  -2.45%   x 1.03
b_benchmark_cached_abac_model                           223            220            -3  -1.35%   x 1.01
b_benchmark_cached_key_match                            223            228             5   2.24%   x 0.98
b_benchmark_cached_priority_model                       217            222             5   2.30%   x 0.98
b_benchmark_cached_rbac_model                           237            227           -10  -4.22%   x 1.04
b_benchmark_cached_rbac_model_large                     199            220            21  10.55%   x 0.90
b_benchmark_cached_rbac_model_medium                    240            220           -20  -8.33%   x 1.09
b_benchmark_cached_rbac_model_small                     240            247             7   2.92%   x 0.97
b_benchmark_cached_rbac_model_with_domains              262            277            15   5.73%   x 0.95
b_benchmark_cached_rbac_with_deny                       229            252            23  10.04%   x 0.91
b_benchmark_cached_rbac_with_resource_roles             218            256            38  17.43%   x 0.85
b_benchmark_key_match                                48,672         52,735         4,063   8.35%   x 0.92
b_benchmark_priority_model                            6,798          6,632          -166  -2.44%   x 1.03
b_benchmark_raw                                           5              5             0   0.00%   x 1.00
b_benchmark_rbac_model                               18,596         18,333          -263  -1.41%   x 1.01
b_benchmark_rbac_model_large                     47,775,846     44,447,575    -3,328,271  -6.97%   x 1.07
b_benchmark_rbac_model_medium                     4,265,486      3,735,238      -530,248 -12.43%   x 1.14
b_benchmark_rbac_model_small                        409,947        376,999       -32,948  -8.04%   x 1.09
b_benchmark_rbac_model_with_domains                  29,043         27,104        -1,939  -6.68%   x 1.07
b_benchmark_rbac_with_deny                           24,130         22,972        -1,158  -4.80%   x 1.05
b_benchmark_rbac_with_resource_roles                 17,292         15,971        -1,321  -7.64%   x 1.08
b_benmark_cached_basic_model                            217            219             2   0.92%   x 0.99
```
When there are many evaluations, the bench does improve a little bit, as you can see:

```
b_benchmark_rbac_model_large                     47,775,846     44,447,575    -3,328,271  -6.97%   x 1.07
b_benchmark_rbac_model_medium                     4,265,486      3,735,238      -530,248 -12.43%   x 1.14
b_benchmark_rbac_model_small                        409,947        376,999       -32,948  -8.04%   x 1.09
```
This helps primarily when the script calls a lot of functions (i.e. uses a lot of operators other than `&&` and `||`). Other code paths are basically unchanged, so they should perform the same.

`b_benchmark_key_match` is an interesting one... I wonder what is causing the regression.
@GopherJ If you'd be so kind as to rerun your benchmarks with my latest master... I reduced the size of `Dynamic` significantly, and it seems to run significantly faster, in some cases by 30% or more.
@schungx I'll do it now!
@schungx It seems the benchmarks on my side don't change so much:

```
name                                         before ns/iter  after ns/iter  diff ns/iter  diff %  speedup
b_benchmark_abac_model                                5,126          5,359           233   4.55%   x 0.96
b_benchmark_basic_model                               8,585          8,345          -240  -2.80%   x 1.03
b_benchmark_cached_abac_model                           246            244            -2  -0.81%   x 1.01
b_benchmark_cached_key_match                            266            253           -13  -4.89%   x 1.05
b_benchmark_cached_priority_model                       267            248           -19  -7.12%   x 1.08
b_benchmark_cached_rbac_model                           263            244           -19  -7.22%   x 1.08
b_benchmark_cached_rbac_model_large                     250            248            -2  -0.80%   x 1.01
b_benchmark_cached_rbac_model_medium                    249            236           -13  -5.22%   x 1.06
b_benchmark_cached_rbac_model_small                     248            247            -1  -0.40%   x 1.00
b_benchmark_cached_rbac_model_with_domains              275            273            -2  -0.73%   x 1.01
b_benchmark_cached_rbac_with_deny                       248            245            -3  -1.21%   x 1.01
b_benchmark_cached_rbac_with_resource_roles             249            245            -4  -1.61%   x 1.02
b_benchmark_key_match                                56,673         55,875          -798  -1.41%   x 1.01
b_benchmark_priority_model                            8,093          7,657          -436  -5.39%   x 1.06
b_benchmark_raw                                           6              6             0   0.00%   x 1.00
b_benchmark_rbac_model                               21,961         21,473          -488  -2.22%   x 1.02
b_benchmark_rbac_model_large                     46,799,182     44,820,064    -1,979,118  -4.23%   x 1.04
b_benchmark_rbac_model_medium                     4,529,010      4,330,489      -198,521  -4.38%   x 1.05
b_benchmark_rbac_model_small                        459,244        428,492       -30,752  -6.70%   x 1.07
b_benchmark_rbac_model_with_domains                  32,884         31,937          -947  -2.88%   x 1.03
b_benchmark_rbac_with_deny                           26,824         26,413          -411  -1.53%   x 1.02
b_benchmark_rbac_with_resource_roles                 20,028         18,913        -1,115  -5.57%   x 1.06
b_benmark_cached_basic_model                            253            245            -8  -3.16%   x 1.03
```
Maybe it's related to the way we use `rhai`; we use just a few functions and basic evaluations like:

```
g(r.sub, p.sub) && r.obj == p.obj && r.act == p.act || r.obj in ["data2", "data3"]
r.sub == p.sub && keyMatch2(r.obj, p.obj) && regexMatch(r.act, p.act)
...
```

But I think most benchmarks have improved, judging from the statistics. Really, thanks for your efforts!
I suggest you try it with `only_i32` instead of `only_i64` unless you really need 64-bit integers. `i32` makes `Dynamic` only 8 bytes in size, while `i64` makes it 16 bytes, double the size.
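The size difference can be demonstrated with plain Rust enums (illustrative types, not rhai's actual `Dynamic`): an `i64` payload forces 8-byte alignment, so the discriminant pushes the enum to 16 bytes, while an `i32` payload fits discriminant plus payload into 8 bytes.

```rust
use std::mem::size_of;

// Illustrative enums showing why shrinking the integer payload
// from i64 to i32 can halve the size of a Dynamic-like enum.
enum Dyn64 { Int(i64), Bool(bool), Char(char) }
enum Dyn32 { Int(i32), Bool(bool), Char(char) }

fn main() {
    // i64 payload: 8-byte alignment, so tag + payload = 16 bytes.
    assert_eq!(size_of::<Dyn64>(), 16);
    // i32 payload: 4-byte alignment, tag + payload fit in 8 bytes.
    assert_eq!(size_of::<Dyn32>(), 8);
    println!("Dyn64 = {} bytes, Dyn32 = {} bytes",
             size_of::<Dyn64>(), size_of::<Dyn32>());
}
```

A smaller `Dynamic` means less data moved per clone and better cache utilization, which is presumably where the speedup comes from.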
So in general a 4-5% speed improvement... not bad, but nothing to brag about I guess!
@schungx Thanks, I'll try it! Also, maybe you can use a flamegraph to profile rhai so that you know exactly what's consuming time. Here's an example I used: https://github.com/casbin-rs/casbin-flamegraph. In rhai you can do something similar: write some benchmarks, make a simple binary that runs the specific bench code just once, then generate a flamegraph.
Unfortunately I'm on Windows, and `dtrace` doesn't seem to work here yet... so no flamegraph...
Found the culprit. Removing `engine.register_core_lib()` from `Engine::new_raw` makes it run really fast. But we need the core lib... otherwise there are no arithmetic and logical operators etc.
One way is to use `lazy_static` and do it only once, but then I'd have to pull in the `lazy_static` crate. Not sure if it'll be worth it...
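The "build it only once" idea can also be sketched without the `lazy_static` crate, using std's `OnceLock` (stable since Rust 1.70; the `core_lib` function and its contents here are hypothetical): the expensive registration runs exactly once, and every subsequent caller just clones an `Arc`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, OnceLock};

// Hypothetical core lib: a map from operator name to implementation.
type CoreLib = HashMap<&'static str, fn(i64, i64) -> i64>;

fn core_lib() -> &'static Arc<CoreLib> {
    static CORE: OnceLock<Arc<CoreLib>> = OnceLock::new();
    CORE.get_or_init(|| {
        // The expensive registration work runs exactly once.
        let mut lib: CoreLib = HashMap::new();
        lib.insert("+", |a, b| a + b);
        lib.insert("-", |a, b| a - b);
        Arc::new(lib)
    })
}

fn main() {
    // Every "engine" just clones the Arc -- no re-registration.
    let a = core_lib().clone();
    let b = core_lib().clone();
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(a["+"](2, 3), 5);
    println!("core lib has {} functions", a.len());
}
```

The downside of a global static is losing the ability to customize which functions are present per engine, which is what the packages idea below addresses instead.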
There are a minimum of 60+ essential functions registered as the "core lib" during every creation of an `Engine`. Since these really never change, it is duplicated work on every `Engine` creation.

I have an idea to split the libraries into "packages" which can be created once and then passed into `Engine::new_raw`. Since these packages never change, they can be immutable and shared around globally. Then we avoid the creation cost of an `Engine`.
@GopherJ I have a new branch that splits all the built-in libraries into "packages" that you can mix and match at will. The packages are basically immutable (they sit behind an `Rc` or `Arc`), and you can simply pass them around to new engines.

`Engine::new_raw()` now takes almost no time at all, as it doesn't register any functions. You load additional packages with the `load_package` method.

Package loading also takes no time at all. It is the package creation that takes time, and you only do that once and keep the package around.

If you're interested, surf to https://github.com/schungx/rhai/tree/packages and try it out.
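The packages design described above can be sketched roughly like this (the types and method names `RawEngine`, `load_package`, and `call` are illustrative, not rhai's exact API): a package is an immutable, `Arc`-shared map of pre-registered functions, so loading one into an engine is just a refcount bump.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// A package: an immutable map of pre-registered functions.
type Package = Arc<HashMap<&'static str, fn(i64, i64) -> i64>>;

struct RawEngine {
    packages: Vec<Package>,
}

impl RawEngine {
    fn new_raw() -> Self {
        // Registers nothing -- creation is essentially free.
        RawEngine { packages: Vec::new() }
    }
    fn load_package(&mut self, pkg: Package) {
        self.packages.push(pkg); // cheap: just bumps a refcount
    }
    fn call(&self, name: &str, a: i64, b: i64) -> Option<i64> {
        self.packages.iter().find_map(|p| p.get(name).map(|f| f(a, b)))
    }
}

fn main() {
    // Build the (expensive) arithmetic package exactly once...
    let mut arith = HashMap::new();
    arith.insert("+", (|a, b| a + b) as fn(i64, i64) -> i64);
    arith.insert("*", |a, b| a * b);
    let arith: Package = Arc::new(arith);

    // ...then share it across many cheap engines.
    let mut e1 = RawEngine::new_raw();
    let mut e2 = RawEngine::new_raw();
    e1.load_package(arith.clone());
    e2.load_package(arith.clone());

    assert_eq!(e1.call("+", 2, 3), Some(5));
    assert_eq!(e2.call("*", 4, 5), Some(20));
}
```

Later-loaded packages can shadow earlier ones (or vice versa) depending on the lookup order chosen, which also gives the mix-and-match behavior mentioned above.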
@schungx It seems cool, and it's much cleaner now! Thanks for your great work.
I can leave it at the "package" level, or I can further divide things up into individual functions. For example, in addition to registering a package, you could also register a single function within it, say if you only need `+` and `-` etc. without multiplication.

Do you think such fine-grained control is beneficial? Because you can always just copy the code and do a `register_fn` yourself...
I'll close this now, since PR https://github.com/jonathandturner/rhai/pull/144 adds the packages API and `Engine::new_raw` now takes only nanoseconds.
Currently, no matter whether we are using `Engine::new_raw` or `Engine::new`, creation is always slow, even when I have deactivated many features. The result is from a laptop (i7, 10th gen).