GopherJ closed this issue 4 years ago
Hhhhhmmmm.... that is quite strange. There are a lot fewer allocations in 0.14.1, and I don't see a regression in my own benchmarks...
New functionality has been added, though. For example, the engine now tracks the number of operations performed, allowing an upper limit to be set and letting the user track progress and/or terminate a runaway script. Needless to say, a counter is incremented on every visit of a tree node and compared against the maximum limit.
This may not be what you need. Did you compile with the `unchecked` feature? That feature skips the operations limit.
There is one more place I changed. I introduced an extra level of indirection into native function calls, making them a trait object instead of a closure. This means an extra indirection during each function call. Normally, this shouldn't impose much of a speed penalty, because modern CPUs are designed to handle indirect jumps efficiently.
But since I'm not using this feature for the moment - there is only one native Rust function interface - I'll back it out and let you try it again.
EDIT: Please pull the latest from https://github.com/schungx/rhai and run the benches again. This one has no trait object indirection for function calls.
Yes, your master seems better:
before: 0.13.0 after: schungx/master
name before ns/iter after ns/iter diff ns/iter diff % speedup
b_benchmark_abac_model 6,408 6,333 -75 -1.17% x 1.01
b_benchmark_basic_model 7,201 6,054 -1,147 -15.93% x 1.19
b_benchmark_key_match 25,328 19,787 -5,541 -21.88% x 1.28
b_benchmark_priority_model 7,776 7,510 -266 -3.42% x 1.04
b_benchmark_raw 7 5 -2 -28.57% x 1.40
b_benchmark_rbac_model 18,992 18,972 -20 -0.11% x 1.00
b_benchmark_rbac_model_large 56,901,480 58,092,749 1,191,269 2.09% x 0.98
b_benchmark_rbac_model_medium 5,514,162 5,685,321 171,159 3.10% x 0.97
b_benchmark_rbac_model_small 562,147 575,012 12,865 2.29% x 0.98
b_benchmark_rbac_model_with_domains 11,124 10,384 -740 -6.65% x 1.07
b_benchmark_rbac_with_deny 33,161 32,474 -687 -2.07% x 1.02
b_benchmark_rbac_with_resource_roles 8,789 7,790 -999 -11.37% x 1.13
and there is no difference with or without the `unchecked` feature
Great... That means the operations counting code is not costing much...
BTW, I doubt that an extra level of indirection caused such a serious regression. Can you try 0.14.1 again?
@schungx
Emmmm.... it seems that you are right; I got a totally different result:
before: 0.13.0 after: 0.14.0
➜ casbin-rs git:(master) ✗ cargo benchcmp before after
name before ns/iter after ns/iter diff ns/iter diff % speedup
b_benchmark_abac_model 6,408 6,196 -212 -3.31% x 1.03
b_benchmark_basic_model 7,201 6,180 -1,021 -14.18% x 1.17
b_benchmark_key_match 25,328 19,202 -6,126 -24.19% x 1.32
b_benchmark_priority_model 7,776 7,374 -402 -5.17% x 1.05
b_benchmark_raw 7 5 -2 -28.57% x 1.40
b_benchmark_rbac_model 18,992 19,576 584 3.07% x 0.97
b_benchmark_rbac_model_large 56,901,480 59,267,623 2,366,143 4.16% x 0.96
b_benchmark_rbac_model_medium 5,514,162 5,887,555 373,393 6.77% x 0.94
b_benchmark_rbac_model_small 562,147 601,608 39,461 7.02% x 0.93
b_benchmark_rbac_model_with_domains 11,124 10,845 -279 -2.51% x 1.03
b_benchmark_rbac_with_deny 33,161 33,930 769 2.32% x 0.98
b_benchmark_rbac_with_resource_roles 8,789 8,027 -762 -8.67% x 1.09
However, if I delete all these baselines and run again, I can still see the same regression:
before: 0.13.0 after: 0.14.0
name before ns/iter after ns/iter diff ns/iter diff % speedup
b_benchmark_abac_model 4,283 6,237 1,954 45.62% x 0.69
b_benchmark_basic_model 4,451 6,792 2,341 52.59% x 0.66
b_benchmark_key_match 16,749 20,124 3,375 20.15% x 0.83
b_benchmark_priority_model 5,215 7,562 2,347 45.00% x 0.69
b_benchmark_raw 5 5 0 0.00% x 1.00
b_benchmark_rbac_model 13,073 20,034 6,961 53.25% x 0.65
b_benchmark_rbac_model_large 39,322,181 59,356,943 20,034,762 50.95% x 0.66
b_benchmark_rbac_model_medium 3,865,921 5,821,866 1,955,945 50.59% x 0.66
b_benchmark_rbac_model_small 397,959 584,844 186,885 46.96% x 0.68
b_benchmark_rbac_model_with_domains 8,053 10,836 2,783 34.56% x 0.74
b_benchmark_rbac_with_deny 23,274 32,538 9,264 39.80% x 0.72
b_benchmark_rbac_with_resource_roles 6,377 7,869 1,492 23.40% x 0.81
If you have time to reproduce it, here are the steps:
cargo install cargo-benchcmp
git clone https://github.com/casbin/casbin-rs
cd casbin-rs
cargo bench > before
# change rhai version
cargo bench > after
cargo benchcmp before after
OK, another run with a similar result, so I think the regression exists:
before: 0.13.0 after: 0.14.0
➜ casbin-rs git:(master) cargo benchcmp before after
name before ns/iter after ns/iter diff ns/iter diff % speedup
b_benchmark_abac_model 4,232 6,139 1,907 45.06% x 0.69
b_benchmark_basic_model 4,556 6,041 1,485 32.59% x 0.75
b_benchmark_key_match 16,252 19,205 2,953 18.17% x 0.85
b_benchmark_priority_model 5,337 7,297 1,960 36.72% x 0.73
b_benchmark_raw 5 5 0 0.00% x 1.00
b_benchmark_rbac_model 13,513 22,093 8,580 63.49% x 0.61
b_benchmark_rbac_model_large 44,999,900 61,594,631 16,594,731 36.88% x 0.73
b_benchmark_rbac_model_medium 4,420,624 5,946,035 1,525,411 34.51% x 0.74
b_benchmark_rbac_model_small 441,698 625,007 183,309 41.50% x 0.71
b_benchmark_rbac_model_with_domains 9,059 10,465 1,406 15.52% x 0.87
b_benchmark_rbac_with_deny 26,733 33,326 6,593 24.66% x 0.80
b_benchmark_rbac_with_resource_roles 7,122 8,221 1,099 15.43% x 0.87
When I delete all baselines and run again for 0.13.0 and schungx/master, I again see a different result:
before: 0.13.0 after: schungx/master
➜ casbin-rs git:(master) ✗ cargo benchcmp before after
name before ns/iter after ns/iter diff ns/iter diff % speedup
b_benchmark_abac_model 4,249 6,207 1,958 46.08% x 0.68
b_benchmark_basic_model 4,535 7,062 2,527 55.72% x 0.64
b_benchmark_key_match 16,568 22,239 5,671 34.23% x 0.74
b_benchmark_priority_model 5,352 8,425 3,073 57.42% x 0.64
b_benchmark_raw 5 5 0 0.00% x 1.00
b_benchmark_rbac_model 12,895 19,167 6,272 48.64% x 0.67
b_benchmark_rbac_model_large 48,338,653 59,823,678 11,485,025 23.76% x 0.81
b_benchmark_rbac_model_medium 4,798,350 6,329,467 1,531,117 31.91% x 0.76
b_benchmark_rbac_model_small 399,847 594,940 195,093 48.79% x 0.67
b_benchmark_rbac_model_with_domains 8,720 10,571 1,851 21.23% x 0.82
b_benchmark_rbac_with_deny 23,933 33,556 9,623 40.21% x 0.71
b_benchmark_rbac_with_resource_roles 6,226 7,948 1,722 27.66% x 0.78
casbin-rs normally uses only operators like `==`, `>`, `<`, `>=`, `<=`, `"alice" in ["alice", "bob"]`, `&&`, `||`, `(...expressions...)`, `object_map.age`, etc., and also functions defined in Rust.
However, the same calculation needs to be run on every policy, and in our scenario we can have 1000k policies, so the calculation can be quite heavy. That's why I'm also looking for a way to use multiple cores, if rhai@0.14.1 has added a `Clone` impl for `Scope`.
I think the undetected regression means that we need to adjust rhai's benchmarks to cover similar scenarios.
One thing that baffles me is that the numbers are all over the place. For example, the same benchmark for 0.13.0 can go from 6xxx to 4xxx, while for 0.14.0 it can go from 9xxx to 6xxx. This casts some doubt on the reliability of these benchmarks... at least we'll need to run them under identical conditions.
All we can say right now is that the upper bound of 0.13.0 seems to be similar to the lower bound of 0.14.1. So yes, it seems there is a regression that we need to fix.
I'd suggest you use `eval_expression` if you're only ever going to need expressions, not statements, and the `no_function` feature if your users are never going to define any functions. However, I don't think it would make things faster, only less likely for your users to do something wacky.
> has added `Clone` impl for `Scope`.
I believe `Scope` is already `Clone`...
Hi @schungx, I think we can trust the latest two benchmarks, because I kept the computer under the same conditions and ran them one after another.
We are already using `no_function` because we don't want to have security issues:
rhai = { version = "0.13.0", default-features = false, features = ["sync", "only_i32", "no_function", "no_float"] }
I deleted the baseline data again:
before: 0.13.0 after: schungx/master
➜ casbin-rs git:(master) cargo benchcmp before after
name before ns/iter after ns/iter diff ns/iter diff % speedup
b_benchmark_abac_model 4,235 6,221 1,986 46.89% x 0.68
b_benchmark_basic_model 4,507 6,070 1,563 34.68% x 0.74
b_benchmark_key_match 16,981 18,627 1,646 9.69% x 0.91
b_benchmark_priority_model 5,409 7,474 2,065 38.18% x 0.72
b_benchmark_raw 5 5 0 0.00% x 1.00
b_benchmark_rbac_model 13,447 20,035 6,588 48.99% x 0.67
b_benchmark_rbac_model_large 39,401,991 59,560,583 20,158,592 51.16% x 0.66
b_benchmark_rbac_model_medium 3,929,059 6,067,630 2,138,571 54.43% x 0.65
b_benchmark_rbac_model_small 388,717 599,956 211,239 54.34% x 0.65
b_benchmark_rbac_model_with_domains 7,907 10,386 2,479 31.35% x 0.76
b_benchmark_rbac_with_deny 22,502 33,192 10,690 47.51% x 0.68
b_benchmark_rbac_with_resource_roles 6,075 8,863 2,788 45.89% x 0.69
OK, we can say that's a pretty serious regression. I'll need to look into it... Can you check once more that `unchecked` is not helping?
@schungx
Here is the result:
before: schungx/master without unchecked
after: schungx/master with unchecked
➜ casbin-rs git:(master) ✗ cargo benchcmp before after
name before ns/iter after ns/iter diff ns/iter diff % speedup
b_benchmark_abac_model 6,229 6,359 130 2.09% x 0.98
b_benchmark_basic_model 6,122 6,223 101 1.65% x 0.98
b_benchmark_key_match 19,123 19,820 697 3.64% x 0.96
b_benchmark_priority_model 7,243 7,679 436 6.02% x 0.94
b_benchmark_raw 5 5 0 0.00% x 1.00
b_benchmark_rbac_model 18,492 19,414 922 4.99% x 0.95
b_benchmark_rbac_model_large 64,651,625 58,656,739 -5,994,886 -9.27% x 1.10
b_benchmark_rbac_model_medium 5,872,116 5,726,679 -145,437 -2.48% x 1.03
b_benchmark_rbac_model_small 579,152 587,740 8,588 1.48% x 0.99
b_benchmark_rbac_model_with_domains 10,521 10,403 -118 -1.12% x 1.01
b_benchmark_rbac_with_deny 35,901 33,373 -2,528 -7.04% x 1.08
b_benchmark_rbac_with_resource_roles 8,143 8,088 -55 -0.68% x 1.01
It has been slightly better in some cases, but not by much. I consider them to have the same result, because even if we run the same benchmark under the same conditions, we can still get slightly different results.
From:
b_benchmark_rbac_model_large 64,651,625 58,656,739 -5,994,886 -9.27% x 1.10
we can see it'll help when there is heavy calculation.
Large in casbin-rs's benchmark means:
10k policies
100k users
We run `Engine::eval_with_scope` for every policy, so `Engine::eval_with_scope` runs at most 10k times.
> we can see it'll help when there is heavy calculation.
Yes, that's because during each visit of an AST node I increment a counter and check whether it exceeds the limit. So `unchecked` probably saves 5-10% off a heavy script. Not bad.
@GopherJ can you pull the latest from https://github.com/schungx/rhai?
I discovered I accidentally used two layers of `Arc` when one layer is more than enough. So there was one wasted indirection during every function call.
@schungx Hi sure,
Here is the result:
before: v0.13.0 after: schungx/master
➜ casbin-rs git:(master) cargo benchcmp before after
name before ns/iter after ns/iter diff ns/iter diff % speedup
b_benchmark_abac_model 4,276 6,176 1,900 44.43% x 0.69
b_benchmark_basic_model 4,545 6,003 1,458 32.08% x 0.76
b_benchmark_key_match 16,434 18,763 2,329 14.17% x 0.88
b_benchmark_priority_model 5,251 7,181 1,930 36.75% x 0.73
b_benchmark_raw 5 5 0 0.00% x 1.00
b_benchmark_rbac_model 13,943 19,274 5,331 38.23% x 0.72
b_benchmark_rbac_model_large 40,734,426 59,363,478 18,629,052 45.73% x 0.69
b_benchmark_rbac_model_medium 3,839,934 5,770,265 1,930,331 50.27% x 0.67
b_benchmark_rbac_model_small 397,351 583,291 185,940 46.79% x 0.68
b_benchmark_rbac_model_with_domains 8,198 10,153 1,955 23.85% x 0.81
b_benchmark_rbac_with_deny 23,502 32,733 9,231 39.28% x 0.72
b_benchmark_rbac_with_resource_roles 6,283 7,872 1,589 25.29% x 0.80
I don't think it improved much, so maybe this is not the source of the problem.
Hhhmmm... baffling. In my benchmarks it improved quite a bit.
I've cloned your repo, and now I think I've got it. There is another change in 0.14.1.
Previously, none of the `eval_XXX` methods ran the script optimizer, because I figured that since you're parsing a script and immediately running it, you're probably not keeping it around for another run anyway. So there seemed to be no point in optimizing it.
However, I find that, in real life, we do need the optimizer when running one-use scripts, especially when those scripts are machine-generated from templates and they are long-running.
The reason why 0.14.1 runs much slower for you (but not in my benchmarks) is that, during every `eval`, the script runs through the optimizer, which is completely useless here because it just spits the AST back out unchanged after doing a heap of work.
So, the short answer is:
Option 1: add the `no_optimize` feature
Option 2: `engine.set_optimization_level(OptimizationLevel::None)`
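For Option 1, the feature can sit alongside the features already in use; a sketch extending the dependency line quoted earlier (the version number here is assumed):

```toml
[dependencies]
rhai = { version = "0.14.1", default-features = false, features = ["sync", "only_i32", "no_function", "no_float", "no_optimize"] }
```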
@schungx Yes, you are right. I tested with `no_optimize`, and the bench seems to come back:
➜ casbin-rs git:(master) cargo benchcmp before after
name before ns/iter after ns/iter diff ns/iter diff % speedup
b_benchmark_abac_model 4,218 4,749 531 12.59% x 0.89
b_benchmark_basic_model 4,529 4,576 47 1.04% x 0.99
b_benchmark_key_match 16,454 16,406 -48 -0.29% x 1.00
b_benchmark_priority_model 5,267 5,393 126 2.39% x 0.98
b_benchmark_raw 5 5 0 0.00% x 1.00
b_benchmark_rbac_model 13,210 13,790 580 4.39% x 0.96
b_benchmark_rbac_model_large 40,392,526 41,128,598 736,072 1.82% x 0.98
b_benchmark_rbac_model_medium 3,878,948 4,198,179 319,231 8.23% x 0.92
b_benchmark_rbac_model_small 389,812 417,297 27,485 7.05% x 0.93
b_benchmark_rbac_model_with_domains 7,858 8,046 188 2.39% x 0.98
b_benchmark_rbac_with_deny 22,681 24,006 1,325 5.84% x 0.94
b_benchmark_rbac_with_resource_roles 6,056 6,050 -6 -0.10% x 1.00
@schungx Thanks for your help. Also, I think the benchmarks in rhai need more scenarios; I'm not sure whether:

> casbin-rs normally uses only operators like ==, >, <, >=, <=, "alice" in ["alice", "bob"], &&, ||, (...expressions...), object_map.age, etc., and also functions defined in Rust.

has been covered.
My own benches always parse the expressions once and then bench the AST evaluations. So they nicely skip your scenario.
Come to think of it, I probably need to add a few `eval_with_scope` or `eval_expression_with_scope` benchmarks...
@schungx Yes, I think more benchmarks need to be added, because this clearly shows that the current benches cannot cover real scenarios like the one in casbin-rs.
Here is the latest bench diff if you would like to have a look:
Great, thanks for your quick support!
before -> 0.13.0 after -> 0.14.1