rhaiscript / rhai

Rhai - An embedded scripting language for Rust.
https://crates.io/crates/rhai
Apache License 2.0

bench regression 13.0 vs 14.1 #155

Closed: GopherJ closed this issue 4 years ago

GopherJ commented 4 years ago

before -> 0.13.0 after -> 0.14.1

 name                                  before ns/iter  after ns/iter  diff ns/iter  diff %  speedup 
 b_benchmark_abac_model                6,408           9,470                 3,062  47.78%   x 0.68 
 b_benchmark_basic_model               7,201           9,421                 2,220  30.83%   x 0.76 
 b_benchmark_key_match                 25,328          31,897                6,569  25.94%   x 0.79 
 b_benchmark_priority_model            7,776           12,074                4,298  55.27%   x 0.64 
 b_benchmark_raw                       7               8                         1  14.29%   x 0.88 
 b_benchmark_rbac_model                18,992          29,028               10,036  52.84%   x 0.65 
 b_benchmark_rbac_model_large          56,901,480      83,206,148       26,304,668  46.23%   x 0.68 
 b_benchmark_rbac_model_medium         5,514,162       8,043,472         2,529,310  45.87%   x 0.69 
 b_benchmark_rbac_model_small          562,147         816,090             253,943  45.17%   x 0.69 
 b_benchmark_rbac_model_with_domains   11,124          20,845                9,721  87.39%   x 0.53 
 b_benchmark_rbac_with_deny            33,161          47,960               14,799  44.63%   x 0.69 
 b_benchmark_rbac_with_resource_roles  8,789           12,374                3,585  40.79%   x 0.71 
schungx commented 4 years ago

Hhhhhmmmm.... that is quite strange. There are a lot fewer allocations in 0.14.1, and I don't see a regression in my own benchmarks...

There is new functionality added, though. For example, the engine now tracks the number of operations performed, allowing an upper limit to be set, and allowing the user to track progress and/or terminate a runaway script. Needless to say, a counter is incremented on every visit of a tree node and compared against the maximum limit.

This may not be what you need. Did you compile with the unchecked feature? That feature skips the operations limit.
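For reference, the limit looks roughly like this (a minimal sketch against the current rhai API; the exact method name may differ in 0.14.x, and under unchecked this machinery is compiled out entirely):

```rust
use rhai::Engine;

fn main() {
    let mut engine = Engine::new();

    // Abort any script that performs more than 50k operations, so a runaway
    // loop becomes an error instead of hanging the host application.
    engine.set_max_operations(50_000);

    let result = engine.eval::<i64>("let x = 0; loop { x += 1 }");
    assert!(result.is_err()); // terminated by the operations limit
}
```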

schungx commented 4 years ago

There is one more thing I changed. I introduced an extra level of indirection into native function calls, making them a trait object instead of a closure. This means an extra hop during each function call. Normally, this shouldn't impose much of a speed penalty, because modern CPUs are designed to handle indirect jumps well.

But since I'm not using this feature for the moment - there is only one native Rust function interface - I'll back it out and let you try it again.

EDIT: Please pull the latest from https://github.com/schungx/rhai and run the benches again. This version has no trait-object indirection for function calls.

GopherJ commented 4 years ago

Yes, your master seems better:

before: 0.13.0 after: schungx/master

 name                                  before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 b_benchmark_abac_model                6,408           6,333                   -75   -1.17%   x 1.01 
 b_benchmark_basic_model               7,201           6,054                -1,147  -15.93%   x 1.19 
 b_benchmark_key_match                 25,328          19,787               -5,541  -21.88%   x 1.28 
 b_benchmark_priority_model            7,776           7,510                  -266   -3.42%   x 1.04 
 b_benchmark_raw                       7               5                        -2  -28.57%   x 1.40 
 b_benchmark_rbac_model                18,992          18,972                  -20   -0.11%   x 1.00 
 b_benchmark_rbac_model_large          56,901,480      58,092,749        1,191,269    2.09%   x 0.98 
 b_benchmark_rbac_model_medium         5,514,162       5,685,321           171,159    3.10%   x 0.97 
 b_benchmark_rbac_model_small          562,147         575,012              12,865    2.29%   x 0.98 
 b_benchmark_rbac_model_with_domains   11,124          10,384                 -740   -6.65%   x 1.07 
 b_benchmark_rbac_with_deny            33,161          32,474                 -687   -2.07%   x 1.02 
 b_benchmark_rbac_with_resource_roles  8,789           7,790                  -999  -11.37%   x 1.13 

and there is no difference with or without the unchecked feature

schungx commented 4 years ago

and there is no difference with or without the unchecked feature

Great... That means the operations-counting code is not affecting performance much...

BTW, I doubt that an extra level of indirection alone caused such a serious regression. Can you try 0.14.1 again?

GopherJ commented 4 years ago

@schungx

Emmmm.... it seems you are right, I got a totally different result:

before: 0.13.0 after: 0.14.0

➜  casbin-rs git:(master) ✗ cargo benchcmp before after
 name                                  before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 b_benchmark_abac_model                6,408           6,196                  -212   -3.31%   x 1.03 
 b_benchmark_basic_model               7,201           6,180                -1,021  -14.18%   x 1.17 
 b_benchmark_key_match                 25,328          19,202               -6,126  -24.19%   x 1.32 
 b_benchmark_priority_model            7,776           7,374                  -402   -5.17%   x 1.05 
 b_benchmark_raw                       7               5                        -2  -28.57%   x 1.40 
 b_benchmark_rbac_model                18,992          19,576                  584    3.07%   x 0.97 
 b_benchmark_rbac_model_large          56,901,480      59,267,623        2,366,143    4.16%   x 0.96 
 b_benchmark_rbac_model_medium         5,514,162       5,887,555           373,393    6.77%   x 0.94 
 b_benchmark_rbac_model_small          562,147         601,608              39,461    7.02%   x 0.93 
 b_benchmark_rbac_model_with_domains   11,124          10,845                 -279   -2.51%   x 1.03 
 b_benchmark_rbac_with_deny            33,161          33,930                  769    2.32%   x 0.98 
 b_benchmark_rbac_with_resource_roles  8,789           8,027                  -762   -8.67%   x 1.09 
GopherJ commented 4 years ago

However, if I delete all these baselines and run again, I can still see the same regression:

before: 0.13.0 after: 0.14.0

 name                                  before ns/iter  after ns/iter  diff ns/iter  diff %  speedup 
 b_benchmark_abac_model                4,283           6,237                 1,954  45.62%   x 0.69 
 b_benchmark_basic_model               4,451           6,792                 2,341  52.59%   x 0.66 
 b_benchmark_key_match                 16,749          20,124                3,375  20.15%   x 0.83 
 b_benchmark_priority_model            5,215           7,562                 2,347  45.00%   x 0.69 
 b_benchmark_raw                       5               5                         0   0.00%   x 1.00 
 b_benchmark_rbac_model                13,073          20,034                6,961  53.25%   x 0.65 
 b_benchmark_rbac_model_large          39,322,181      59,356,943       20,034,762  50.95%   x 0.66 
 b_benchmark_rbac_model_medium         3,865,921       5,821,866         1,955,945  50.59%   x 0.66 
 b_benchmark_rbac_model_small          397,959         584,844             186,885  46.96%   x 0.68 
 b_benchmark_rbac_model_with_domains   8,053           10,836                2,783  34.56%   x 0.74 
 b_benchmark_rbac_with_deny            23,274          32,538                9,264  39.80%   x 0.72 
 b_benchmark_rbac_with_resource_roles  6,377           7,869                 1,492  23.40%   x 0.81 
GopherJ commented 4 years ago

If you have time to reproduce it, here are the steps:

cargo install cargo-benchcmp
git clone https://github.com/casbin/casbin-rs
cd casbin-rs
cargo bench > before

# change rhai version

cargo bench > after
cargo benchcmp before after
GopherJ commented 4 years ago

OK, another run with a similar result, so I think the regression exists:

before: 0.13.0 after: 0.14.0

➜  casbin-rs git:(master) cargo benchcmp before after
 name                                  before ns/iter  after ns/iter  diff ns/iter  diff %  speedup 
 b_benchmark_abac_model                4,232           6,139                 1,907  45.06%   x 0.69 
 b_benchmark_basic_model               4,556           6,041                 1,485  32.59%   x 0.75 
 b_benchmark_key_match                 16,252          19,205                2,953  18.17%   x 0.85 
 b_benchmark_priority_model            5,337           7,297                 1,960  36.72%   x 0.73 
 b_benchmark_raw                       5               5                         0   0.00%   x 1.00 
 b_benchmark_rbac_model                13,513          22,093                8,580  63.49%   x 0.61 
 b_benchmark_rbac_model_large          44,999,900      61,594,631       16,594,731  36.88%   x 0.73 
 b_benchmark_rbac_model_medium         4,420,624       5,946,035         1,525,411  34.51%   x 0.74 
 b_benchmark_rbac_model_small          441,698         625,007             183,309  41.50%   x 0.71 
 b_benchmark_rbac_model_with_domains   9,059           10,465                1,406  15.52%   x 0.87 
 b_benchmark_rbac_with_deny            26,733          33,326                6,593  24.66%   x 0.80 
 b_benchmark_rbac_with_resource_roles  7,122           8,221                 1,099  15.43%   x 0.87 
GopherJ commented 4 years ago

When I delete all baselines and run again for 0.13.0 and schungx/master, I again see a different result:

before: 0.13.0 after: schungx/master

➜  casbin-rs git:(master) ✗ cargo benchcmp before after
 name                                  before ns/iter  after ns/iter  diff ns/iter  diff %  speedup 
 b_benchmark_abac_model                4,249           6,207                 1,958  46.08%   x 0.68 
 b_benchmark_basic_model               4,535           7,062                 2,527  55.72%   x 0.64 
 b_benchmark_key_match                 16,568          22,239                5,671  34.23%   x 0.74 
 b_benchmark_priority_model            5,352           8,425                 3,073  57.42%   x 0.64 
 b_benchmark_raw                       5               5                         0   0.00%   x 1.00 
 b_benchmark_rbac_model                12,895          19,167                6,272  48.64%   x 0.67 
 b_benchmark_rbac_model_large          48,338,653      59,823,678       11,485,025  23.76%   x 0.81 
 b_benchmark_rbac_model_medium         4,798,350       6,329,467         1,531,117  31.91%   x 0.76 
 b_benchmark_rbac_model_small          399,847         594,940             195,093  48.79%   x 0.67 
 b_benchmark_rbac_model_with_domains   8,720           10,571                1,851  21.23%   x 0.82 
 b_benchmark_rbac_with_deny            23,933          33,556                9,623  40.21%   x 0.71 
 b_benchmark_rbac_with_resource_roles  6,226           7,948                 1,722  27.66%   x 0.78 
GopherJ commented 4 years ago

casbin-rs normally uses only operators like ==, >, <, >=, <=, "alice" in ["alice", "bob"], &&, ||, (...expressions...), object_map.age, etc., as well as functions defined in Rust.

However, the same calculation needs to be run for every policy, and in our scenario we can have 1,000k policies, so the total work can be quite heavy. That's why I'm also looking for a way to use multiple cores, if rhai@0.14.1 has added a Clone impl for Scope.
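Something like this is what I have in mind (a sketch only, assuming the sync feature, which we already enable, so that Engine and Scope can cross threads, and assuming eval methods take &self as in current rhai; the matcher expressions and chunking are made up for illustration):

```rust
use rhai::{Engine, Scope};
use std::thread;

fn main() {
    let engine = Engine::new();

    // Base scope holding the request attributes; every worker clones its own copy.
    let mut base = Scope::new();
    base.push("age", 30_i64);

    // Hypothetical per-policy matcher expressions.
    let policies = ["age >= 18", "age >= 21", "age >= 65", "age >= 30"];

    thread::scope(|s| {
        for chunk in policies.chunks(2) {
            let engine = &engine;
            let mut scope = base.clone();
            s.spawn(move || {
                for expr in chunk {
                    let allowed = engine
                        .eval_expression_with_scope::<bool>(&mut scope, expr)
                        .unwrap_or(false);
                    println!("{expr} -> {allowed}");
                }
            });
        }
    });
}
```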

I think the fact that this regression went undetected means rhai's benchmarks need to be adjusted to cover a similar scenario.

schungx commented 4 years ago

One thing that baffles me is that the numbers are all over the place. For example, the same benchmark for 0.13.0 can go from 6xxx to 4xxx, while for 0.14.0 it can go from 9xxx to 6xxx. This casts some doubt on the reliability of these benchmarks... at the very least we'll need to run them under identical conditions.

All we can say right now is that the upper bound of 0.13.0 seems to be similar to the lower bound of 0.14.1. So it seems that, yes, there appears to be a regression that we need to get a fix on.

I'd suggest you use eval_expression if you're only ever going to need expressions, not statements, and the no_function feature if your users are never going to define any functions. However, I don't think it would make things faster, only less likely for your users to do something wacky.
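For illustration, the difference is roughly this (a sketch against the current rhai API; the exact error variant may differ by version):

```rust
use rhai::Engine;

fn main() {
    let engine = Engine::new();

    // eval_expression accepts a single expression only...
    let ok = engine.eval_expression::<i64>("40 + 2");
    assert_eq!(ok.unwrap(), 42);

    // ...so statements (and anything wacky) are rejected at parse time.
    let err = engine.eval_expression::<i64>("let x = 1; x");
    assert!(err.is_err());
}
```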

has added Clone impl for Scope.

I believe Scope is already Clone...

GopherJ commented 4 years ago

Hi @schungx, I think we can trust the latest two benchmarks because I kept the computer under the same conditions and ran them one after another.

We are already using no_function because we don't want to have security issues:

rhai = { version = "0.13.0", default-features = false, features = ["sync", "only_i32", "no_function", "no_float"] }

I deleted the baseline data again:

before: 0.13.0 after: schungx/master

➜  casbin-rs git:(master) cargo benchcmp before after
 name                                  before ns/iter  after ns/iter  diff ns/iter  diff %  speedup 
 b_benchmark_abac_model                4,235           6,221                 1,986  46.89%   x 0.68 
 b_benchmark_basic_model               4,507           6,070                 1,563  34.68%   x 0.74 
 b_benchmark_key_match                 16,981          18,627                1,646   9.69%   x 0.91 
 b_benchmark_priority_model            5,409           7,474                 2,065  38.18%   x 0.72 
 b_benchmark_raw                       5               5                         0   0.00%   x 1.00 
 b_benchmark_rbac_model                13,447          20,035                6,588  48.99%   x 0.67 
 b_benchmark_rbac_model_large          39,401,991      59,560,583       20,158,592  51.16%   x 0.66 
 b_benchmark_rbac_model_medium         3,929,059       6,067,630         2,138,571  54.43%   x 0.65 
 b_benchmark_rbac_model_small          388,717         599,956             211,239  54.34%   x 0.65 
 b_benchmark_rbac_model_with_domains   7,907           10,386                2,479  31.35%   x 0.76 
 b_benchmark_rbac_with_deny            22,502          33,192               10,690  47.51%   x 0.68 
 b_benchmark_rbac_with_resource_roles  6,075           8,863                 2,788  45.89%   x 0.69 
schungx commented 4 years ago

OK, we can say that's a pretty serious regression. I'll need to look into it... Can you confirm once more that unchecked is not helping?

GopherJ commented 4 years ago

@schungx

Here is the result:

before: schungx/master without unchecked after: schungx/master with unchecked

➜  casbin-rs git:(master) ✗ cargo benchcmp before after
 name                                  before ns/iter  after ns/iter  diff ns/iter  diff %  speedup 
 b_benchmark_abac_model                6,229           6,359                   130   2.09%   x 0.98 
 b_benchmark_basic_model               6,122           6,223                   101   1.65%   x 0.98 
 b_benchmark_key_match                 19,123          19,820                  697   3.64%   x 0.96 
 b_benchmark_priority_model            7,243           7,679                   436   6.02%   x 0.94 
 b_benchmark_raw                       5               5                         0   0.00%   x 1.00 
 b_benchmark_rbac_model                18,492          19,414                  922   4.99%   x 0.95 
 b_benchmark_rbac_model_large          64,651,625      58,656,739       -5,994,886  -9.27%   x 1.10 
 b_benchmark_rbac_model_medium         5,872,116       5,726,679          -145,437  -2.48%   x 1.03 
 b_benchmark_rbac_model_small          579,152         587,740               8,588   1.48%   x 0.99 
 b_benchmark_rbac_model_with_domains   10,521          10,403                 -118  -1.12%   x 1.01 
 b_benchmark_rbac_with_deny            35,901          33,373               -2,528  -7.04%   x 1.08 
 b_benchmark_rbac_with_resource_roles  8,143           8,088                   -55  -0.68%   x 1.01 

It is slightly better in some cases, but not by much. I consider the results essentially the same, because even when we run the same benchmark under the same conditions we can still get slightly different numbers.

From:

 b_benchmark_rbac_model_large          64,651,625      58,656,739       -5,994,886  -9.27%   x 1.10 

we can see it helps when there is heavy calculation.

Large in casbin-rs's benchmark means:

10k policies
100k users

We run Engine::eval_with_scope for every policy, so Engine::eval_with_scope is called at most 10k times.
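Roughly, the hot loop looks like the sketch below (simplified, assuming the current rhai API where eval_with_scope takes &self; the matcher string and policy tuples are made up for illustration):

```rust
use rhai::{Engine, Scope};

// Simplified sketch of the hot loop: one eval_with_scope call per policy.
fn any_policy_matches(engine: &Engine, policies: &[(&str, &str)]) -> bool {
    policies.iter().any(|(sub, act)| {
        let mut scope = Scope::new();
        scope.push("p_sub", sub.to_string());
        scope.push("p_act", act.to_string());
        engine
            .eval_with_scope::<bool>(&mut scope, r#"p_sub == "alice" && p_act == "read""#)
            .unwrap_or(false)
    })
}

fn main() {
    let engine = Engine::new();
    let policies = [("alice", "read"), ("bob", "write")];
    println!("{}", any_policy_matches(&engine, &policies));
}
```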

schungx commented 4 years ago

we can see it helps when there is heavy calculation.

Yes, that's because during each visit of an AST node I increment a counter and check whether it exceeds the limit. So unchecked probably saves 5-10% off a heavy script. Not bad.

schungx commented 4 years ago

@GopherJ can you pull the latest from https://github.com/schungx/rhai?

I discovered I accidentally used two layers of Arc when one layer is more than enough. So there was one wasted indirection during every function call.
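For illustration only (this is not rhai's actual internals), the difference is roughly:

```rust
use std::sync::Arc;

// A doubly-wrapped callback costs two pointer hops on every call,
// while a single Arc costs one.
type NativeFn = dyn Fn(i64) -> i64 + Send + Sync;

fn main() {
    let double: Arc<Arc<NativeFn>> = Arc::new(Arc::new(|x: i64| x + 1));
    let single: Arc<NativeFn> = Arc::new(|x: i64| x + 1);

    assert_eq!((**double)(41), 42); // Arc -> Arc -> dyn Fn, then call
    assert_eq!((*single)(41), 42);  // Arc -> dyn Fn, then call
}
```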

GopherJ commented 4 years ago

@schungx Hi sure,

Here is the result:

before: v0.13.0 after: schungx/master

➜  casbin-rs git:(master) cargo benchcmp before after
 name                                  before ns/iter  after ns/iter  diff ns/iter  diff %  speedup 
 b_benchmark_abac_model                4,276           6,176                 1,900  44.43%   x 0.69 
 b_benchmark_basic_model               4,545           6,003                 1,458  32.08%   x 0.76 
 b_benchmark_key_match                 16,434          18,763                2,329  14.17%   x 0.88 
 b_benchmark_priority_model            5,251           7,181                 1,930  36.75%   x 0.73 
 b_benchmark_raw                       5               5                         0   0.00%   x 1.00 
 b_benchmark_rbac_model                13,943          19,274                5,331  38.23%   x 0.72 
 b_benchmark_rbac_model_large          40,734,426      59,363,478       18,629,052  45.73%   x 0.69 
 b_benchmark_rbac_model_medium         3,839,934       5,770,265         1,930,331  50.27%   x 0.67 
 b_benchmark_rbac_model_small          397,351         583,291             185,940  46.79%   x 0.68 
 b_benchmark_rbac_model_with_domains   8,198           10,153                1,955  23.85%   x 0.81 
 b_benchmark_rbac_with_deny            23,502          32,733                9,231  39.28%   x 0.72 
 b_benchmark_rbac_with_resource_roles  6,283           7,872                 1,589  25.29%   x 0.80 

I don't think it improved much, so maybe this is not the source of the problem.

schungx commented 4 years ago

Hhhmmm... baffling. In my benchmarks it improved quite a bit.

schungx commented 4 years ago

I've cloned your repo and now I think I've got it. There is another change in 0.14.1.

Previously, none of the eval_XXX methods ran the script optimizer, because I figured that since you're parsing a script and immediately running it, you're probably not keeping it around for another run anyway, so there seemed to be no point in optimizing it.

However, I find that, in real life, we do need the optimizer when running one-use scripts, especially when those scripts are machine-generated from templates and they are long-running.

The reason 0.14.1 runs much slower for you (but not in my benchmarks) is that, during every eval, the script runs through the optimizer, which in your case is completely useless: it just spits the AST back out unchanged after doing a heap of work.

So, the short answer is:

Option 1: add the no_optimize feature
Option 2: call engine.set_optimization_level(OptimizationLevel::None)
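For option 2, a minimal sketch (OptimizationLevel is exported at the crate root in current rhai):

```rust
use rhai::{Engine, OptimizationLevel, Scope};

fn main() {
    let mut engine = Engine::new();

    // Turn the optimizer off for one-shot evals (similar in effect to
    // building with the no_optimize feature).
    engine.set_optimization_level(OptimizationLevel::None);

    let mut scope = Scope::new();
    scope.push("age", 30_i64);

    let allowed: bool = engine.eval_with_scope(&mut scope, "age >= 18").unwrap();
    assert!(allowed);
}
```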

GopherJ commented 4 years ago

@schungx Yes, you are right. I tested with no_optimize and the bench numbers seem to come back:

➜  casbin-rs git:(master) cargo benchcmp before after
 name                                  before ns/iter  after ns/iter  diff ns/iter  diff %  speedup 
 b_benchmark_abac_model                4,218           4,749                   531  12.59%   x 0.89 
 b_benchmark_basic_model               4,529           4,576                    47   1.04%   x 0.99 
 b_benchmark_key_match                 16,454          16,406                  -48  -0.29%   x 1.00 
 b_benchmark_priority_model            5,267           5,393                   126   2.39%   x 0.98 
 b_benchmark_raw                       5               5                         0   0.00%   x 1.00 
 b_benchmark_rbac_model                13,210          13,790                  580   4.39%   x 0.96 
 b_benchmark_rbac_model_large          40,392,526      41,128,598          736,072   1.82%   x 0.98 
 b_benchmark_rbac_model_medium         3,878,948       4,198,179           319,231   8.23%   x 0.92 
 b_benchmark_rbac_model_small          389,812         417,297              27,485   7.05%   x 0.93 
 b_benchmark_rbac_model_with_domains   7,858           8,046                   188   2.39%   x 0.98 
 b_benchmark_rbac_with_deny            22,681          24,006                1,325   5.84%   x 0.94 
 b_benchmark_rbac_with_resource_roles  6,056           6,050                    -6  -0.10%   x 1.00 
GopherJ commented 4 years ago

@schungx Thanks for your help. I also think rhai's benchmarks need to cover more scenarios; I'm not sure whether:

casbin-rs normally uses only operators like ==, >, <, >=, <=, "alice" in ["alice", "bob"], &&, ||, (...expressions...), object_map.age, etc., as well as functions defined in Rust.

has been covered

schungx commented 4 years ago

My own benches always parse the expressions once and then bench the AST evaluations. So they nicely skip your scenario.

Come to think of it, I probably need to add a few eval_with_scope or eval_expression_with_scope benchmarks...
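For context, the two styles differ roughly like this (a sketch with a made-up expression):

```rust
use rhai::{Engine, Scope};

fn main() {
    let engine = Engine::new();
    let mut scope = Scope::new();
    scope.push("age", 30_i64);

    // Style my benches cover: parse (and optimize) once, then evaluate the
    // cached AST many times.
    let ast = engine.compile_expression("age >= 18").unwrap();
    for _ in 0..1_000 {
        let _: bool = engine.eval_ast_with_scope(&mut scope, &ast).unwrap();
    }

    // casbin-rs style: parse + optimize + evaluate on every single call.
    for _ in 0..1_000 {
        let _: bool = engine
            .eval_expression_with_scope(&mut scope, "age >= 18")
            .unwrap();
    }
}
```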

GopherJ commented 4 years ago

@schungx Yes, I think more benchmarks need to be added, because this clearly shows that the current benches don't cover real scenarios like the one in casbin-rs.

Here is the latest bench diff if you would like to have a look:

https://github.com/casbin/casbin-rs/commit/51af3dc113d1edd13dcda3fd3808f9f9aac7bcd2#commitcomment-39287182

Many thanks for your quick support!