junkdog / bitvector

Uncompressed, dynamically resizeable bitset for Kotlin (JS/JVM/Android)
MIT License

Inline forEachBit #1

Closed DaanVanYperen closed 7 years ago

DaanVanYperen commented 7 years ago

Argument for inlining fun forEachBit(f: (Int) -> Unit):

https://medium.com/@BladeCoder/exploring-kotlins-hidden-costs-part-1-fbb9935d9b62

"calling a function passed as argument in a higher-order function will actually involve systematic boxing and unboxing when the function involves primitive types (like Int or Long) for input values or the return value. This may have a non-negligible impact on performance, especially on Android."
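To make the quoted point concrete, here is a minimal JVM-only sketch. TinyBits is a hypothetical single-word stand-in, not the library's actual BitVector class, and the two method bodies are only an illustration of the difference between a plain and an inline higher-order function:

```kotlin
// Hypothetical stand-in for a bitset: one 64-bit word, JVM only.
class TinyBits(var word: Long = 0L) {
    fun set(index: Int) { word = word or (1L shl index) }

    // Not inline: on the JVM the lambda becomes a function object, and each
    // call goes through a generic invoke, so the Int argument is boxed.
    fun forEachBitBoxed(f: (Int) -> Unit) {
        var remaining = word
        while (remaining != 0L) {
            f(java.lang.Long.numberOfTrailingZeros(remaining))
            remaining = remaining and (remaining - 1) // clear the lowest set bit
        }
    }

    // Inline: the lambda body is copied into the call site, so no function
    // object is created and the index stays a primitive int.
    inline fun forEachBit(f: (Int) -> Unit) {
        var remaining = word
        while (remaining != 0L) {
            f(java.lang.Long.numberOfTrailingZeros(remaining))
            remaining = remaining and (remaining - 1) // clear the lowest set bit
        }
    }
}

fun collectBits(bits: TinyBits): List<Int> {
    val out = mutableListOf<Int>()
    bits.forEachBit { out.add(it) }
    return out
}
```

Both variants visit the same indices in ascending order; only the calling convention differs.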

junkdog commented 7 years ago

inlined it yesterday, also inlined some one-liners + an allocation. a bit faster now, but perf scaling is all wrong!

Benchmark                     (fillRate)   Mode  Cnt     Score     Error   Units
artemis_foreach_intbag               .01  thrpt    5  6783.714 ±  37.221  ops/ms
artemis_foreach_intbag               .05  thrpt    5  2565.076 ±  68.773  ops/ms
artemis_foreach_intbag               .10  thrpt    5  1809.597 ± 408.815  ops/ms
artemis_foreach_intbag               .20  thrpt    5   937.769 ±  10.426  ops/ms
artemis_foreach_intbag               .40  thrpt    5   634.629 ±   7.420  ops/ms
artemis_foreach_intbag               .50  thrpt    5   535.599 ±  11.867  ops/ms
artemis_foreach_intbag               .60  thrpt    5   450.715 ±   3.111  ops/ms
artemis_foreach_intbag               .70  thrpt    5   385.624 ±   6.376  ops/ms
artemis_foreach_intbag               .80  thrpt    5   344.168 ±   3.625  ops/ms
artemis_foreach_intbag               .95  thrpt    5   293.601 ±   2.317  ops/ms

bitset_foreach_intbag                .01  thrpt    5  4975.838 ±  46.883  ops/ms
bitset_foreach_intbag                .05  thrpt    5  1146.789 ±  14.570  ops/ms
bitset_foreach_intbag                .10  thrpt    5   561.831 ±   5.252  ops/ms
bitset_foreach_intbag                .20  thrpt    5   293.847 ±   1.997  ops/ms
bitset_foreach_intbag                .40  thrpt    5   151.562 ±   0.999  ops/ms
bitset_foreach_intbag                .50  thrpt    5   122.918 ±   1.296  ops/ms
bitset_foreach_intbag                .60  thrpt    5   102.998 ±   1.189  ops/ms
bitset_foreach_intbag                .70  thrpt    5    88.612 ±   0.610  ops/ms
bitset_foreach_intbag                .80  thrpt    5    78.199 ±   0.701  ops/ms
bitset_foreach_intbag                .95  thrpt    5    66.407 ±   1.903  ops/ms

bitvector_foreach_bit_intbag         .01  thrpt    5  6952.414 ± 116.490  ops/ms
bitvector_foreach_bit_intbag         .05  thrpt    5  2225.402 ±  31.059  ops/ms
bitvector_foreach_bit_intbag         .10  thrpt    5  1350.391 ±  51.364  ops/ms
bitvector_foreach_bit_intbag         .20  thrpt    5   238.853 ±   2.020  ops/ms
bitvector_foreach_bit_intbag         .40  thrpt    5   127.833 ±   1.202  ops/ms
bitvector_foreach_bit_intbag         .50  thrpt    5    97.899 ±   3.799  ops/ms
bitvector_foreach_bit_intbag         .60  thrpt    5    83.227 ±   0.681  ops/ms
bitvector_foreach_bit_intbag         .70  thrpt    5    71.995 ±   1.902  ops/ms
bitvector_foreach_bit_intbag         .80  thrpt    5    64.274 ±   0.676  ops/ms
bitvector_foreach_bit_intbag         .95  thrpt    5    53.562 ±   0.229  ops/ms

bitvector_foreach_intbag             .01  thrpt    5  2456.478 ± 318.174  ops/ms
bitvector_foreach_intbag             .05  thrpt    5   724.810 ±  26.231  ops/ms
bitvector_foreach_intbag             .10  thrpt    5   557.110 ±  12.026  ops/ms
bitvector_foreach_intbag             .20  thrpt    5   303.559 ±  11.474  ops/ms
bitvector_foreach_intbag             .40  thrpt    5    95.851 ±   1.811  ops/ms
bitvector_foreach_intbag             .50  thrpt    5    80.186 ±   2.099  ops/ms
bitvector_foreach_intbag             .60  thrpt    5    63.945 ±   1.164  ops/ms
bitvector_foreach_intbag             .70  thrpt    5    57.754 ±   1.090  ops/ms
bitvector_foreach_intbag             .80  thrpt    5    49.801 ±   1.574  ops/ms
bitvector_foreach_intbag             .95  thrpt    5    41.407 ±  10.949  ops/ms
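
One possible way to read "perf scaling is all wrong" from the table above is to compare how much each benchmark slows down when the fill rate doubles from .10 to .20. The sketch below only does that arithmetic on scores copied from the table; the names scoresAt10And20 and slowdown are made up for this illustration.

```kotlin
// Scores (ops/ms) copied from the table above: fill rate .10 to fill rate .20.
val scoresAt10And20 = mapOf(
    "artemis_foreach_intbag" to (1809.597 to 937.769),
    "bitset_foreach_intbag" to (561.831 to 293.847),
    "bitvector_foreach_bit_intbag" to (1350.391 to 238.853),
    "bitvector_foreach_intbag" to (557.110 to 303.559)
)

// Factor by which throughput drops between the two fill rates.
fun slowdown(scores: Pair<Double, Double>): Double = scores.first / scores.second

fun main() {
    for ((name, scores) in scoresAt10And20) {
        println("%-30s %.2fx".format(name, slowdown(scores)))
    }
}
```

Three of the four benchmarks lose a little under half their throughput (a factor of roughly 1.8 to 1.9), while bitvector_foreach_bit_intbag drops by a factor of about 5.7 over the same step.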

DaanVanYperen commented 7 years ago

Where is the jitter coming from? What's the thrpt?

Time to dust off the decompiler!

Did you read the whole hidden cost series btw? Worth a read.

On Mon, Jun 26, 2017 at 4:41 PM, Adrian Papari notifications@github.com wrote:

inlined it yesterday, also inlined some one-liners + an allocation. a bit faster now, but perf scaling is all wrong!

artemis_foreach_intbag               .01  thrpt    5  6805.866 ±  41.215  ops/ms
artemis_foreach_intbag               .05  thrpt    5  2583.201 ±  15.826  ops/ms
artemis_foreach_intbag               .10  thrpt    5  1875.405 ±  22.422  ops/ms
artemis_foreach_intbag               .20  thrpt    5   943.852 ±   6.775  ops/ms
artemis_foreach_intbag               .40  thrpt    5   635.245 ±  15.748  ops/ms
artemis_foreach_intbag               .50  thrpt    5   538.799 ±   2.202  ops/ms
artemis_foreach_intbag               .60  thrpt    5   449.207 ±   5.019  ops/ms
artemis_foreach_intbag               .70  thrpt    5   387.974 ±  11.123  ops/ms
artemis_foreach_intbag               .80  thrpt    5   344.659 ±   6.701  ops/ms
artemis_foreach_intbag               .95  thrpt    5   295.040 ±   1.709  ops/ms

bitset_foreach_intbag                .01  thrpt    5  4974.386 ±  36.386  ops/ms
bitset_foreach_intbag                .05  thrpt    5  1148.447 ±   9.751  ops/ms
bitset_foreach_intbag                .10  thrpt    5   580.303 ±   5.782  ops/ms
bitset_foreach_intbag                .20  thrpt    5   293.116 ±   1.678  ops/ms
bitset_foreach_intbag                .40  thrpt    5   151.339 ±   1.062  ops/ms
bitset_foreach_intbag                .50  thrpt    5   123.187 ±   0.278  ops/ms
bitset_foreach_intbag                .60  thrpt    5   103.284 ±   0.498  ops/ms
bitset_foreach_intbag                .70  thrpt    5    88.839 ±   0.790  ops/ms
bitset_foreach_intbag                .80  thrpt    5    78.440 ±   0.279  ops/ms
bitset_foreach_intbag                .95  thrpt    5    66.673 ±   0.380  ops/ms

bitvector_foreach_bit_intbag         .01  thrpt    5  5365.021 ±  34.967  ops/ms
bitvector_foreach_bit_intbag         .05  thrpt    5  2367.164 ±  13.766  ops/ms
bitvector_foreach_bit_intbag         .10  thrpt    5  1359.262 ±  88.472  ops/ms
bitvector_foreach_bit_intbag         .20  thrpt    5   246.405 ±   6.141  ops/ms
bitvector_foreach_bit_intbag         .40  thrpt    5   125.932 ±   3.864  ops/ms
bitvector_foreach_bit_intbag         .50  thrpt    5   101.191 ±   1.080  ops/ms
bitvector_foreach_bit_intbag         .60  thrpt    5    82.115 ±   1.730  ops/ms
bitvector_foreach_bit_intbag         .70  thrpt    5    71.708 ±   0.307  ops/ms
bitvector_foreach_bit_intbag         .80  thrpt    5    61.768 ±   0.365  ops/ms
bitvector_foreach_bit_intbag         .95  thrpt    5    52.533 ±   0.510  ops/ms

bitvector_foreach_intbag             .01  thrpt    5  2483.902 ±  48.907  ops/ms
bitvector_foreach_intbag             .05  thrpt    5   731.294 ±  19.386  ops/ms
bitvector_foreach_intbag             .10  thrpt    5   555.049 ±   8.409  ops/ms
bitvector_foreach_intbag             .20  thrpt    5   299.841 ±   6.832  ops/ms
bitvector_foreach_intbag             .40  thrpt    5    96.343 ±   2.717  ops/ms
bitvector_foreach_intbag             .50  thrpt    5    78.308 ±   1.910  ops/ms
bitvector_foreach_intbag             .60  thrpt    5    65.996 ±   1.744  ops/ms
bitvector_foreach_intbag             .70  thrpt    5    57.469 ±   1.542  ops/ms
bitvector_foreach_intbag             .80  thrpt    5    49.345 ±   2.161  ops/ms
bitvector_foreach_intbag             .95  thrpt    5    42.456 ±   2.351  ops/ms

junkdog commented 7 years ago

Score is throughput (thrpt, in ops/ms; higher is better). 1% fillrate is fast, 95% is slow-as-fnuque.

"Did you read the whole hidden cost series btw? Worth a read."

read/skimmed - going to re-read it though.

junkdog commented 7 years ago

"Where is the jitter coming from?"

https://github.com/junkdog/bitvector/blob/master/jvm-benchmarks/src/main/kotlin/net/onedaybeard/bitvector/BitVectorBenchmark.kt

or wait, the degradation - that, i do not know. did a JMH run with the GC profiler: no allocations during the forEachBit benchmark. haven't set up perfasm yet, which might yield something - if i can make sense of it.