Update benchmarks + wiki for Artemis 0.7

DaanVanYperen commented 10 years ago

see https://github.com/junkdog/artemis-odb/issues/142

DaanVanYperen commented 10 years ago

My benchmarking machine went into hibernation halfway through it XD. Woke it up again; wonder what wonders it holds.

DaanVanYperen commented 10 years ago

Artemis 0.7 test: Run complete. Total time: 08:27:24 XD

junkdog commented 10 years ago

I want numbers! On Sep 22, 2014 10:28 AM, "Daan van Yperen" notifications@github.com wrote:

Artemis 0.7 test: Run complete. Total time: 08:27:24 XD

— Reply to this email directly or view it on GitHub https://github.com/junkdog/entity-system-benchmarks/issues/7#issuecomment-56343273 .

DaanVanYperen commented 10 years ago

Was about to rerun the 0.7 test, but here goes:

(On i7 920, Windows 7)

Ashley 1.0.1 (pre-fiddles)

# Run complete. Total time: 00:04:57
Benchmark                                       (entityCount)   Mode  Samples      Score  Score error  Units
c.g.e.a.BaselineBenchmark.baseline_world                 1024  thrpt        3  75624,288     5392,186  ops/s
c.g.e.a.BaselineBenchmark.baseline_world                 4096  thrpt        3  18526,935    26848,591  ops/s
c.g.e.a.BaselineBenchmark.baseline_world                16384  thrpt        3   4625,712       42,174  ops/s
c.g.e.a.BaselineBenchmark.baseline_world                65536  thrpt        3    942,663       26,086  ops/s
c.g.e.a.PlainComponentBenchmark.plain_engine             1024  thrpt        3  21933,382    20913,558  ops/s
c.g.e.a.PlainComponentBenchmark.plain_engine             4096  thrpt        3   7835,968       64,894  ops/s
c.g.e.a.PlainComponentBenchmark.plain_engine            16384  thrpt        3    835,880       32,574  ops/s
c.g.e.a.PlainComponentBenchmark.plain_engine            65536  thrpt        3    139,765        0,593  ops/s

Artemis 0.4

# Run complete. Total time: 00:07:26
Benchmark                                  (entityCount)   Mode  Samples      Score  Score error  Units
c.a.BaselineBenchmark.baseline                      1024  thrpt        3  34360,837     7849,198  ops/s
c.a.BaselineBenchmark.baseline                      4096  thrpt        3   7090,843     2190,929  ops/s
c.a.BaselineBenchmark.baseline                     16384  thrpt        3   2025,017      773,189  ops/s
c.a.BaselineBenchmark.baseline                     65536  thrpt        3    243,723        4,091  ops/s
c.a.InsertRemoveBenchmark.insert_remove             1024  thrpt        3   7663,685        8,675  ops/s
c.a.InsertRemoveBenchmark.insert_remove             4096  thrpt        3   1460,616        4,545  ops/s
c.a.InsertRemoveBenchmark.insert_remove            16384  thrpt        3    210,620        4,886  ops/s
c.a.InsertRemoveBenchmark.insert_remove            65536  thrpt        3     19,934        2,072  ops/s
c.a.PlainComponentBenchmark.plain                   1024  thrpt        3  14451,126     8947,409  ops/s
c.a.PlainComponentBenchmark.plain                   4096  thrpt        3   3280,108     7133,316  ops/s
c.a.PlainComponentBenchmark.plain                  16384  thrpt        3   1202,851      593,784  ops/s
c.a.PlainComponentBenchmark.plain                  65536  thrpt        3    134,611        2,230  ops/s

Artemis 0.6.5

# Run complete. Total time: 00:12:22
Benchmark                                  (entityCount)   Mode  Samples      Score  Score error  Units
c.a.BaselineBenchmark.baseline                      1024  thrpt        3  37110,021      711,076  ops/s
c.a.BaselineBenchmark.baseline                      4096  thrpt        3   8803,940       51,224  ops/s
c.a.BaselineBenchmark.baseline                     16384  thrpt        3   2175,225        2,776  ops/s
c.a.BaselineBenchmark.baseline                     65536  thrpt        3    254,291        6,744  ops/s
c.a.InsertRemoveBenchmark.insert_remove             1024  thrpt        3  11261,170      176,712  ops/s
c.a.InsertRemoveBenchmark.insert_remove             4096  thrpt        3   1884,348       30,491  ops/s
c.a.InsertRemoveBenchmark.insert_remove            16384  thrpt        3    240,751        7,054  ops/s
c.a.InsertRemoveBenchmark.insert_remove            65536  thrpt        3     20,922        0,704  ops/s
c.a.PackedComponentBenchmark.packed                 1024  thrpt        3  23905,344      437,810  ops/s
c.a.PackedComponentBenchmark.packed                 4096  thrpt        3   5715,544     2312,030  ops/s
c.a.PackedComponentBenchmark.packed                16384  thrpt        3   1521,704      121,428  ops/s
c.a.PackedComponentBenchmark.packed                65536  thrpt        3    256,450        9,105  ops/s
c.a.PlainComponentBenchmark.plain                   1024  thrpt        3  22448,164     1227,689  ops/s
c.a.PlainComponentBenchmark.plain                   4096  thrpt        3   5289,853     4022,466  ops/s
c.a.PlainComponentBenchmark.plain                  16384  thrpt        3   1471,535      134,730  ops/s
c.a.PlainComponentBenchmark.plain                  65536  thrpt        3    211,853       15,725  ops/s
c.a.PooledComponentBenchmark.pooled                 1024  thrpt        3  25101,377      422,941  ops/s
c.a.PooledComponentBenchmark.pooled                 4096  thrpt        3   5914,331       71,638  ops/s
c.a.PooledComponentBenchmark.pooled                16384  thrpt        3   1453,745        7,640  ops/s
c.a.PooledComponentBenchmark.pooled                65536  thrpt        3    208,616        5,443  ops/s

Artemis 0.6.5, run 2

# Run complete. Total time: 00:12:17

Benchmark                                  (entityCount)   Mode  Samples      Score  Score error  Units
c.a.BaselineBenchmark.baseline                      1024  thrpt        3  37105,466      664,334  ops/s
c.a.BaselineBenchmark.baseline                      4096  thrpt        3   8620,649       39,978  ops/s
c.a.BaselineBenchmark.baseline                     16384  thrpt        3   2156,005      263,120  ops/s
c.a.BaselineBenchmark.baseline                     65536  thrpt        3    256,924       63,115  ops/s
c.a.InsertRemoveBenchmark.insert_remove             1024  thrpt        3  11242,421      451,512  ops/s
c.a.InsertRemoveBenchmark.insert_remove             4096  thrpt        3   1867,743      501,122  ops/s
c.a.InsertRemoveBenchmark.insert_remove            16384  thrpt        3    240,180       11,855  ops/s
c.a.InsertRemoveBenchmark.insert_remove            65536  thrpt        3     20,211        2,255  ops/s
c.a.PackedComponentBenchmark.packed                 1024  thrpt        3  23888,094      300,538  ops/s
c.a.PackedComponentBenchmark.packed                 4096  thrpt        3   5692,797     2707,527  ops/s
c.a.PackedComponentBenchmark.packed                16384  thrpt        3   1513,119      207,792  ops/s
c.a.PackedComponentBenchmark.packed                65536  thrpt        3    254,062      158,669  ops/s
c.a.PlainComponentBenchmark.plain                   1024  thrpt        3  22130,214     3317,709  ops/s
c.a.PlainComponentBenchmark.plain                   4096  thrpt        3   5291,365     3912,371  ops/s
c.a.PlainComponentBenchmark.plain                  16384  thrpt        3   1438,291      130,533  ops/s
c.a.PlainComponentBenchmark.plain                  65536  thrpt        3    202,569       25,555  ops/s
c.a.PooledComponentBenchmark.pooled                 1024  thrpt        3  25322,537      701,040  ops/s
c.a.PooledComponentBenchmark.pooled                 4096  thrpt        3   5812,277       82,590  ops/s
c.a.PooledComponentBenchmark.pooled                16384  thrpt        3   1480,723       45,513  ops/s
c.a.PooledComponentBenchmark.pooled                65536  thrpt        3    204,784       18,014  ops/s

Artemis 0.7 - Run 1

# Run complete. Total time: 00:12:16

Benchmark                                  (entityCount)   Mode  Samples      Score  Score error  Units
c.a.BaselineBenchmark.baseline                      1024  thrpt        3  77630,701     2565,258  ops/s
c.a.BaselineBenchmark.baseline                      4096  thrpt        3  18278,880       76,237  ops/s
c.a.BaselineBenchmark.baseline                     16384  thrpt        3   4556,055      215,468  ops/s
c.a.BaselineBenchmark.baseline                     65536  thrpt        3   1068,367      106,191  ops/s
c.a.InsertRemoveBenchmark.insert_remove             1024  thrpt        3  20737,833     1026,882  ops/s
c.a.InsertRemoveBenchmark.insert_remove             4096  thrpt        3   4119,882      113,270  ops/s
c.a.InsertRemoveBenchmark.insert_remove            16384  thrpt        3    876,392      121,524  ops/s
c.a.InsertRemoveBenchmark.insert_remove            65536  thrpt        3    146,031       29,650  ops/s
c.a.PackedComponentBenchmark.packed                 1024  thrpt        3  42479,687     5840,925  ops/s
c.a.PackedComponentBenchmark.packed                 4096  thrpt        3  10291,320     6835,512  ops/s
c.a.PackedComponentBenchmark.packed                16384  thrpt        3   2703,502      754,631  ops/s
c.a.PackedComponentBenchmark.packed                65536  thrpt        3    679,948       16,341  ops/s
c.a.PlainComponentBenchmark.plain                   1024  thrpt        3  39582,010     4324,319  ops/s
c.a.PlainComponentBenchmark.plain                   4096  thrpt        3   8719,888     8474,087  ops/s
c.a.PlainComponentBenchmark.plain                  16384  thrpt        3   2413,530      305,245  ops/s
c.a.PlainComponentBenchmark.plain                  65536  thrpt        3    575,760      238,663  ops/s
c.a.PooledComponentBenchmark.pooled                 1024  thrpt        3  44056,492     7095,070  ops/s
c.a.PooledComponentBenchmark.pooled                 4096  thrpt        3   9861,476     1376,095  ops/s
c.a.PooledComponentBenchmark.pooled                16384  thrpt        3   2419,005      492,526  ops/s
c.a.PooledComponentBenchmark.pooled                65536  thrpt        3    579,750       30,904  ops/s

Artemis 0.7 - Run 2

# Run complete. Total time: 00:12:16

Benchmark                                  (entityCount)   Mode  Samples      Score  Score error  Units
c.a.BaselineBenchmark.baseline                      1024  thrpt        3  78931,464     5497,450  ops/s
c.a.BaselineBenchmark.baseline                      4096  thrpt        3  18716,248      469,617  ops/s
c.a.BaselineBenchmark.baseline                     16384  thrpt        3   4669,759      101,296  ops/s
c.a.BaselineBenchmark.baseline                     65536  thrpt        3   1150,214      171,228  ops/s
c.a.InsertRemoveBenchmark.insert_remove             1024  thrpt        3  21086,652     1813,824  ops/s
c.a.InsertRemoveBenchmark.insert_remove             4096  thrpt        3   4185,453       37,604  ops/s
c.a.InsertRemoveBenchmark.insert_remove            16384  thrpt        3    887,343       95,790  ops/s
c.a.InsertRemoveBenchmark.insert_remove            65536  thrpt        3    144,559       55,475  ops/s
c.a.PackedComponentBenchmark.packed                 1024  thrpt        3  43109,973     5260,269  ops/s
c.a.PackedComponentBenchmark.packed                 4096  thrpt        3  10324,360     6454,822  ops/s
c.a.PackedComponentBenchmark.packed                16384  thrpt        3   2718,924      473,272  ops/s
c.a.PackedComponentBenchmark.packed                65536  thrpt        3    641,248       79,230  ops/s
c.a.PlainComponentBenchmark.plain                   1024  thrpt        3  39773,304     2192,545  ops/s
c.a.PlainComponentBenchmark.plain                   4096  thrpt        3   8704,724     8457,978  ops/s
c.a.PlainComponentBenchmark.plain                  16384  thrpt        3   2412,189      333,259  ops/s
c.a.PlainComponentBenchmark.plain                  65536  thrpt        3    563,762      459,445  ops/s
c.a.PooledComponentBenchmark.pooled                 1024  thrpt        3  43872,951     5850,864  ops/s
c.a.PooledComponentBenchmark.pooled                 4096  thrpt        3   9881,338     1112,914  ops/s
c.a.PooledComponentBenchmark.pooled                16384  thrpt        3   2192,746     4365,359  ops/s
c.a.PooledComponentBenchmark.pooled                65536  thrpt        3    493,281      242,917  ops/s

DaanVanYperen commented 10 years ago

What explains errors like this (Ashley 1.0.1)?

`c.g.e.a.BaselineBenchmark.baseline_world 4096 thrpt 3 18526,935 26848,591 ops/s`

junkdog commented 10 years ago

I think the entities aren't added to the world/engine - need to investigate.

junkdog commented 10 years ago

Ashley 1.2.0 etc. should be more in line with Artemis, though; Ashley 1.0.1 had perf problems.

DaanVanYperen commented 10 years ago

I mean, the Score error is higher than the Score

DaanVanYperen commented 10 years ago

Updated 0.7 run. I assume you'll want to rerun this on your headless platform.

junkdog commented 10 years ago

hmm, the results for 0.7.0 look way too good, did you run them with mvn -Pfast or something? maybe your comp went into power-saving mode during 0.6.5 or something?

junkdog commented 10 years ago

> I mean, the Score error is higher than the Score

Not quite clear on how it works, but they aren't the same units.
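
For context, a minimal sketch of how an error figure can end up larger than the score itself. It assumes that JMH's "Score error" is the half-width of a 99.9% confidence interval around the mean, computed with Student's t distribution; that is my understanding of JMH, not something confirmed in this thread. If the assumption holds, the error is expressed in ops/s like the Score, and with only 3 samples the t critical value is so large that a modest spread between iterations blows the interval up. The class name, method names and sample values below are hypothetical.

```java
// Sketch only: assumes "Score error" = t * s / sqrt(n) at 99.9% confidence.
final class ScoreErrorSketch {
    // Two-sided 99.9% Student's t critical value for 2 degrees of freedom
    // (n = 3 samples), taken from a standard t-table.
    static final double T_999_DF2 = 31.599;

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1).
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double sq = 0;
        for (double x : xs) sq += (x - m) * (x - m);
        return Math.sqrt(sq / (xs.length - 1));
    }

    // Half-width of the confidence interval for exactly three samples.
    static double scoreError(double[] threeSamples) {
        if (threeSamples.length != 3) {
            throw new IllegalArgumentException("sketch only handles 3 samples");
        }
        return T_999_DF2 * sampleStdDev(threeSamples) / Math.sqrt(3);
    }

    public static void main(String[] args) {
        // Hypothetical per-iteration throughputs, NOT taken from the runs above.
        double[] samples = {10000, 20000, 25000};
        System.out.println("score = " + mean(samples));
        System.out.println("error = " + scoreError(samples));
    }
}
```

With the hypothetical samples above the computed error is several times the mean, which is the same shape as the Ashley row in question: one noisy iteration out of three is enough.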

DaanVanYperen commented 10 years ago

0.6.5 has normal runtime, I'll re-run 0.6.5

DaanVanYperen commented 10 years ago

I do have a linux server at my disposal, but they usually run a dozen TF2 servers on it XD

DaanVanYperen commented 10 years ago

see run 2 above.

junkdog commented 10 years ago

Ah, in the case of ashley - it's running the old benchmarks, right? They only had 2 registered entity systems - all new ones have 4.

DaanVanYperen commented 10 years ago

The version I had at my disposal last night didn't compile properly; I think it was WIP, so I reverted to the patch before last.

Running benchmark on 53de194cb805b2ed89ba77fc0bd43cdc8d64bff5

junkdog commented 10 years ago

Really weird about 0.7.0 - it's at least roughly 2x as fast. Not sure what I've done, but it looks wrong - got to take a look at the code.

but about running maven with -Pfast for 0.7.0 - did you do it? it could perhaps (wishful thinking) account for the increase in perf.

DaanVanYperen commented 10 years ago

I did run -Pfast. Maybe you just hit a sweet spot for my architecture? XD

junkdog commented 10 years ago

Ah, it might be a win/JVM impl thing - are you running a java 7 vm?

I've seen perf comparisons in the past, windows always fared much worse than the linux counterparts. Maybe megamorphic callsites are one of those bottlenecks. I'm speculating though, no idea if my theory holds.

DaanVanYperen commented 10 years ago

java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

junkdog commented 10 years ago

Did a quick search - it appears I was wrong about JVM perf on the OS-level.

junkdog commented 10 years ago

I diffed 0.6.5 against 0.7.0 and they appear to do the same thing; if so, this is indeed good news!

If you have time, mind doing a run without the "fast" profile on 0.7.0? It should perform closer to 0.6.5.

DaanVanYperen commented 10 years ago

Artemis 0.7 - disabled FAST profile

NB: I was also doing some work in IntelliJ during this run, but that shouldn't make too much of an impact on this machine.

# Run complete. Total time: 00:12:16

Benchmark                                  (entityCount)   Mode  Samples      Score  Score error  Units
c.a.BaselineBenchmark.baseline                      1024  thrpt        3  36916,143    15691,534  ops/s
c.a.BaselineBenchmark.baseline                      4096  thrpt        3   9190,576      753,422  ops/s
c.a.BaselineBenchmark.baseline                     16384  thrpt        3   2289,167      227,979  ops/s
c.a.BaselineBenchmark.baseline                     65536  thrpt        3    567,952       46,397  ops/s
c.a.InsertRemoveBenchmark.insert_remove             1024  thrpt        3  20300,279     4562,217  ops/s
c.a.InsertRemoveBenchmark.insert_remove             4096  thrpt        3   4064,168      762,125  ops/s
c.a.InsertRemoveBenchmark.insert_remove            16384  thrpt        3    890,060      201,535  ops/s
c.a.InsertRemoveBenchmark.insert_remove            65536  thrpt        3    136,128       83,047  ops/s
c.a.PackedComponentBenchmark.packed                 1024  thrpt        3  24634,619     1698,145  ops/s
c.a.PackedComponentBenchmark.packed                 4096  thrpt        3   5272,865     4557,175  ops/s
c.a.PackedComponentBenchmark.packed                16384  thrpt        3   1640,268       77,889  ops/s
c.a.PackedComponentBenchmark.packed                65536  thrpt        3    403,554       63,240  ops/s
c.a.PlainComponentBenchmark.plain                   1024  thrpt        3  23539,909     1473,367  ops/s
c.a.PlainComponentBenchmark.plain                   4096  thrpt        3   5778,340     4308,614  ops/s
c.a.PlainComponentBenchmark.plain                  16384  thrpt        3   1434,236     2085,290  ops/s
c.a.PlainComponentBenchmark.plain                  65536  thrpt        3    376,448       16,279  ops/s
c.a.PooledComponentBenchmark.pooled                 1024  thrpt        3  25635,997     1778,084  ops/s
c.a.PooledComponentBenchmark.pooled                 4096  thrpt        3   6246,363      730,229  ops/s
c.a.PooledComponentBenchmark.pooled                16384  thrpt        3   1558,947       36,309  ops/s
c.a.PooledComponentBenchmark.pooled                65536  thrpt        3    379,934       86,201  ops/s

junkdog commented 10 years ago

Hah! That rocks!

junkdog commented 10 years ago

:cookie:

DaanVanYperen commented 10 years ago

You've been looking at these things a lot longer, what's the gist of it? Looks like it doubled the throughput in some cases.
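
The "doubled" impression can be checked against the tables above. A minimal sketch, using rows from the first Artemis 0.6.5 run and Artemis 0.7 - Run 1; the class and method names are hypothetical, and the only processing is converting the comma decimal separator from the pasted output before dividing.

```java
import java.util.Locale;

// Sketch: throughput ratio between two JMH score cells pasted with a comma
// as the decimal separator (e.g. "37110,021").
final class SpeedupSketch {
    static double parseScore(String cell) {
        return Double.parseDouble(cell.trim().replace(',', '.'));
    }

    // Ratio of the newer score to the older one; 2.0 means twice the ops/s.
    static double speedup(String before, String after) {
        return parseScore(after) / parseScore(before);
    }

    public static void main(String[] args) {
        // Scores copied from the 0.6.5 (first run) and 0.7 (Run 1) tables.
        String[][] rows = {
            {"baseline 1024",       "37110,021", "77630,701"},
            {"plain 65536",         "211,853",   "575,760"},
            {"insert_remove 65536", "20,922",    "146,031"},
        };
        for (String[] row : rows) {
            System.out.printf(Locale.ROOT, "%-20s %.2fx%n", row[0], speedup(row[1], row[2]));
        }
    }
}
```

On these rows the baseline ratio comes out a little above 2x, while insert_remove at 65536 entities is far higher, so "doubled" undersells some cases. The ratios ignore the score error columns, so rows with large errors should be read with care.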

junkdog commented 10 years ago

Most likely a combination of: code optimizations (minor), better L1/L2 cache usage (Entity is much smaller now and less indirection overall), monomorphic callsites and noise from competing processes.
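
To illustrate the "monomorphic callsites" part: a minimal sketch of one call site that sees a single receiver type versus the same call site seeing three. The types here are hypothetical and not taken from Artemis. As far as I know, HotSpot can usually inline a call site that only ever dispatches to one (or two) concrete classes, while three or more tends to force a regular virtual call. This is not a benchmark and makes no timing claims; it only shows the shape of the code.

```java
// Sketch: the same call site, fed either one or several receiver types.
final class CallSiteSketch {
    interface Step { int apply(int x); }

    static final class AddOne implements Step {
        public int apply(int x) { return x + 1; }
    }
    static final class Twice implements Step {
        public int apply(int x) { return 2 * x; }
    }
    static final class Negate implements Step {
        public int apply(int x) { return -x; }
    }

    // step.apply(r) below is the call site under discussion.
    static long run(Step[] steps, int rounds) {
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (Step step : steps) {
                sum += step.apply(r);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Monomorphic: every receiver is an AddOne.
        Step[] mono = {new AddOne(), new AddOne(), new AddOne()};
        // Megamorphic: three different receiver classes at the same site.
        Step[] mega = {new AddOne(), new Twice(), new Negate()};
        System.out.println(run(mono, 1_000_000));
        System.out.println(run(mega, 1_000_000));
    }
}
```

Measuring the difference properly would need its own JMH benchmark, ideally with each variant in a separate fork so the type profile of one does not pollute the other.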

junkdog commented 10 years ago

What are your computer specs btw? If you know the L1/L2/L3 cache sizes too, that'd be great.

junkdog commented 10 years ago

Ah, didn't see the bit about i7 920 before.

DaanVanYperen commented 10 years ago

Level 1 cache size: 4 x 32 KB instruction caches
Level 2 cache size: 4 x 256 KB
Level 3 cache size: inclusive shared 8 MB cache

junkdog commented 10 years ago

trying to get the ashley benchmarks to run - some stuff is a little awkward getting right as there's not always a 1:1 correlation between artemis and ashley. Once I have the insertion test ready, testing 1.2.0 should be relatively easy however.

DaanVanYperen commented 10 years ago

Can't wait to see Ashley 1.2 benchmark.
