DaanVanYperen closed this issue 10 years ago.
My benchmarking machine went into hibernation halfway through XD Woke it up again; wonder what wonders it holds.
Artemis 0.7 test: Run complete. Total time: 08:27:24 XD
I want numbers!
(Email reply to Daan van Yperen's comment of Sep 22, 2014 10:28 AM: https://github.com/junkdog/entity-system-benchmarks/issues/7#issuecomment-56343273)
Was about to rerun the 0.7 test, but here goes:
(On i7 920, Windows 7)
# Run complete. Total time: 00:04:57
Benchmark (entityCount) Mode Samples Score Score error Units
c.g.e.a.BaselineBenchmark.baseline_world 1024 thrpt 3 75624,288 5392,186 ops/s
c.g.e.a.BaselineBenchmark.baseline_world 4096 thrpt 3 18526,935 26848,591 ops/s
c.g.e.a.BaselineBenchmark.baseline_world 16384 thrpt 3 4625,712 42,174 ops/s
c.g.e.a.BaselineBenchmark.baseline_world 65536 thrpt 3 942,663 26,086 ops/s
c.g.e.a.PlainComponentBenchmark.plain_engine 1024 thrpt 3 21933,382 20913,558 ops/s
c.g.e.a.PlainComponentBenchmark.plain_engine 4096 thrpt 3 7835,968 64,894 ops/s
c.g.e.a.PlainComponentBenchmark.plain_engine 16384 thrpt 3 835,880 32,574 ops/s
c.g.e.a.PlainComponentBenchmark.plain_engine 65536 thrpt 3 139,765 0,593 ops/s
# Run complete. Total time: 00:07:26
Benchmark (entityCount) Mode Samples Score Score error Units
c.a.BaselineBenchmark.baseline 1024 thrpt 3 34360,837 7849,198 ops/s
c.a.BaselineBenchmark.baseline 4096 thrpt 3 7090,843 2190,929 ops/s
c.a.BaselineBenchmark.baseline 16384 thrpt 3 2025,017 773,189 ops/s
c.a.BaselineBenchmark.baseline 65536 thrpt 3 243,723 4,091 ops/s
c.a.InsertRemoveBenchmark.insert_remove 1024 thrpt 3 7663,685 8,675 ops/s
c.a.InsertRemoveBenchmark.insert_remove 4096 thrpt 3 1460,616 4,545 ops/s
c.a.InsertRemoveBenchmark.insert_remove 16384 thrpt 3 210,620 4,886 ops/s
c.a.InsertRemoveBenchmark.insert_remove 65536 thrpt 3 19,934 2,072 ops/s
c.a.PlainComponentBenchmark.plain 1024 thrpt 3 14451,126 8947,409 ops/s
c.a.PlainComponentBenchmark.plain 4096 thrpt 3 3280,108 7133,316 ops/s
c.a.PlainComponentBenchmark.plain 16384 thrpt 3 1202,851 593,784 ops/s
c.a.PlainComponentBenchmark.plain 65536 thrpt 3 134,611 2,230 ops/s
# Run complete. Total time: 00:12:22
Benchmark (entityCount) Mode Samples Score Score error Units
c.a.BaselineBenchmark.baseline 1024 thrpt 3 37110,021 711,076 ops/s
c.a.BaselineBenchmark.baseline 4096 thrpt 3 8803,940 51,224 ops/s
c.a.BaselineBenchmark.baseline 16384 thrpt 3 2175,225 2,776 ops/s
c.a.BaselineBenchmark.baseline 65536 thrpt 3 254,291 6,744 ops/s
c.a.InsertRemoveBenchmark.insert_remove 1024 thrpt 3 11261,170 176,712 ops/s
c.a.InsertRemoveBenchmark.insert_remove 4096 thrpt 3 1884,348 30,491 ops/s
c.a.InsertRemoveBenchmark.insert_remove 16384 thrpt 3 240,751 7,054 ops/s
c.a.InsertRemoveBenchmark.insert_remove 65536 thrpt 3 20,922 0,704 ops/s
c.a.PackedComponentBenchmark.packed 1024 thrpt 3 23905,344 437,810 ops/s
c.a.PackedComponentBenchmark.packed 4096 thrpt 3 5715,544 2312,030 ops/s
c.a.PackedComponentBenchmark.packed 16384 thrpt 3 1521,704 121,428 ops/s
c.a.PackedComponentBenchmark.packed 65536 thrpt 3 256,450 9,105 ops/s
c.a.PlainComponentBenchmark.plain 1024 thrpt 3 22448,164 1227,689 ops/s
c.a.PlainComponentBenchmark.plain 4096 thrpt 3 5289,853 4022,466 ops/s
c.a.PlainComponentBenchmark.plain 16384 thrpt 3 1471,535 134,730 ops/s
c.a.PlainComponentBenchmark.plain 65536 thrpt 3 211,853 15,725 ops/s
c.a.PooledComponentBenchmark.pooled 1024 thrpt 3 25101,377 422,941 ops/s
c.a.PooledComponentBenchmark.pooled 4096 thrpt 3 5914,331 71,638 ops/s
c.a.PooledComponentBenchmark.pooled 16384 thrpt 3 1453,745 7,640 ops/s
c.a.PooledComponentBenchmark.pooled 65536 thrpt 3 208,616 5,443 ops/s
# Run complete. Total time: 00:12:17
Benchmark (entityCount) Mode Samples Score Score error Units
c.a.BaselineBenchmark.baseline 1024 thrpt 3 37105,466 664,334 ops/s
c.a.BaselineBenchmark.baseline 4096 thrpt 3 8620,649 39,978 ops/s
c.a.BaselineBenchmark.baseline 16384 thrpt 3 2156,005 263,120 ops/s
c.a.BaselineBenchmark.baseline 65536 thrpt 3 256,924 63,115 ops/s
c.a.InsertRemoveBenchmark.insert_remove 1024 thrpt 3 11242,421 451,512 ops/s
c.a.InsertRemoveBenchmark.insert_remove 4096 thrpt 3 1867,743 501,122 ops/s
c.a.InsertRemoveBenchmark.insert_remove 16384 thrpt 3 240,180 11,855 ops/s
c.a.InsertRemoveBenchmark.insert_remove 65536 thrpt 3 20,211 2,255 ops/s
c.a.PackedComponentBenchmark.packed 1024 thrpt 3 23888,094 300,538 ops/s
c.a.PackedComponentBenchmark.packed 4096 thrpt 3 5692,797 2707,527 ops/s
c.a.PackedComponentBenchmark.packed 16384 thrpt 3 1513,119 207,792 ops/s
c.a.PackedComponentBenchmark.packed 65536 thrpt 3 254,062 158,669 ops/s
c.a.PlainComponentBenchmark.plain 1024 thrpt 3 22130,214 3317,709 ops/s
c.a.PlainComponentBenchmark.plain 4096 thrpt 3 5291,365 3912,371 ops/s
c.a.PlainComponentBenchmark.plain 16384 thrpt 3 1438,291 130,533 ops/s
c.a.PlainComponentBenchmark.plain 65536 thrpt 3 202,569 25,555 ops/s
c.a.PooledComponentBenchmark.pooled 1024 thrpt 3 25322,537 701,040 ops/s
c.a.PooledComponentBenchmark.pooled 4096 thrpt 3 5812,277 82,590 ops/s
c.a.PooledComponentBenchmark.pooled 16384 thrpt 3 1480,723 45,513 ops/s
c.a.PooledComponentBenchmark.pooled 65536 thrpt 3 204,784 18,014 ops/s
# Run complete. Total time: 00:12:16
Benchmark (entityCount) Mode Samples Score Score error Units
c.a.BaselineBenchmark.baseline 1024 thrpt 3 77630,701 2565,258 ops/s
c.a.BaselineBenchmark.baseline 4096 thrpt 3 18278,880 76,237 ops/s
c.a.BaselineBenchmark.baseline 16384 thrpt 3 4556,055 215,468 ops/s
c.a.BaselineBenchmark.baseline 65536 thrpt 3 1068,367 106,191 ops/s
c.a.InsertRemoveBenchmark.insert_remove 1024 thrpt 3 20737,833 1026,882 ops/s
c.a.InsertRemoveBenchmark.insert_remove 4096 thrpt 3 4119,882 113,270 ops/s
c.a.InsertRemoveBenchmark.insert_remove 16384 thrpt 3 876,392 121,524 ops/s
c.a.InsertRemoveBenchmark.insert_remove 65536 thrpt 3 146,031 29,650 ops/s
c.a.PackedComponentBenchmark.packed 1024 thrpt 3 42479,687 5840,925 ops/s
c.a.PackedComponentBenchmark.packed 4096 thrpt 3 10291,320 6835,512 ops/s
c.a.PackedComponentBenchmark.packed 16384 thrpt 3 2703,502 754,631 ops/s
c.a.PackedComponentBenchmark.packed 65536 thrpt 3 679,948 16,341 ops/s
c.a.PlainComponentBenchmark.plain 1024 thrpt 3 39582,010 4324,319 ops/s
c.a.PlainComponentBenchmark.plain 4096 thrpt 3 8719,888 8474,087 ops/s
c.a.PlainComponentBenchmark.plain 16384 thrpt 3 2413,530 305,245 ops/s
c.a.PlainComponentBenchmark.plain 65536 thrpt 3 575,760 238,663 ops/s
c.a.PooledComponentBenchmark.pooled 1024 thrpt 3 44056,492 7095,070 ops/s
c.a.PooledComponentBenchmark.pooled 4096 thrpt 3 9861,476 1376,095 ops/s
c.a.PooledComponentBenchmark.pooled 16384 thrpt 3 2419,005 492,526 ops/s
c.a.PooledComponentBenchmark.pooled 65536 thrpt 3 579,750 30,904 ops/s
# Run complete. Total time: 00:12:16
Benchmark (entityCount) Mode Samples Score Score error Units
c.a.BaselineBenchmark.baseline 1024 thrpt 3 78931,464 5497,450 ops/s
c.a.BaselineBenchmark.baseline 4096 thrpt 3 18716,248 469,617 ops/s
c.a.BaselineBenchmark.baseline 16384 thrpt 3 4669,759 101,296 ops/s
c.a.BaselineBenchmark.baseline 65536 thrpt 3 1150,214 171,228 ops/s
c.a.InsertRemoveBenchmark.insert_remove 1024 thrpt 3 21086,652 1813,824 ops/s
c.a.InsertRemoveBenchmark.insert_remove 4096 thrpt 3 4185,453 37,604 ops/s
c.a.InsertRemoveBenchmark.insert_remove 16384 thrpt 3 887,343 95,790 ops/s
c.a.InsertRemoveBenchmark.insert_remove 65536 thrpt 3 144,559 55,475 ops/s
c.a.PackedComponentBenchmark.packed 1024 thrpt 3 43109,973 5260,269 ops/s
c.a.PackedComponentBenchmark.packed 4096 thrpt 3 10324,360 6454,822 ops/s
c.a.PackedComponentBenchmark.packed 16384 thrpt 3 2718,924 473,272 ops/s
c.a.PackedComponentBenchmark.packed 65536 thrpt 3 641,248 79,230 ops/s
c.a.PlainComponentBenchmark.plain 1024 thrpt 3 39773,304 2192,545 ops/s
c.a.PlainComponentBenchmark.plain 4096 thrpt 3 8704,724 8457,978 ops/s
c.a.PlainComponentBenchmark.plain 16384 thrpt 3 2412,189 333,259 ops/s
c.a.PlainComponentBenchmark.plain 65536 thrpt 3 563,762 459,445 ops/s
c.a.PooledComponentBenchmark.pooled 1024 thrpt 3 43872,951 5850,864 ops/s
c.a.PooledComponentBenchmark.pooled 4096 thrpt 3 9881,338 1112,914 ops/s
c.a.PooledComponentBenchmark.pooled 16384 thrpt 3 2192,746 4365,359 ops/s
c.a.PooledComponentBenchmark.pooled 65536 thrpt 3 493,281 242,917 ops/s
What explains errors like this (Ashley 1.0.1)?
`c.g.e.a.BaselineBenchmark.baseline_world 4096 thrpt 3 18526,935 26848,591 ops/s`
I think the entities aren't added to the world/engine - need to investigate.
Ashley 1.2.0 and later should be more in line with Artemis, though; Ashley 1.0.1 had performance problems.
I mean, the Score error is higher than the Score
Updated 0.7 run. I assume you'll want to rerun this on your headless platform.
Hmm, the results for 0.7.0 look way too good. Did you run them with mvn -Pfast or something? Or maybe your computer went into power-saving mode during the 0.6.5 run?
> I mean, the Score error is higher than the Score
Not quite clear on how it works, but they aren't the same units.
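For what it's worth on the units: both columns are reported in ops/s. As far as I know, JMH's Score error is the half-width of a 99.9% confidence interval around the mean score, computed with Student's t-distribution from the measured samples: error = t * s / sqrt(n), where s is the sample standard deviation and n = 3 here. With only 2 degrees of freedom t is about 31.6, so the error is roughly 18.2 * s, and a spread of about 5.5% between samples is already enough to push the error above the score. The hypothetical class below (names made up for illustration, not part of JMH) backs out the implied spread for the Ashley 1.0.1 row quoted above, with the decimal commas written as dots.

```java
/**
 * Hypothetical helper (not part of JMH): relates a reported "Score error"
 * to the spread of the underlying samples, assuming a 99.9% interval.
 */
public final class JmhScoreError {

    /**
     * Two-sided Student's t critical value for 2 degrees of freedom (3 samples).
     * For df = 2 the CDF is F(t) = 1/2 + t / (2 * sqrt(2 + t^2)), which inverts
     * to t = c * sqrt(2 / (1 - c^2)) for a two-sided confidence level c.
     */
    static double tCriticalDf2(double confidence) {
        return confidence * Math.sqrt(2.0 / (1.0 - confidence * confidence));
    }

    /** Half-width of the confidence interval: t * s / sqrt(n). */
    static double scoreError(double t, double stdDev, int samples) {
        return t * stdDev / Math.sqrt(samples);
    }

    /** Inverse of scoreError: the sample standard deviation implied by a reported error. */
    static double impliedStdDev(double t, double error, int samples) {
        return error * Math.sqrt(samples) / t;
    }

    public static void main(String[] args) {
        int samples = 3;
        double t = tCriticalDf2(0.999);

        // Ashley 1.0.1, baseline_world, entityCount = 4096 (both values in ops/s).
        double score = 18526.935;
        double error = 26848.591;

        double s = impliedStdDev(t, error, samples);
        System.out.printf("t = %.2f, error = %.2f * stddev%n", t, t / Math.sqrt(samples));
        System.out.printf("implied stddev = %.0f ops/s (%.1f%% of the score)%n", s, 100.0 * s / score);
    }
}
```

For that row the implied standard deviation comes out at roughly 1470 ops/s, about 8% of the score: noisy, but less absurd than an error larger than the result suggests. More samples (more forks or measurement iterations) would shrink both t and the 1/sqrt(n) factor.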
0.6.5 had a normal runtime; I'll rerun 0.6.5.
I do have a linux server at my disposal, but they usually run a dozen TF2 servers on it XD
See run 2 above.
Ah, in the case of Ashley, it's running the old benchmarks, right? They only had 2 registered entity systems; all the new ones have 4.
The version I had at my disposal last night didn't compile properly. I think it was WIP, so I reverted to the patch before last.
Running benchmark on 53de194cb805b2ed89ba77fc0bd43cdc8d64bff5
Really weird about 0.7.0: it's about 2x as fast. Not sure what I've done, but it looks wrong; I have to take a look at the code.
But about running Maven with -Pfast for 0.7.0: did you do it? It could perhaps (wishful thinking) account for the increase in performance.
I did run -Pfast. Maybe you just hit a sweet spot for my architecture? XD
Ah, it might be a Windows/JVM implementation thing. Are you running a Java 7 VM?
I've seen performance comparisons in the past where Windows always fared much worse than its Linux counterparts. Maybe megamorphic call sites are one of those bottlenecks. I'm speculating, though; no idea if my theory holds.
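To make the megamorphic-callsite idea concrete (still speculation as far as this particular result goes): HotSpot caches the target of a virtual call site and can inline it while the site has only seen one receiver type (monomorphic) or two (bimorphic); beyond that the site becomes megamorphic and falls back to virtual dispatch without inlining. That might also be relevant to the old Ashley benchmarks registering 2 systems versus 4 in the new ones. The sketch below is hypothetical and not Artemis or Ashley code: in `SharedLoopSystem` one inherited loop calls `process` for all four systems, so that single call site sees four receiver types, whereas each `OwnLoopSystem` subclass carries its own loop calling a private, statically bound method. It only shows the shape of the code; measuring the difference would need a JMH benchmark.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not Artemis or Ashley code): two shapes for per-entity calls. */
public final class CallSiteShapes {

    // Shape 1: one inherited loop. The process() call inside it is shared by every
    // subclass, so with four systems it sees four receiver types (megamorphic).
    static abstract class SharedLoopSystem {
        abstract void process(int entity, float[] data);

        final void processAll(int[] entities, float[] data) {
            for (int e : entities) {
                process(e, data); // one call site, many possible targets
            }
        }
    }

    static final class Add1 extends SharedLoopSystem { void process(int e, float[] d) { d[e] += 1f; } }
    static final class Add2 extends SharedLoopSystem { void process(int e, float[] d) { d[e] += 2f; } }
    static final class Add3 extends SharedLoopSystem { void process(int e, float[] d) { d[e] += 3f; } }
    static final class Add4 extends SharedLoopSystem { void process(int e, float[] d) { d[e] += 4f; } }

    // Shape 2: every system has its own loop calling a private method, so each
    // per-entity call has exactly one target. Only the once-per-tick processAll()
    // dispatch still varies between systems.
    static abstract class OwnLoopSystem {
        abstract void processAll(int[] entities, float[] data);
    }

    static final class Own1 extends OwnLoopSystem {
        private void process(int e, float[] d) { d[e] += 1f; }
        void processAll(int[] es, float[] d) { for (int e : es) process(e, d); }
    }
    static final class Own2 extends OwnLoopSystem {
        private void process(int e, float[] d) { d[e] += 2f; }
        void processAll(int[] es, float[] d) { for (int e : es) process(e, d); }
    }
    static final class Own3 extends OwnLoopSystem {
        private void process(int e, float[] d) { d[e] += 3f; }
        void processAll(int[] es, float[] d) { for (int e : es) process(e, d); }
    }
    static final class Own4 extends OwnLoopSystem {
        private void process(int e, float[] d) { d[e] += 4f; }
        void processAll(int[] es, float[] d) { for (int e : es) process(e, d); }
    }

    static int[] entityIds(int count) {
        int[] ids = new int[count];
        for (int i = 0; i < count; i++) ids[i] = i;
        return ids;
    }

    static float[] runShared(int count) {
        int[] entities = entityIds(count);
        float[] data = new float[count];
        List<SharedLoopSystem> systems =
            Arrays.<SharedLoopSystem>asList(new Add1(), new Add2(), new Add3(), new Add4());
        for (SharedLoopSystem system : systems) system.processAll(entities, data);
        return data;
    }

    static float[] runOwn(int count) {
        int[] entities = entityIds(count);
        float[] data = new float[count];
        List<OwnLoopSystem> systems =
            Arrays.<OwnLoopSystem>asList(new Own1(), new Own2(), new Own3(), new Own4());
        for (OwnLoopSystem system : systems) system.processAll(entities, data);
        return data;
    }

    public static void main(String[] args) {
        float[] shared = runShared(1024);
        float[] own = runOwn(1024);
        // Same work either way: every slot ends at 1 + 2 + 3 + 4 = 10.
        System.out.println(Arrays.equals(shared, own) + " " + shared[0]); // prints "true 10.0"
    }
}
```

Both shapes compute the same thing; the only difference is where the per-entity call site lives and how many receiver types it can observe.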
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
Did a quick search - it appears I was wrong about JVM perf on the OS-level.
I diffed 0.6.5 against 0.7.0 and they appear to do the same thing. If so, this is indeed good news!
If you have time, mind doing a run without the "fast" profile on 0.7.0? It should perform closer to 0.6.5.
Artemis 0.7 - disabled FAST profile
NB: I was also doing some work in IntelliJ during the run, but that shouldn't have much impact on this machine.
# Run complete. Total time: 00:12:16
Benchmark (entityCount) Mode Samples Score Score error Units
c.a.BaselineBenchmark.baseline 1024 thrpt 3 36916,143 15691,534 ops/s
c.a.BaselineBenchmark.baseline 4096 thrpt 3 9190,576 753,422 ops/s
c.a.BaselineBenchmark.baseline 16384 thrpt 3 2289,167 227,979 ops/s
c.a.BaselineBenchmark.baseline 65536 thrpt 3 567,952 46,397 ops/s
c.a.InsertRemoveBenchmark.insert_remove 1024 thrpt 3 20300,279 4562,217 ops/s
c.a.InsertRemoveBenchmark.insert_remove 4096 thrpt 3 4064,168 762,125 ops/s
c.a.InsertRemoveBenchmark.insert_remove 16384 thrpt 3 890,060 201,535 ops/s
c.a.InsertRemoveBenchmark.insert_remove 65536 thrpt 3 136,128 83,047 ops/s
c.a.PackedComponentBenchmark.packed 1024 thrpt 3 24634,619 1698,145 ops/s
c.a.PackedComponentBenchmark.packed 4096 thrpt 3 5272,865 4557,175 ops/s
c.a.PackedComponentBenchmark.packed 16384 thrpt 3 1640,268 77,889 ops/s
c.a.PackedComponentBenchmark.packed 65536 thrpt 3 403,554 63,240 ops/s
c.a.PlainComponentBenchmark.plain 1024 thrpt 3 23539,909 1473,367 ops/s
c.a.PlainComponentBenchmark.plain 4096 thrpt 3 5778,340 4308,614 ops/s
c.a.PlainComponentBenchmark.plain 16384 thrpt 3 1434,236 2085,290 ops/s
c.a.PlainComponentBenchmark.plain 65536 thrpt 3 376,448 16,279 ops/s
c.a.PooledComponentBenchmark.pooled 1024 thrpt 3 25635,997 1778,084 ops/s
c.a.PooledComponentBenchmark.pooled 4096 thrpt 3 6246,363 730,229 ops/s
c.a.PooledComponentBenchmark.pooled 16384 thrpt 3 1558,947 36,309 ops/s
c.a.PooledComponentBenchmark.pooled 65536 thrpt 3 379,934 86,201 ops/s
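One way to compare rows with different entityCount values: assuming each benchmark op is one world tick that processes every entity once (an assumption; I haven't checked the benchmark sources), score * entityCount gives entity updates per second. The hypothetical helper below applies that to the baseline rows of this no-FAST 0.7 run. The tables use a decimal comma, so 36916,143 ops/s is written 36916.143 in Java.

```java
/** Hypothetical helper: converts JMH ops/s into entity updates per second. */
public final class PerEntityThroughput {

    // Assumption: one benchmark op = one world tick that touches every entity once.
    static double entityUpdatesPerSecond(double opsPerSecond, int entityCount) {
        return opsPerSecond * entityCount;
    }

    public static void main(String[] args) {
        // Baseline rows from the Artemis 0.7 run with the FAST profile disabled.
        int[] counts = {1024, 4096, 16384, 65536};
        double[] score = {36916.143, 9190.576, 2289.167, 567.952};
        for (int i = 0; i < counts.length; i++) {
            System.out.printf("%6d entities: %.1f M entity updates/s%n",
                counts[i], entityUpdatesPerSecond(score[i], counts[i]) / 1e6);
        }
    }
}
```

For these four rows the result stays between roughly 37.2 and 37.8 million entity updates per second, so under that assumption the per-entity cost in this run barely changes between 1024 and 65536 entities.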
Hah! That rocks!
:cookie:
You've been looking at these things a lot longer, what's the gist of it? Looks like it doubled the throughput in some cases.
Most likely a combination of: code optimizations (minor), better L1/L2 cache usage (Entity is much smaller now and less indirection overall), monomorphic callsites and noise from competing processes.
(Email reply to Daan van Yperen's question of Mon, Sep 22, 2014 at 3:27 PM: https://github.com/junkdog/entity-system-benchmarks/issues/7#issuecomment-56372153)
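A rough illustration of the "less indirection" point, using hypothetical types rather than the actual Artemis classes: with one object per component, the loop loads a reference from an array and then follows it to the object's fields, which may sit anywhere on the heap; with a packed layout the same fields live in one contiguous primitive array that the CPU can stream through cache lines. I haven't verified how Artemis lays out its packed components internally, so the `float[]` layout here is an assumption.

```java
/** Hypothetical sketch: the same position update over two memory layouts. */
public final class LayoutSketch {

    // Plain layout: one small object per entity, reached through an array of references.
    static final class Position {
        float x, y;
    }

    // Each iteration loads a reference, then loads/stores fields wherever that object lives.
    static void moveAll(Position[] positions, float dx, float dy) {
        for (Position p : positions) {
            p.x += dx;
            p.y += dy;
        }
    }

    // Packed layout (assumed): entity i's x and y stored at [2 * i] and [2 * i + 1].
    static void moveAllPacked(float[] xy, float dx, float dy) {
        for (int i = 0; i < xy.length; i += 2) {
            xy[i] += dx;
            xy[i + 1] += dy;
        }
    }

    // Fills both layouts with the same values, updates both, and compares them.
    static boolean sameResults(int entityCount) {
        Position[] plain = new Position[entityCount];
        float[] packed = new float[2 * entityCount];
        for (int i = 0; i < entityCount; i++) {
            plain[i] = new Position();
            plain[i].x = i;
            plain[i].y = -i;
            packed[2 * i] = i;
            packed[2 * i + 1] = -i;
        }
        moveAll(plain, 0.5f, 0.25f);
        moveAllPacked(packed, 0.5f, 0.25f);
        for (int i = 0; i < entityCount; i++) {
            if (plain[i].x != packed[2 * i] || plain[i].y != packed[2 * i + 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameResults(4096)); // prints "true"
    }
}
```

In a fresh loop like this, HotSpot will often allocate the `Position` objects next to each other, so a microbenchmark can understate the gap compared with a long-running game where components are added and removed over time.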
What are your computer specs btw? If you know the L1/L2/L3 cache sizes too, that'd be great.
(Email follow-up to Adrian Papari's reply of Mon, Sep 22, 2014 at 3:37 PM)
Ah, didn't see the bit about i7 920 before.
Level 1 cache size: 4 x 32 KB instruction caches, 4 x 32 KB data caches
Level 2 cache size: 4 x 256 KB
Level 3 cache size: 8 MB inclusive shared cache
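Some back-of-the-envelope arithmetic against those sizes, resting entirely on assumptions: a hypothetical 8-byte component (two floats); the 32 KB L1 data cache that each Nehalem core has alongside its instruction cache; and, for plain components, a 64-bit HotSpot with compressed oops, where a 12-byte object header plus 8 bytes of fields pads to a 24-byte object, plus a 4-byte reference in the backing array. That gives about 28 bytes per entity versus 8 bytes packed. It ignores entity bookkeeping, other components and everything else competing for the caches, so it only hints at where each entityCount starts spilling out of a cache level.

```java
/** Back-of-the-envelope working-set sizes vs. the i7 920 cache sizes (assumptions noted). */
public final class CacheFit {

    // Per-core L1 data and L2 cache, shared L3 (i7 920).
    static final long L1D = 32L * 1024;
    static final long L2 = 256L * 1024;
    static final long L3 = 8L * 1024 * 1024;

    // Assumption: hypothetical 8-byte component (two floats) stored contiguously.
    static final long PACKED_BYTES = 8;
    // Assumption: 12-byte header + 8 bytes of fields padded to 24, plus a 4-byte compressed reference.
    static final long PLAIN_BYTES = 24 + 4;

    static String smallestCacheThatFits(long bytes) {
        if (bytes <= L1D) return "L1";
        if (bytes <= L2) return "L2";
        if (bytes <= L3) return "L3";
        return "RAM";
    }

    public static void main(String[] args) {
        int[] entityCounts = {1024, 4096, 16384, 65536};
        for (int n : entityCounts) {
            long packed = n * PACKED_BYTES;
            long plain = n * PLAIN_BYTES;
            System.out.printf("%6d entities: packed %5d KB (%s), plain %5d KB (%s)%n",
                n, packed / 1024, smallestCacheThatFits(packed),
                plain / 1024, smallestCacheThatFits(plain));
        }
    }
}
```

With these numbers, packed data stays within L1 up to 4096 entities and within L2 up to 16384, while the plain layout already needs L2 at 4096 and L3 at 16384.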
Trying to get the Ashley benchmarks to run; some things are a little awkward to get right, as there's not always a 1:1 correspondence between Artemis and Ashley. Once I have the insertion test ready, however, testing 1.2.0 should be relatively easy.
Can't wait to see the Ashley 1.2 benchmarks.
see https://github.com/junkdog/artemis-odb/issues/142