noctjs / ecs-benchmark

ECS benchmark comparison
MIT License

performance discrepancies #17

Closed NateTheGreatt closed 3 years ago

NateTheGreatt commented 3 years ago

I have created two separate files for bitECS and perform-ecs. Each file has the exact same ECS setup as this repository (the same 4 components and 3 systems), except with 1 million entities, each with the same component setup.

setup is the time it took to register the systems and create the entities. update is the time it took for the entire frame to execute.
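
Roughly, the loop in both files is shaped like this (a simplified sketch; `setupWorld` and `update` are stand-ins for the library-specific calls, not the actual bench code):

```js
const { performance } = require('perf_hooks')

// stand-in for the library-specific setup: register the 3 systems
// and create 1M entities with the same components as this repo
const t0 = performance.now()
const world = setupWorld(1_000_000)
console.log('setup', Math.round(performance.now() - t0), 'ms')

// run 25 ticks, timing each full frame (all systems) individually
for (let i = 0; i < 25; i++) {
  const t = performance.now()
  world.update()
  console.log('update', Math.round(performance.now() - t), 'ms')
}
```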

The first 25 engine ticks, while rather close in performance, still show that bitECS outperforms perform-ecs by a fair margin and has far less memory thrash (virtually none).

$ node perform-ecs-bench.js 
setup 1648 ms
update 487 ms
update 37 ms
update 20 ms
update 15 ms
update 15 ms
update 15 ms
update 14 ms
update 14 ms
update 15 ms
update 15 ms
update 16 ms
update 15 ms
update 14 ms
update 15 ms
update 16 ms
update 19 ms
update 15 ms
update 15 ms
update 15 ms
update 17 ms
update 15 ms
update 14 ms
update 15 ms
update 14 ms
update 14 ms

$ node bitecs-bench.js
setup 2069 ms
update 19 ms
update 11 ms
update 11 ms
update 11 ms
update 12 ms
update 12 ms
update 11 ms
update 11 ms
update 11 ms
update 11 ms
update 11 ms
update 14 ms
update 12 ms
update 11 ms
update 11 ms
update 12 ms
update 11 ms
update 11 ms
update 11 ms
update 11 ms
update 11 ms
update 11 ms
update 11 ms
update 11 ms
update 12 ms

The results from benchmark.js, however, do not reflect this real performance difference (and in fact report the opposite, giving perform-ecs the edge in update performance). I'm not sure why, but I think it's safe to say that the results generated from this repository are potentially misleading, because these two scripts closely match the actual execution environment of a game (the ECS running in its own process inside a real update loop, rather than a function being called over and over by benchmark.js).

I think the best approach would be to give each ECS its own node process and do custom update-loop benchmarking within that single process. I don't know what benchmark.js is doing to fudge these results.
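
Something along these lines, for example (a rough sketch using child_process; the file names are just the two scripts from above):

```js
const { execFileSync } = require('child_process')

// run each library's bench script in its own fresh node process so that
// JIT warm-up and GC pressure from one engine can't influence the other
for (const script of ['perform-ecs-bench.js', 'bitecs-bench.js']) {
  const output = execFileSync(process.execPath, [script], { encoding: 'utf8' })
  console.log(`--- ${script} ---`)
  console.log(output)
}
```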

NateTheGreatt commented 3 years ago

I had some time to investigate. I think the discrepancy comes down to how many entities are being iterated over. Below a certain number of entities, perform-ecs outperforms bitECS by about the same margin that bitECS outperforms perform-ecs at higher entity counts. With this in mind, perhaps benchmark.js is not fudging the numbers as much as I thought.

Maybe the benchmarks should be tiered by entity count?

ooflorent commented 3 years ago

Thanks for taking the time to investigate the results. Another possibility is that after a certain number of iterations, the code gets compiled by TurboFan, which produces highly optimized execution. The main drawback of perform-ecs is its megamorphism, since the entities have different internal representations. If you have a lot of components, and eventually a lot of archetypes, perform-ecs would not be that efficient.
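
To illustrate what I mean by megamorphism (a toy example, not actual perform-ecs code): when a hot property access sees entities with many different hidden classes, V8's inline cache goes megamorphic and the access stays slow even after TurboFan compiles the function.

```js
// entities with different component mixes get different internal
// shapes (hidden classes), even when they share some fields
const a = { position: 1 }
const b = { position: 1, velocity: 2 }
const c = { velocity: 2, position: 1 } // same fields, different order => different shape

// this access site observes many shapes, so the inline cache cannot
// stay monomorphic and each load pays a shape lookup
function readPosition(entity) {
  return entity.position
}

for (const entity of [a, b, c]) readPosition(entity)
```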

I'm currently trying to rewrite the benchmark to be fairer. I'm not interested in the first ticks because TurboFan would not have kicked in yet.

I think the best approach would be to give each ECS its own node process and do custom update-loop benchmarking within that single process.

Agreed on that. I'm planning to use node's worker_threads to isolate the suites.
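
Roughly this shape (just a sketch of the idea, not the final runner; each suite file would post its timings back via parentPort):

```js
const { Worker } = require('worker_threads')

// hypothetical suite files, one per library
const suites = ['./suites/perform-ecs.js', './suites/bitecs.js']

async function run() {
  for (const suite of suites) {
    // each worker gets its own V8 isolate, so warm-up state and garbage
    // from one library cannot leak into the measurements of the next
    const result = await new Promise((resolve, reject) => {
      const worker = new Worker(suite)
      worker.once('message', resolve)
      worker.once('error', reject)
    })
    console.log(suite, result)
  }
}

run()
```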

I don't know what benchmark.js is doing to fudge these results.

It does a lot of magic… But not to worry, it will be gone after the rewrite. I'll reach out to you when I have a working prototype. Maybe we could collaborate on it.

ooflorent commented 3 years ago

Maybe the benchmarks should be tiered by entity count?

Unfortunately there is no magic number, because 1M is a pretty unrealistic count (no JS game would use 1M active entities) and 100 does not reflect medium/big games. I think the number we pick should be between 1K and 10K entities.
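
If we do end up tiering, it could be as simple as running the same suite over a handful of counts in that range (a sketch; `runSuite` is a placeholder for the per-library benchmark, not an existing helper):

```js
// sketch: benchmark each library at a few realistic entity counts
// instead of a single magic number
const ENTITY_COUNTS = [1_000, 2_500, 5_000, 10_000]

for (const count of ENTITY_COUNTS) {
  for (const suite of ['perform-ecs', 'bitecs']) {
    // placeholder: sets up `count` entities, times the updates, returns ms/frame
    const ms = runSuite(suite, count)
    console.log(`${suite} @ ${count} entities: ${ms.toFixed(2)} ms/frame`)
  }
}
```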

NateTheGreatt commented 3 years ago

Maybe the benchmarks should be tiered by entity count?

Unfortunately there is no magic number, because 1M is a pretty unrealistic count (no JS game would use 1M active entities) and 100 does not reflect medium/big games. I think the number we pick should be between 1K and 10K entities.

1MM entities is unrealistic, but 1MM iterations is not. Iteration time is what we are truly testing here, not entity count. In this case I chose 1MM to make the frame time take a visible number of milliseconds, mostly just for testing purposes, but I can imagine a realistic scenario in which 10k entities are iterated over by 10 systems, for a total of 100k iterations. I will have to make further use of perf_hooks to obtain the resolution needed to see differences at lower iteration counts.
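
For example, something like this (a sketch; `world.update()` stands in for the per-library call), timing a large batch of frames and reporting the average so differences smaller than a millisecond still show up:

```js
const { performance } = require('perf_hooks')

// time a large batch of frames so the per-frame average has sub-millisecond
// resolution even when a single update is much faster than the ones above
const FRAMES = 1_000
const t0 = performance.now()
for (let i = 0; i < FRAMES; i++) {
  world.update() // stand-in for the real per-library update call
}
const perFrameUs = ((performance.now() - t0) / FRAMES) * 1000
console.log(`update ${perFrameUs.toFixed(1)} µs/frame (avg over ${FRAMES} frames)`)
```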

Another possibility is that after a certain number of iterations, the code gets compiled by TurboFan, which produces highly optimized execution.

This happens after the first one or two ticks; you can see the frame time drop drastically over the first few ticks of each script:

$ node perform-ecs-bench.js 
setup 1648 ms
update 487 ms // not optimized
update 37 ms // a little optimized
update 20 ms // almost there
update 15 ms // fully optimized

$ node bitecs-bench.js
setup 2069 ms
update 19 ms // not optimized
update 11 ms // fully optimized

I will take some time to hunt for the entity count at which the performance differences flip.

That said, this won't matter once (if) I release the multithreaded version of bitECS. This is the reason I developed my ECS using TypedArrays ;)
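
The core idea (a rough sketch, not actual bitECS code): component data stored in a TypedArray can be backed by a SharedArrayBuffer, so systems running in worker_threads operate on the same memory with no copying or serialization.

```js
const { Worker, isMainThread, workerData } = require('worker_threads')

if (isMainThread) {
  // component data lives in a SharedArrayBuffer-backed Float32Array
  const sab = new SharedArrayBuffer(1_000_000 * Float32Array.BYTES_PER_ELEMENT)
  const positions = new Float32Array(sab)

  // the worker receives the same underlying memory, not a copy
  const worker = new Worker(__filename, { workerData: sab })
  worker.on('exit', () => console.log('after worker:', positions[0]))
} else {
  // a "system" running on another thread, mutating shared component data
  const positions = new Float32Array(workerData)
  for (let i = 0; i < positions.length; i++) positions[i] += 1
}
```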

NateTheGreatt commented 3 years ago

Closing this. I think the discrepancies came from my backwards iteration through system entities, which has since been fixed.