noctjs / ecs-benchmark

ECS benchmark comparison

Rethink what to benchmark #18

Open · ooflorent opened 3 years ago

ooflorent commented 3 years ago

Currently the suite is far from perfect because it does not highlight the strengths and weaknesses of each library. I tried to list what could affect performance:

In the end, what should be measured? It is not realistic to benchmark 50 components, 50 systems, 5K entities, 50% of entities mutated, and 200 creations/removals per tick. I'm not even discussing the data types (bitecs cannot store closures or external objects within components) or what happens inside the systems. For example, perform-ecs does very few object accesses (~1 per entity) and would be considerably faster than an implementation doing 2 accesses per entity; however, what happens inside the systems could be significantly slower than the query itself.

I would like to gather feedback from various ECS authors to tailor a fair benchmark. The goal is not to compete but to help users pick the right implementation for their needs.

cc @NateTheGreatt (bitecs) @ddmills (geotic) @stasm (goodluck) @fireveined (perform-ecs) @ayebear (picoes)

NateTheGreatt commented 3 years ago

Some background/insight: I built bitecs specifically to be used inside of a multiplayer and multithreaded game engine, although the performance it gives also enables single-player games with massive entity counts / iterations. Iteration performance is the absolute focus. Number of components, number of queries, and entity distribution should ideally have minimal effects on the overall performance of bitecs, though I will investigate why performance is hindered in a few of the newly added scenarios.

Pros:

Cons:

The essential trade-off when choosing whether to use bitecs comes down to the fact that all game state is stored in TypedArrays, which boosts iteration speed and enables multithreading but changes the traditional semantics of entity/component data access within a system. This also makes it a little more difficult to integrate third-party libraries into system logic. Personally, I use very few third-party libs in my system logic, so this is not as big of an issue for me.
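To make the trade-off concrete, here is a minimal sketch contrasting structure-of-arrays (SoA) storage in TypedArrays with traditional per-entity objects. This illustrates the storage pattern only; it is not the actual bitecs API:

```ts
// Sketch of SoA storage in TypedArrays (the pattern bitecs uses),
// versus traditional array-of-structures (AoS) object components.
// Illustrative only; not the bitecs API.

const MAX_ENTITIES = 100_000;

// SoA: each field is a flat TypedArray indexed by entity id.
// Iteration is cache-friendly, and the underlying buffers can be
// transferred to or shared with worker threads.
const Position = {
  x: new Float32Array(MAX_ENTITIES),
  y: new Float32Array(MAX_ENTITIES),
};

function moveSystemSoA(entities: Uint32Array, dx: number, dy: number): void {
  for (let i = 0; i < entities.length; i++) {
    const eid = entities[i];
    Position.x[eid] += dx;
    Position.y[eid] += dy;
  }
}

// AoS: one object per entity. Reads naturally and mixes easily with
// third-party libraries, but costs pointer-chasing and GC pressure,
// and plain objects cannot be shared across threads.
interface PositionComponent { x: number; y: number; }
const positions: PositionComponent[] = [];

function moveSystemAoS(dx: number, dy: number): void {
  for (const p of positions) {
    p.x += dx;
    p.y += dy;
  }
}
```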

If you want to build a multiplayer game, or a game with high entity counts / iterations, I would encourage using bitecs over other ECSes due to the performance gains (we achieved 1k+ CCU at 30 ticks per second with our custom network engine built on this ECS, single-threaded). If you don't need the performance and are looking for cleaner syntax, easier integrations, and more traditional data access, then another ECS may be the better choice.

As far as what sorts of benchmarks would be valuable: how about a simple game as a benchmark? We could create something akin to TodoMVC, the famous to-do app that everyone rebuilds on new tech stacks, and then bench different aspects of this hypothetical game.

ooflorent commented 3 years ago

> How about a simple game as a benchmark?

I completely agree with this, but the task is titanic! I think someone should propose a PR with a reference implementation so we can discuss it concretely. Then we would try to port it to every other library.

stasm commented 3 years ago

In my experience, it's hard to agree on what a "simple" game should look like. If it's too simple, we risk not testing a real-life scenario; if it's too complex, implementing the benchmark for every library will be a lot of work and will also involve a lot of code not directly related to the benchmark (e.g. rendering and efficient collision detection for large entity counts).

Games can also vary in performance depending on the user's input. Ideally, the benchmark would be input-agnostic and only consist of deterministic AI-driven logic. My first thought is to try to implement a tower-defense-like scene where both the attackers and the defenders are controlled by the AI. I'd also advocate for making it grid-based to avoid costly collision detection at larger entity counts, and to skip rendering (but maybe emulate it to some extent, e.g. by calculating light positions).

Some libraries are designed to perform well when used to create simple games (goodluck is one of them) while others can efficiently scale to larger projects, at the cost of being slower in the simplest scenarios. There could be two workloads in the benchmark (small and large), or perhaps a single workload which would attempt to represent a typical scene in a medium-sized game fairly.

I can try to prepare a "spec" of the typical load of a scene from one of the games I created in goodluck. It would include things like:

stasm commented 3 years ago

As for the benchmark, I'd also like to include at least one system which depends on the order of iteration. In my recent goodluck games, there's sys_transform, which updates entities by traversing the tree of parent-child relationships rather than by iterating over the entities in their order of insertion into the world.
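To make that concrete, here is a minimal sketch (not goodluck's actual sys_transform) of a system whose correctness depends on visiting parents before their children:

```ts
// Hypothetical hierarchy-ordered transform system: each entity's world
// position depends on its parent's already-updated world position, so
// insertion-order iteration would produce stale results.

interface Transform {
  parent: number | null; // parent entity id, or null for roots
  localX: number;
  localY: number;
  worldX: number;
  worldY: number;
}

const transforms = new Map<number, Transform>();

function updateTransform(eid: number): void {
  const t = transforms.get(eid)!;
  if (t.parent !== null) {
    const p = transforms.get(t.parent)!;
    t.worldX = p.worldX + t.localX;
    t.worldY = p.worldY + t.localY;
  } else {
    t.worldX = t.localX;
    t.worldY = t.localY;
  }
}

// Depth-first traversal from the roots guarantees parents are visited
// before their children.
function sysTransform(children: Map<number, number[]>, roots: number[]): void {
  const stack = [...roots];
  while (stack.length > 0) {
    const eid = stack.pop()!;
    updateTransform(eid);
    for (const child of children.get(eid) ?? []) stack.push(child);
  }
}
```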

ooflorent commented 3 years ago

> Games can also vary in performance depending on the user's input. Ideally, the benchmark would be input-agnostic and only consist of deterministic AI-driven logic.

That's the reason why a game isn't a fully viable solution; something closer to a simulation would be better. A potential issue is the number of game ticks: at 60 ticks per second, 3600 ticks represent a minute of playtime, and the fastest implementations would complete the whole simulation in ~10ms, which would bias the results (since the last iterations would update almost nothing). Including the setup inside the benchmark would also be unfair, since many implementations are slow to initialize. An alternative would be to reset the simulation after a fixed number of ticks.
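As a rough illustration of that shape, where setup happens once outside the timed region and the world is reset every fixed number of ticks (setupWorld, tick, and resetWorld are placeholder names, not any library's API):

```ts
// Hypothetical harness shape: initialization cost is excluded from the
// timed region, and the simulation is reset periodically so fast
// libraries keep updating a live scene instead of a dead one.

type World = { tickCount: number };
function setupWorld(): World { return { tickCount: 0 }; }          // placeholder
function tick(world: World): void { world.tickCount++; }           // placeholder
function resetWorld(world: World): void { world.tickCount = 0; }   // placeholder

const TICKS_PER_RUN = 600; // reset well before the scene goes quiet
const RUNS = 10;

const world = setupWorld(); // setup is not measured

const start = performance.now();
for (let run = 0; run < RUNS; run++) {
  for (let i = 0; i < TICKS_PER_RUN; i++) tick(world);
  resetWorld(world); // reset cost could also be reported separately
}
const elapsed = performance.now() - start;
console.log(`${(elapsed / (RUNS * TICKS_PER_RUN)).toFixed(4)} ms/tick`);
```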

> updates entities by traversing the tree of parent-child relationships, rather than by iterating over the entities in their order of insertion into the world.

That would be interesting but I'm not sure it is relevant. I don't expect any implementation to deliver entities in order. So the benchmark would be something like this:

  1. Collect entities
  2. Build the hierarchy
  3. Traverse the tree

Only ① would be implementation-dependent; ② and ③ would be the same every time.
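A sketch of that split, where only step ① would be swapped out per library (queryEntitiesWithParent is a hypothetical placeholder, not an existing API):

```ts
// ① Collect entities: the only implementation-dependent step; each
// library's query/view mechanism would be plugged in here.
interface Row { eid: number; parent: number | null; }
function queryEntitiesWithParent(): Row[] {
  return []; // stub, replaced per library under test
}

// ② Build the hierarchy: identical shared code for every library.
function buildHierarchy(rows: Row[]): Map<number | null, number[]> {
  const children = new Map<number | null, number[]>();
  for (const { eid, parent } of rows) {
    const siblings = children.get(parent) ?? [];
    siblings.push(eid);
    children.set(parent, siblings);
  }
  return children;
}

// ③ Traverse the tree: also identical shared code.
function traverse(children: Map<number | null, number[]>, visit: (eid: number) => void): void {
  const stack = [...(children.get(null) ?? [])]; // roots have parent null
  while (stack.length > 0) {
    const eid = stack.pop()!;
    visit(eid);
    for (const child of children.get(eid) ?? []) stack.push(child);
  }
}
```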

ayebear commented 3 years ago

TLDR: Plot graphs from simulations

Whether it's a game, a simulation, or a synthetic benchmark, the knobs end up being the same: the number of entities, components, and systems. You also don't want to spend cycles outside of running ECS code, because then you aren't benchmarking the library's speed, and results will be skewed by how much time is spent in the simulation itself. That said, I think adding real-world tests would be great, just not as a replacement for the benchmarks.

I think some genuinely useful data would come from benchmarking many different combinations of entity, component, and system counts. You could then plot graphs of this data and show the asymptotic behavior of different operations across the frameworks, so people could choose the most suitable one for their project.
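Something like this hypothetical sweep harness could produce the data to plot; runWorkload is a placeholder for a per-library adapter:

```ts
// Hypothetical sweep: time a fixed workload across a grid of entity,
// component, and system counts, emitting rows suitable for plotting
// (e.g. entities on the x-axis, one series per component/system combo).

function runWorkload(entities: number, components: number, systems: number): void {
  // stub: set up a world with the given shape and run N ticks
}

const entityCounts = [10, 100, 1_000, 10_000];
const results: { entities: number; components: number; systems: number; ms: number }[] = [];

for (const entities of entityCounts) {
  for (const components of [1, 4, 16]) {
    for (const systems of [1, 4, 16]) {
      const start = performance.now();
      runWorkload(entities, components, systems);
      results.push({ entities, components, systems, ms: performance.now() - start });
    }
  }
}

// Emit CSV for plotting.
console.log("entities,components,systems,ms");
for (const r of results) {
  console.log(`${r.entities},${r.components},${r.systems},${r.ms}`);
}
```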

stasm commented 3 years ago

A month ago I created a toy simulation called PewPew. The code is at https://github.com/stasm/pewpew. You can see it in action at https://stasm.github.io/pewpew/.

[PewPew screenshot]

My goal was to create a scene with ca. 15 components and 15 systems, which would imitate actual game logic and which can run both in the browser and in Node.js. PewPew is based on Goodluck and clocks in at around 1,000 LOC.

I'm not 100% happy with the result and I'm posting this comment to gather feedback on how to design the benchmark better.

  1. PewPew isn't currently deterministic, as it uses Math.random() in a few places. This should be easy to fix; there's already a custom RNG built into it (see the seeded PRNG sketch after this list).
  2. PewPew isn't stable over time: the entity count starts small and grows with time. This means that FPS will be higher at the beginning of the benchmark run than at the end, and that a simple average of observed FPS might not be a good measure of performance.
  3. In order to imitate some game logic, I implemented a collision detection system. I'm not sure this is the right approach; it complicates the code and takes up to 20% of the frame time (drawing excluded).
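For point 1, a small public-domain seeded PRNG such as mulberry32 could stand in for Math.random() and make every run reproducible across libraries:

```ts
// mulberry32: a tiny public-domain seeded PRNG. The same seed yields
// the same sequence, so entity behavior is identical in every run and
// in every library's implementation of the benchmark.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const rand = mulberry32(42);
console.log(rand(), rand()); // same two numbers on every run
```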

Perhaps a better approach would be to:

  1. Start with a given count of entities and only create new ones when old ones are destroyed.
  2. Remove interactions between entities (via collisions) and instead have them react to:
    • current position on the map,
    • timer events,
    • RNG events.

One of the challenges here is to design a benchmark with more than just a few systems (because we want to imitate a real game) while at the same time keeping it simple (because we don't want to benchmark non-ECS code). How can we reconcile these two constraints?