rust-gamedev / ecs_bench_suite

A suite of benchmarks designed to test and compare Rust ECS library performance across a variety of challenging circumstances.

Heavy compute does not give a good comparison of parallel iter #28

Open ElliotB256 opened 2 years ago

ElliotB256 commented 2 years ago

Hi,

I think there should be a benchmark that compares how the libraries handle parallel iteration. Currently, the closest test for this would be heavy_compute, but its task (inverting a matrix 100 times per entity) is too coarse-grained to expose the parallel-iteration overhead (there is too much work per item).

I propose either:

An example of option two is here: https://github.com/ElliotB256/ecs_bench_suite/tree/parallel_light_compute Further discussion can be found here: https://github.com/bevyengine/bevy/issues/2173
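To make the idea concrete, here is a hypothetical standalone sketch (not code from the linked branch) of what a "light compute" parallel workload looks like: one addition per entity, mimicking a Position += Velocity system. It uses plain std scoped threads with an explicit batch size instead of any ECS library's parallel iterator, so with work this small, almost all of the measured time is scheduling overhead:

```rust
use std::thread;
use std::time::Instant;

/// Hypothetical light-compute workload: one `f32` add per item, split into
/// batches of `batch_size`, with one scoped thread per batch. The per-item
/// work is tiny, so the run time is dominated by spawn/join overhead.
fn light_compute(pos: &mut [f32], vel: &[f32], batch_size: usize) {
    thread::scope(|s| {
        for (p, v) in pos.chunks_mut(batch_size).zip(vel.chunks(batch_size)) {
            s.spawn(move || {
                for (p, v) in p.iter_mut().zip(v) {
                    *p += *v;
                }
            });
        }
    });
}

fn main() {
    const N: usize = 100_000; // illustrative entity count, not the suite's
    let mut pos = vec![0.0f32; N];
    let vel = vec![1.0f32; N];
    let t = Instant::now();
    light_compute(&mut pos, &vel, 4096);
    println!("updated {} items in {:?}", N, t.elapsed());
    assert!(pos.iter().all(|&p| p == 1.0));
}
```

A real benchmark would of course go through each library's own parallel query API; this only illustrates how fine-grained the per-item work is compared to heavy_compute's repeated matrix inversions.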

The current heavy_compute shows bevy as about 2x slower than specs. However, parallel_light_compute (see the discussion) shows that bevy is very sensitive to batch size and can be anywhere up to 10x slower than, e.g., specs.

Systemcluster commented 2 years ago

> The current heavy_compute shows bevy as about 2x slower than specs. However, parallel_light_compute (see the discussion) shows that bevy is very sensitive to batch size and can be anywhere up to 10x slower than, e.g., specs.

In my results (where I merged your fork and others, and updated all the libraries, along with some adjustments), bevy is only 2x slower than specs in parallel_light_compute and is actually faster than the other libraries. It might be sensitive to thread count as well (I ran it on a 16c/32t system), or the situation improved drastically between bevy 0.5 and 0.6.

ElliotB256 commented 2 years ago

Thanks for looking!

However, a note: bevy is extremely sensitive to batch size, while the other libraries don't require a batch size to be set at all. Your file shows the batch size set to 1024. In the discussion I posted above, you'll find the following table, which shows how bevy's timing scales with batch size:

| Batch Size | Time      |
|-----------:|----------:|
| 8          | 1.177 ms  |
| 64         | 234.13 µs |
| 256        | 149.48 µs |
| 1024       | 130.48 µs |
| 4096       | 207.13 µs |
| 10,000     | 485.55 µs |

On my PC, 1024 was the optimum batch size for bevy. For comparison, specs took 108.00 µs, so bevy was about 2x slower than specs. However, in the worst case of an unoptimised batch size, bevy remains >10x slower (hence my numbers in the first post). I expect the 'ideal' batch size is both hardware- and System-dependent, so the optimum will rarely be achieved in practice.
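The shape of the table above is what you'd expect from simple task-count arithmetic: small batches mean many spawned tasks (high scheduling overhead), while very large batches mean too few tasks to keep all cores busy. A minimal sketch, assuming an illustrative workload of 1,000,000 items (the hypothetical `task_count` helper is not part of any library):

```rust
/// Number of parallel tasks a batched iteration spawns for `n` items:
/// ceil(n / batch_size). Fewer, larger batches cut scheduling overhead
/// but also cap the available parallelism.
fn task_count(n: usize, batch_size: usize) -> usize {
    (n + batch_size - 1) / batch_size
}

fn main() {
    let n = 1_000_000; // illustrative item count
    for &b in &[8usize, 64, 256, 1024, 4096, 10_000] {
        // e.g. batch 8 -> 125000 tasks, batch 10_000 -> 100 tasks
        println!("batch {:>6}: {:>7} tasks", b, task_count(n, b));
    }
}
```

With, say, 32 hardware threads, a batch size of 10,000 leaves only 100 tasks to schedule (little overhead, but coarse load balancing), while a batch size of 8 creates 125,000 tasks whose spawn cost swamps the per-item work, which is consistent with the U-shaped timings in the table.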

(Disclaimer: my tests are still for bevy 0.5 and I haven't had time to run comparisons for 0.6 yet! But my understanding from other discussions is that the parallel performance did not change.)