**ElliotB256** opened this issue 2 years ago (status: Open)
The current `heavy_compute` benchmark shows bevy as roughly 2x slower than specs. However, `parallel_light_compute` (see discussion) shows bevy is very sensitive to batch size and can be anywhere up to 10x slower than, e.g., specs.
In my results (where I merged your fork and others, updated all libraries, and made some adjustments), bevy is only about 2x slower than specs in `parallel_light_compute`, and actually faster than the other libraries. It might be sensitive to thread count as well (I ran it on a 16c/32t system), or the situation may have improved drastically between bevy 0.5 and 0.6.
Thanks for looking!
However, a note: bevy is extremely sensitive to batch size, while the other libraries don't need a batch size to be set at all. Your file sets a batch size of 1024. In the discussion I linked above, you'll find the following table, which shows how bevy scales with batch size:
| Batch Size | Time |
|---|---|
| 8 | 1.177 ms |
| 64 | 234.13 µs |
| 256 | 149.48 µs |
| 1024 | 130.48 µs |
| 4096 | 207.13 µs |
| 10,000 | 485.55 µs |
On my PC, 1024 was the optimum batch size for bevy. For comparison, specs took 108.00 µs, so bevy at its best was roughly 2x slower than specs. In the worst case of an unoptimised batch size, however, bevy was more than 10x slower (hence my numbers in the first post). I expect the 'ideal' batch size is both hardware- and system-dependent, and the optimum will rarely be achieved.
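To make the batch-size tradeoff concrete, here is a minimal std-only Rust sketch of batched parallel iteration in the spirit of bevy's `par_for_each` (a query split into fixed-size batches). It is only an illustration of the batching idea, not bevy's actual task-pool scheduler: `par_for_each_batched` is a made-up helper, and spawning one OS thread per batch is far cruder than bevy's task pool.

```rust
use std::thread;

// Hypothetical illustration of batched parallel iteration: the slice is
// split into `batch_size` chunks and each chunk is processed on its own
// thread. With tiny per-item work, small batches drown in per-batch
// overhead, while huge batches leave cores idle -- the tradeoff behind
// the table above.
fn par_for_each_batched(data: &mut [f32], batch_size: usize, f: impl Fn(&mut f32) + Sync) {
    let f = &f; // share the closure by reference across batch threads
    thread::scope(|s| {
        for chunk in data.chunks_mut(batch_size) {
            // One thread per batch (bevy instead dispatches batches as
            // tasks on a shared task pool).
            s.spawn(move || chunk.iter_mut().for_each(f));
        }
    });
}

fn main() {
    let mut values = vec![1.0f32; 10_000];
    // A very light per-item workload: a single multiply.
    par_for_each_batched(&mut values, 1024, |v| *v *= 2.0);
    assert!(values.iter().all(|&v| v == 2.0));
    println!("ok");
}
```

With a workload this light, timing the call for batch sizes of 8 vs. 1024 vs. 10,000 reproduces the shape of the table above: overhead-bound at the small end, parallelism-starved at the large end.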
(Disclaimer: my tests are still for bevy 0.5 and I haven't had time to run comparisons for 0.6 yet, but my understanding from other discussions is that the parallel performance did not change.)
Hi,
I think there should be a benchmark that compares how the libraries handle parallel iteration. Currently, the closest test for this would be `heavy_compute`, but the task (inverting a matrix 100 times) is not fine-grained enough for a comparison of the parallel overhead (there is too much work per item). I propose either:

- making `heavy_compute` lighter (e.g., inverting the matrix once, or multiplying a float value; something very small), or
- adding a dedicated `parallel_light_compute` benchmark.

An example of option two is here: https://github.com/ElliotB256/ecs_bench_suite/tree/parallel_light_compute

Further discussion can be found here: https://github.com/bevyengine/bevy/issues/2173