rust-gamedev / ecs_bench_suite

A suite of benchmarks designed to test and compare Rust ECS library performance across a variety of challenging circumstances.

Memory allocation overhead varies depending on platform #1

Open kabergstrom opened 4 years ago

kabergstrom commented 4 years ago

When running the benchmarks, I got more than an 80% improvement (10ms to 2ms) on the serialize_binary benchmark when I switched from the platform-provided malloc to rpmalloc on Windows. After switching, some other cases also showed up to a 36% difference in runtime. I think a good avenue to explore for extending the benches would be to measure the number and size of allocations for the test cases, and to warn people about the platform-provided malloc. Maybe you should mandate a cross-platform custom allocator so that people don't end up benchmarking the wrong thing.

Additionally, I would advise pre-allocating serialization buffers to ensure it's not just a bench of Vec::grow.
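
To make the allocation-counting and pre-allocation suggestions concrete, here is a rough sketch using only the standard library. `CountingAlloc`, `measure`, `fill`, and `compare` are hypothetical names made up for this example, not part of ecs_bench_suite or any ECS crate, and the sketch wraps the system allocator rather than rpmalloc. It counts allocation requests and requested bytes, then compares a buffer grown from empty against one created with `Vec::with_capacity`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counters: number of allocation requests and bytes requested.
static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);
static ALLOC_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper that forwards to the system allocator and counts requests.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        ALLOC_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // `realloc` is left at its default, which goes through `alloc` above,
    // so every Vec growth step is counted as a fresh allocation.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns (allocation requests, bytes requested) seen while it ran.
fn measure<F: FnOnce()>(f: F) -> (usize, usize) {
    let calls = ALLOC_CALLS.load(Ordering::Relaxed);
    let bytes = ALLOC_BYTES.load(Ordering::Relaxed);
    f();
    (
        ALLOC_CALLS.load(Ordering::Relaxed) - calls,
        ALLOC_BYTES.load(Ordering::Relaxed) - bytes,
    )
}

/// Stand-in for a serializer writing `n` bytes into `buf`.
fn fill(buf: &mut Vec<u8>, n: usize) {
    for i in 0..n {
        buf.push(i as u8);
    }
    // Keep the optimizer from discarding the buffer.
    black_box(&*buf);
}

/// Returns the measurements for (grown-from-empty, pre-allocated) buffers.
fn compare(n: usize) -> ((usize, usize), (usize, usize)) {
    let grown = measure(|| fill(&mut Vec::new(), n));
    let preallocated = measure(|| fill(&mut Vec::with_capacity(n), n));
    (grown, preallocated)
}
```

Calling `compare(4096)` from `main` and printing both tuples should show the pre-allocated buffer making fewer requests for fewer total bytes. The exact counts depend on Vec's growth policy, so I wouldn't hard-code them in a bench.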

TomGillen commented 4 years ago

I just tried using rpmalloc, and while performance improves quite a bit, the shipyard allocate benchmark crashes with "memory allocation of 2424 bytes failed".

kabergstrom commented 4 years ago

Perhaps @leudz could check why this is?

leudz commented 4 years ago

I've narrowed it down to:

rayon::ThreadPoolBuilder::new().build().unwrap();

It triggers this assert sometimes.

kabergstrom commented 4 years ago

I suppose the issue is that the benchmark creates a new World in the bench function, and in shipyard's case, creating a new World creates a new thread pool, which immediately spawns threads. rpmalloc also allocates a heap per thread, so I suspect this intense thread creation pressure is causing an OOM condition, since Windows doesn't overcommit.

kabergstrom commented 4 years ago

@leudz Do you have any opinion on how to solve this?

leudz commented 4 years ago

For non-parallel benchmarks, removing the parallel feature would work. Using a custom pool might also solve the problem, but I'm not sure.

leudz commented 4 years ago

Shipyard now uses the global ThreadPool, problem solved =)
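
For anyone landing here later, the difference between one pool per World and a single shared pool can be sketched roughly as follows. This is a standard-library-only illustration: `Pool`, `World`, `shared_pool`, and `pools_built_for` are hypothetical stand-ins, not shipyard's or rayon's actual types or API, and the "pool" just spawns and joins empty threads to mimic start-up cost.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};
use std::thread;

// Hypothetical counter of how many pools have been built so far.
static POOLS_BUILT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a thread pool; building one spawns worker threads.
struct Pool {
    workers: usize,
}

impl Pool {
    fn build(workers: usize) -> Pool {
        POOLS_BUILT.fetch_add(1, Ordering::SeqCst);
        // Spawn and join empty threads to mimic the thread creation
        // that happens whenever a fresh pool is built.
        let handles: Vec<_> = (0..workers).map(|_| thread::spawn(|| {})).collect();
        for handle in handles {
            handle.join().unwrap();
        }
        Pool { workers }
    }
}

/// Process-wide pool, built lazily on first use and then reused.
fn shared_pool() -> Arc<Pool> {
    static SHARED: OnceLock<Arc<Pool>> = OnceLock::new();
    SHARED.get_or_init(|| Arc::new(Pool::build(4))).clone()
}

/// Hypothetical stand-in for an ECS world that owns a handle to a pool.
struct World {
    pool: Arc<Pool>,
}

impl World {
    /// Every World builds its own pool, so every World spawns threads.
    fn with_own_pool() -> World {
        World { pool: Arc::new(Pool::build(4)) }
    }

    /// Every World reuses the single shared pool.
    fn with_shared_pool() -> World {
        World { pool: shared_pool() }
    }
}

/// Creates `worlds` worlds and returns how many pools were built in the process.
fn pools_built_for(worlds: usize, shared: bool) -> usize {
    let before = POOLS_BUILT.load(Ordering::SeqCst);
    let created: Vec<World> = (0..worlds)
        .map(|_| if shared { World::with_shared_pool() } else { World::with_own_pool() })
        .collect();
    // Touch each pool so the worlds are actually used.
    assert!(created.iter().all(|w| w.pool.workers == 4));
    POOLS_BUILT.load(Ordering::SeqCst) - before
}
```

With per-World pools, a bench loop that creates a World on every iteration pays for thread start-up (and, with a per-thread-heap allocator, heap setup) each time. With the shared pool that cost is paid at most once per process.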