Smaller grid size per GPU does not speed up the calculation significantly

groundcherry / bluebottle-3.0

A many-GPU-centric two phase flow simulation code implementing the Physalis method

Apache License 2.0


Smaller grid size per GPU does not speed up the calculation significantly #5

Open tomchen95 opened 3 months ago

tomchen95 commented 3 months ago

Hi!

I found that when the global grid size is reduced (assuming that only 1 GPU is used), the run time does not drop proportionally, especially once the grid becomes small. For example, a simulation with a global grid size of 160×160×160 takes less than twice as long as one with a global grid size of 160×160×80. In typical CPU CFD codes, however, halving the global grid size gives a speed-up of more than a factor of two. I'm not sure why this is the case. Is it the nature of GPU calculation?

Thanks, Tom

ajsierakowski commented 3 months ago

Are you comparing to a single-threaded CPU code? Multi-threaded? Distributed? The only reason I can think of why a code would run more than twice as fast when cutting the problem size in half (or, said another way, why its run time would grow faster than linearly with problem size) is if you are seeing significant communication bottlenecks whose influence decreases as the problem size decreases. In the single-GPU case, inter-process communication is negligible, which is why the run time scales very nearly linearly with problem size.
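To make the scaling argument concrete, here is a minimal sketch of how one might estimate a scaling exponent p (assuming run time behaves like t ∝ N^p) from two runs. The function name `scaling_exponent` and the timings below are hypothetical placeholders, not measurements from bluebottle or from this thread. An exponent of 1 means linear scaling; an exponent above 1 corresponds to the "more than twice as fast at half the size" case.

```python
import math


def scaling_exponent(n_small, t_small, n_large, t_large):
    """Estimate p in t ~ n**p from two (problem size, run time) pairs."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


# Grid sizes taken from the thread; the timings are made-up placeholders.
n_half = 160 * 160 * 80
n_full = 160 * 160 * 160

# Doubling the time when doubling the cells: exponent of 1 (linear).
p_linear = scaling_exponent(n_half, 10.0, n_full, 20.0)

# Less than double the time: exponent below 1.
p_sub = scaling_exponent(n_half, 10.0, n_full, 16.0)

# More than double the time: exponent above 1.
p_super = scaling_exponent(n_half, 10.0, n_full, 25.0)

print(p_linear, p_sub, p_super)
```

Two data points only give a rough estimate; timing several grid sizes and fitting the exponent would be more reliable.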

tomchen95 commented 3 months ago

I'm only considering computations on 1 GPU, so communication is not a factor. If the global computational domain sizes are 160×160×160 and 160×160×80, the 160×160×160 case should run at least 2 times slower than the 160×160×80 case, provided that only 1 GPU is used for both. But based on my tests, 160×160×160 is not that slow and 160×160×80 is not that fast. So I'm wondering why.
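For reference, the comparison described above can be written out as a short sketch. The cell counts follow directly from the grid sizes in this thread; the helper names and the example timings are hypothetical and stand in for whatever wall-clock times are actually measured.

```python
def cell_count(nx, ny, nz):
    """Total number of cells in an nx x ny x nz grid."""
    return nx * ny * nz


def relative_to_linear(t_full, t_half, size_ratio):
    """Measured slowdown divided by the slowdown expected from linear scaling.

    1.0 means linear scaling; below 1.0 means the larger run is
    faster than linear scaling would predict.
    """
    return (t_full / t_half) / size_ratio


full = cell_count(160, 160, 160)  # 4,096,000 cells
half = cell_count(160, 160, 80)   # 2,048,000 cells
size_ratio = full / half          # 2.0

# Placeholder timings in seconds (NOT real measurements).
t_full, t_half = 16.0, 10.0
print(size_ratio, relative_to_linear(t_full, t_half, size_ratio))
```

With these placeholder numbers the larger run takes 1.6 times as long instead of 2 times, which is the kind of behavior reported here.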