omlins / ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
BSD 3-Clause "New" or "Revised" License

[JuliaCon/proceedings-review] Performance metrics #128

Open · georgebisbas opened this issue 10 months ago

georgebisbas commented 10 months ago

Hi all,

q1) What is the reason for focusing on T_eff rather than on Gpts/s, which is commonly used in papers reporting stencil performance?

q2) Figure 2 shows that performance drops slightly when using the math-close notation compared to expressing the stencil computation explicitly. Where does this slowdown come from?

omlins commented 10 months ago

Thank you for the questions, @georgebisbas.

q1) What is the reason for focusing on T_eff rather than on Gpts/s, which is commonly used in papers reporting stencil performance?

The reason is that for T_eff we can straightforwardly define a theoretical upper bound: T_peak, the peak memory throughput of the hardware used. The ratio T_eff / T_peak therefore shows directly how close an implementation comes to what the hardware allows.
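To make the metric concrete, here is a minimal Python sketch. It assumes the definition T_eff = A_eff / t_it, where t_it is the time per iteration and A_eff is the effective memory access per iteration, counting each unknown field (read and written) twice and each known field (only read) once. The function names, grid size, timing and T_peak value below are hypothetical and not taken from the paper.

```python
def effective_throughput(nx, ny, nz, n_unknown, n_known, t_it, bytes_per_value=8):
    """Return T_eff in GB/s for one iteration over an nx*ny*nz grid.

    Assumed definition: A_eff = (2*D_u + D_k) * n_points * bytes_per_value,
    i.e. each unknown field is read and written once and each known field is
    only read once; T_eff = A_eff / t_it.
    """
    n_points = nx * ny * nz
    a_eff_gb = (2 * n_unknown + n_known) * n_points * bytes_per_value / 1e9
    return a_eff_gb / t_it


def fraction_of_peak(t_eff, t_peak):
    """T_peak (peak memory throughput in GB/s) is the upper bound for T_eff."""
    return t_eff / t_peak


if __name__ == "__main__":
    # Hypothetical numbers: 512^3 Float64 grid, 1 unknown + 1 known field,
    # 5 ms per iteration, and a device with T_peak = 900 GB/s.
    t_eff = effective_throughput(512, 512, 512, n_unknown=1, n_known=1, t_it=0.005)
    print(f"T_eff = {t_eff:.1f} GB/s, {100 * fraction_of_peak(t_eff, 900.0):.1f}% of T_peak")
```

Because T_peak depends only on the hardware, the same bound applies to every stencil measured on that device.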

q2) Figure 2 shows that performance drops slightly when using the math-close notation compared to expressing the stencil computation explicitly. Where does this slowdown come from?

The slowdown comes from the generated code being slightly more complex, for example to avoid out-of-bounds accesses.
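The kind of extra code meant here can be illustrated with a hypothetical 1-D example in Python. This is not ParallelStencil's actual generated code, which is Julia and is not shown in this thread. A hand-written loop can iterate over exactly the interior points. A kernel generated from a math-close expression and launched with one thread per index, over a range that may exceed the array (for example, rounded up to a block size), instead has to test every index before touching memory. The names `laplacian_guarded` and `n_threads` are illustrative only.

```python
def laplacian_explicit(a, dx):
    """Hand-written version: the loop bounds already exclude the boundary points."""
    n = len(a)
    out = [0.0] * n
    for i in range(1, n - 1):
        out[i] = (a[i - 1] - 2.0 * a[i] + a[i + 1]) / dx**2
    return out


def laplacian_guarded(a, dx, n_threads):
    """Generated-style version: one 'thread' per launch index, each checking its bounds."""
    n = len(a)
    out = [0.0] * n
    for i in range(n_threads):  # simulated launch; n_threads may exceed len(a)
        if 1 <= i < n - 1:  # extra per-index guard against out-of-bounds accesses
            out[i] = (a[i - 1] - 2.0 * a[i] + a[i + 1]) / dx**2
    return out


if __name__ == "__main__":
    a = [0.0, 1.0, 4.0, 9.0, 16.0]  # samples of x^2 on a unit grid
    print(laplacian_explicit(a, 1.0))              # [0.0, 2.0, 2.0, 2.0, 0.0]
    print(laplacian_guarded(a, 1.0, n_threads=8))  # [0.0, 2.0, 2.0, 2.0, 0.0]
```

Both versions compute the same result. The guarded one evaluates an additional branch per index, which is the sort of overhead that can show up as a small performance difference.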

georgebisbas commented 8 months ago

Thank you for your answers, @omlins. Regarding q1, would it be possible to also report Gpts/s for the experiments? I think it would be a useful addition.
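Since both metrics are computed from the same timing, adding Gpts/s should in principle not require new measurements. Below is a minimal Python sketch. It assumes that Gpts/s counts grid points updated per second, and that A_eff = (2 D_u + D_k) × n_points × bytes per value. Under these assumptions, the two metrics differ only by the number of bytes accessed per grid point. Function names and numbers are hypothetical.

```python
def gpts_per_second(nx, ny, nz, t_it):
    """Grid points updated per second, in Gpts/s, for one iteration taking t_it seconds."""
    return nx * ny * nz / t_it / 1e9


def teff_from_gpts(gpts, n_unknown, n_known, bytes_per_value=8):
    """Convert Gpts/s to T_eff in GB/s under the assumed A_eff definition.

    Each grid point contributes (2*D_u + D_k) * bytes_per_value bytes of
    effective memory access per iteration.
    """
    return gpts * (2 * n_unknown + n_known) * bytes_per_value


if __name__ == "__main__":
    # Hypothetical run: 512^3 grid, 5 ms per iteration, 1 unknown + 1 known Float64 field.
    gpts = gpts_per_second(512, 512, 512, t_it=0.005)
    print(f"{gpts:.2f} Gpts/s  ->  T_eff = {teff_from_gpts(gpts, 1, 1):.1f} GB/s")
```

The conversion factor depends on how many fields the stencil touches. A Gpts/s upper bound would therefore have to be derived per stencil as T_peak divided by the bytes per point, whereas T_peak bounds T_eff directly.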

omlins commented 8 months ago

@georgebisbas: thank you for your suggestion. We will try to include it in the same plot.

svretina commented 7 months ago

Regarding q2: if one runs the code with bounds checking deactivated, should the performance be regained?