georgebisbas opened this issue 10 months ago (status: Open)
Thank you for the questions, @georgebisbas.
q1) what is the reason behind focusing on T_eff and not on Gpts/s as commonly used in papers reporting stencil performance?
The reason is that for T_eff we can define, in a straightforward fashion, a theoretical upper bound, which is simply T_peak, the peak memory throughput of the hardware used.
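To illustrate how the two metrics relate, here is a minimal sketch. It assumes that T_eff is computed as the memory that must be read or written per iteration divided by the time per iteration; that definition is not spelled out in this thread. The function names, the grid size, the timing, and the T_peak value are all hypothetical and are not measurements from the paper.

```python
def effective_throughput(n_points, n_reads, n_writes, bytes_per_value, t_it):
    """Effective memory throughput in GB/s (assumed definition).

    Counts only the arrays that must be read or written once per
    iteration, then divides the resulting bytes by the iteration time.
    """
    a_eff = (n_reads + n_writes) * n_points * bytes_per_value  # bytes per iteration
    return a_eff / t_it / 1e9


def gpts_per_second(n_points, t_it):
    """Grid points updated per second, in units of 1e9 points/s."""
    return n_points / t_it / 1e9


# Hypothetical example: 512^3 grid, double precision,
# two arrays read and one array written per iteration.
n_points = 512**3
t_it = 0.004      # seconds per iteration (made-up value)
t_peak = 900.0    # GB/s, made-up hardware peak memory throughput

t_eff = effective_throughput(n_points, n_reads=2, n_writes=1,
                             bytes_per_value=8, t_it=t_it)
gpts = gpts_per_second(n_points, t_it)

# T_eff can be compared directly with T_peak; Gpts/s has no such bound
# without also knowing the bytes moved per grid point.
fraction_of_peak = t_eff / t_peak

print(f"T_eff  = {t_eff:.1f} GB/s ({100 * fraction_of_peak:.1f}% of T_peak)")
print(f"Gpts/s = {gpts:.2f}")
```

Under this assumption the two metrics differ only by the number of bytes moved per grid point (here 3 arrays × 8 bytes), so either can be derived from the other once that factor is known.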
q2) Figure 2 shows that using the math-close notation, performance slightly drops compared to explicitly expressing the stencil computation. Where is this slowdown coming from?
The slowdown comes from the slightly more complex code that is generated, for example to avoid out-of-bounds accesses.
Thank you for your answers, @omlins. Regarding q1, would it be possible to also report Gpts/s for the experiments executed? I think it would be a useful addition.
@georgebisbas: thank you for your suggestion. We will try to accommodate it in the same plot.
Regarding q2, if one runs the code with bounds checking deactivated, should the performance be regained?