Many GPUs offer little native FP64 hardware: consumer cards typically provide far fewer FP64 units than FP32 units, so double-precision throughput is a small fraction of single-precision throughput (often 1/32 on consumer NVIDIA parts). As such, the performance of GPU-based applications can often be improved substantially (up to ~32x on such hardware) by trading floating-point accuracy for speed, i.e., computing in FP32 instead of FP64.
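The accuracy side of that tradeoff can be sketched in plain Python (a CPU simulation only; real GPU code would use `float` vs `double` in the kernel). The `to_f32` helper below is a made-up name that rounds each intermediate result to single precision, mimicking FP32 accumulation:

```python
import struct

def to_f32(x):
    # Round a Python float (IEEE-754 double) to the nearest FP32 value
    # by packing/unpacking it as a 4-byte single-precision float.
    return struct.unpack('f', struct.pack('f', x))[0]

n = 1_000_000  # sum 0.1 one million times; exact answer is 100000

# FP64 accumulation (Python floats are double precision).
s64 = 0.0
for _ in range(n):
    s64 += 0.1

# Simulated FP32 accumulation: round after every operation,
# as single-precision hardware would.
s32 = 0.0
for _ in range(n):
    s32 = to_f32(s32 + to_f32(0.1))

print(f"FP64 sum: {s64!r}")  # error is tiny
print(f"FP32 sum: {s32!r}")  # error is orders of magnitude larger
```

The FP32 drift here comes from both the representation error of 0.1 and the coarser rounding of each partial sum; the same effect is why long FP32 reductions on GPUs often need compensated summation or FP64 accumulators.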
This commit adds a slide outlining this difference and the optional tradeoff, as well as a small table of FLOPS values for common GPUs. While I'm not sure whether this is in scope for this course, given the discussion of performance, it can be a noteworthy factor.
(I'm not sure if a change like this even belongs here, so hopefully @patricklam or @jzarnett can double-check whether this is a useful addition, and, if so, whether this is the right place for such a slide.)