Open robnagler opened 6 years ago
I would highly recommend `do concurrent` or vanilla `do` loops in performance-critical regions over other constructs. Often (not always), array syntax and especially `forall` statements can cause really painful cache misses in performance-critical code. Array statements whose array sections are memory-aligned, contiguous blocks of O(cache-line length) elements tend to be fine. Array statements whose working array sections (including the rest of each cache line, if strided) don't fit in L1 and/or L2 end up evicting data from cache and repeatedly re-reading the variables from more distant memory. (This is due to the semantics of array statements and `forall` statements.) `do concurrent` helps to avoid this in most cases.
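To make the cache argument concrete, here is a minimal sketch of one way the effect can show up. It assumes arrays far larger than L1/L2 and a compiler that does not fuse consecutive array statements (many do, so measure before drawing conclusions). The program name, the array names `a`, `b`, `c`, `c_ref`, and the size `n` are made up for illustration.

```fortran
program fused_loop_sketch
  implicit none
  integer, parameter :: n = 1000000   ! hypothetical size, chosen to exceed typical L1/L2
  real, allocatable :: a(:), b(:), c(:), c_ref(:)
  integer :: i

  allocate(a(n), b(n), c(n), c_ref(n))
  call random_number(a)
  call random_number(b)

  ! Array syntax: semantically, each statement is a full pass over all n
  ! elements, so c_ref may be streamed through the cache three times.
  c_ref = a + b
  c_ref = c_ref * a
  c_ref = c_ref - b

  ! do concurrent: a single pass; each element is loaded, updated, and
  ! stored once, and the iterations are declared independent.
  do concurrent (i = 1:n)
    c(i) = (a(i) + b(i)) * a(i) - b(i)
  end do

  ! Both versions should agree up to floating-point rounding.
  if (maxval(abs(c - c_ref)) > 1.0e-5) error stop "results differ"
  print *, "max difference:", maxval(abs(c - c_ref))
end program fused_loop_sketch
```

Whether the array-syntax version is actually slower depends on the compiler and optimization level, so timing both forms on the target machine is the only reliable check.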
Convert `do` loops to: `do concurrent`
`do` loops, e.g., `integer, allocatable :: even_nums(:); even_nums = [(2*i, i=1,n)]`
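As a rough illustration of the forms mentioned here, the sketch below fills the same array with a plain `do` loop, a `do concurrent` loop, and the array constructor from the example. The program name, the value of `n`, and the names `even_loop` and `even_conc` are assumptions for the sake of a runnable example; only `even_nums` comes from the text above.

```fortran
program convert_loops_sketch
  implicit none
  integer, parameter :: n = 10   ! hypothetical size for illustration
  integer :: i
  integer, allocatable :: even_loop(:), even_conc(:), even_nums(:)

  allocate(even_loop(n), even_conc(n))

  ! Plain do loop.
  do i = 1, n
    even_loop(i) = 2*i
  end do

  ! The same loop as do concurrent; this is valid because no iteration
  ! depends on the result of any other iteration.
  do concurrent (i = 1:n)
    even_conc(i) = 2*i
  end do

  ! The array-constructor form from the example; even_nums is allocated
  ! automatically on assignment (Fortran 2003 and later).
  even_nums = [(2*i, i=1,n)]

  if (any(even_loop /= even_conc) .or. any(even_loop /= even_nums)) then
    error stop "the three forms disagree"
  end if
  print *, even_nums
end program convert_loops_sketch
```

The conversion to `do concurrent` is only safe when the loop body has no cross-iteration dependencies; a loop that accumulates into a scalar or reads elements written by other iterations would need to be restructured first.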