lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Optimize clover storage and memory traffic #202

Open maddyscientist opened 9 years ago

maddyscientist commented 9 years ago

There is a symmetry in the clover term that is not currently exploited. Using it could reduce both the memory footprint and the memory traffic in the kernels that use the clover term.

Specifically, if we consider the 2x2 block form of each chiral block of the clover matrix, we have

/ A   B \
\ B* -A /

We presently store only B* and do not store B separately; however, we do store both A and -A. If we stored only A, the number of real numbers required to store the clover term would drop from 72 to 54, a 25% reduction in both memory footprint and memory traffic.
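The 72 and 54 figures can be checked with a short count. This is only a sketch, assuming three colors and two 6x6 Hermitian chiral blocks, each split into 3x3 sub-blocks as above; the helper names are made up for illustration and are not QUDA functions.

```python
NC = 3        # number of colors (assumed SU(3))
N_CHIRAL = 2  # number of chiral blocks in the clover matrix


def hermitian_reals(n):
    """Reals needed for an n x n Hermitian matrix:
    n real diagonal entries plus n(n-1)/2 complex off-diagonal entries."""
    return n + 2 * (n * (n - 1) // 2)


def complex_reals(n):
    """Reals needed for a general complex n x n matrix."""
    return 2 * n * n


# Current layout: each chiral block is stored as a full 6x6 Hermitian matrix.
current = N_CHIRAL * hermitian_reals(2 * NC)

# Proposed layout: per chiral block, store only Hermitian A and complex B.
reduced = N_CHIRAL * (hermitian_reals(NC) + complex_reals(NC))

saving = 1 - reduced / current
print(current, reduced, saving)
```

Per chiral block this is 36 reals today versus 9 (for A) plus 18 (for B) in the reduced form.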

I suspect this optimization is of most utility for the twisted clover formulation, since I believe it requires two clover matrices and is therefore more constrained by memory footprint.

This symmetry does not hold directly for the clover inverse, so it would only apply to the kernels with the direct clover term. However, clearly we can use only 54 numbers for the clover inverse, since we could just load the direct matrix (using 54 numbers) and invert on the fly. This may be too computationally expensive, but more insight may be gained from considering the 2x2 block form of the inverse.
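The remark about the 2x2 block form of the inverse can be made concrete with the standard block-inverse (Schur complement) identity. What follows is a sketch rather than a derivation from the thread: it assumes A and S are invertible, reads B* as the Hermitian conjugate B^\dagger, and introduces the symbols M and S purely for illustration.

```latex
M = \begin{pmatrix} A & B \\ B^\dagger & -A \end{pmatrix},
\qquad
S = A + B A^{-1} B^\dagger,
\qquad
M^{-1} = \begin{pmatrix}
  S^{-1} & S^{-1} B A^{-1} \\
  A^{-1} B^\dagger S^{-1} & -A^{-1} + A^{-1} B^\dagger S^{-1} B A^{-1}
\end{pmatrix}
```

Written this way, the lower-right block of the inverse is not manifestly the negative of the upper-left block, which is consistent with the observation that the symmetry does not carry over directly.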

Anyway, as a first step, we should replace the clover direct matrix using this reduced storage form. After that is done, we can consider reduced storage options for the inverse.
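As a rough illustration of what the reduced storage form implies for a kernel, the sketch below rebuilds one full 6x6 chiral block from stored A and B. It assumes the block structure stated earlier, with B* read as the conjugate transpose; `dagger` and `expand_chiral_block` are hypothetical helpers, not QUDA routines, and the sample values are arbitrary.

```python
def dagger(m):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[m[j][i].conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]


def expand_chiral_block(a, b):
    """Rebuild [[A, B], [B^dagger, -A]] from the stored 3x3 blocks A and B."""
    n = len(a)
    b_dag = dagger(b)
    top = [list(a[i]) + list(b[i]) for i in range(n)]
    bottom = [list(b_dag[i]) + [-x for x in a[i]] for i in range(n)]
    return top + bottom


# Arbitrary sample data: A must be Hermitian, B is a general complex matrix.
A = [[1, 2 + 1j, 0], [2 - 1j, -1, 1j], [0, -1j, 0.5]]
B = [[1j, 2, 3], [0, 1 - 1j, 4], [5, 0, 2j]]

full = expand_chiral_block(A, B)
print(full == dagger(full))  # checks that the rebuilt block is Hermitian
```

In a real kernel the expansion would presumably happen in registers after loading A and B, so only the reduced form travels through memory.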

maddyscientist commented 9 years ago

Ok, on closer inspection I see that the clover term relation isn't quite what I wrote above. Since the regular Wilson diagonal term contribution is included in the clover matrix, the actual form is like this:

/ 1+A  B  \
\ B*  1-A /

so the relation between the upper and lower diagonal blocks is slightly more involved, and this likely means that the only way to get the memory saving for the inverse matrix is to invert the clover matrix on the fly.

AlexVaq commented 9 years ago

Ok understood, so I should get going with fma soon, and then with the inversion on-the-fly.

Alex

mathiaswagner commented 9 years ago

Can’t you just recover your original form of the matrix by subtracting 1? But maybe I have not thought it through.

maddyscientist commented 9 years ago

Yes you can recover it trivially for the regular clover matrix, but for the inverse, it means there is no easy relationship between the upper and lower block diagonals. So I believe inversion on the fly is required (happy to be proven wrong though).
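For the direct matrix, the recovery amounts to lower = 2*I - upper, since 1 - A = 2 - (1 + A). A minimal sketch, assuming the upper diagonal block 1 + A is what gets stored; `lower_from_upper` is a hypothetical helper and the sample A is arbitrary.

```python
def lower_from_upper(upper):
    """Given the stored upper diagonal block 1 + A, return 1 - A = 2*I - (1 + A)."""
    n = len(upper)
    return [[(2 if i == j else 0) - upper[i][j] for j in range(n)]
            for i in range(n)]


# Arbitrary Hermitian sample for A.
A = [[0.5, 1 + 2j, 0], [1 - 2j, -0.25, 3j], [0, -3j, 0.125]]
n = len(A)

upper = [[(1 if i == j else 0) + A[i][j] for j in range(n)] for i in range(n)]
expected_lower = [[(1 if i == j else 0) - A[i][j] for j in range(n)]
                  for i in range(n)]

lower = lower_from_upper(upper)
print(lower == expected_lower)
```

No comparably cheap relation is known here for the inverse, which is why inversion on the fly comes up.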

AlexVaq commented 9 years ago

In any case, there are other strong reasons to push the inversion on-the-fly, so I really think we (I) should start working on this seriously.

mathiaswagner commented 9 years ago

Well, the inverse was the thing I did not think about. Maybe one should try anyway to see what the additional one on the diagonal does. I remember there are some formulas for the inverse of a diagonal matrix + ‘something’, but I don’t remember which restrictions there are on ‘something’.