lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Fat/long force optimizations #1367

Closed. weinbe2 closed this 1 year ago.

weinbe2 commented 1 year ago

This PR constitutes a large refactor and optimization of the implementation of the fat/long force computation in QUDA. It introduces multiple different groups of optimizations:

Five- and seven-link terms

Three-link and Lepage terms

Ancillary detriments and benefits from fusion

Gauge field compression

Testing and timing

General clean-up of FLOPS/bytes counts

There is still some clean-up remaining in this PR, none of which blocks opening it sooner rather than later:

With regard to gauge compression, there is still outstanding work to enable recon-12 for the second step of the HISQ force chain rule. This is possible there because the U field is an SU(3) field, as opposed to the first step, which features the U(3) W field. Since this would give only a marginal gain in memory savings relative to enabling recon-13, at the cost of a decent amount of coding headaches, we're going to punt it to a subsequent PR.
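For readers less familiar with the recon-12/recon-13 distinction, here is a minimal Python sketch of the underlying algebra. It is an illustration only, not QUDA's CUDA implementation, and the function names are hypothetical. It assumes recon-12 stores the first two rows of an SU(3) link and rebuilds the third as the complex conjugate of their cross product, and that recon-13 additionally stores a single phase θ with det W = e^{iθ}, so the third row of a U(3) link is e^{iθ} times that same conjugated cross product.

```python
import cmath


def conj_cross(a, b):
    """Return conj(a x b) for two rows of three complex numbers."""
    return [
        (a[1] * b[2] - a[2] * b[1]).conjugate(),
        (a[2] * b[0] - a[0] * b[2]).conjugate(),
        (a[0] * b[1] - a[1] * b[0]).conjugate(),
    ]


def reconstruct_12(row0, row1):
    """Hypothetical recon-12 for an SU(3) link (det = 1).

    Only the first two rows are stored; unitarity plus det = 1 fixes the
    third row as the conjugated cross product of the first two.
    """
    return [list(row0), list(row1), conj_cross(row0, row1)]


def reconstruct_13(row0, row1, theta):
    """Hypothetical recon-13 for a U(3) link with det = exp(i*theta).

    The stored phase theta supplies the one extra real number needed
    because the determinant of a U(3) matrix is not fixed to 1.
    """
    det_phase = cmath.exp(1j * theta)
    return [list(row0), list(row1), [det_phase * z for z in conj_cross(row0, row1)]]
```

Applying `reconstruct_12` to a U(3) matrix whose determinant is not 1 returns a third row that is off by a phase, which is why the W field needs the extra stored number while the SU(3) U field could in principle drop it.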

Depending on the problem size and the geometry of the decomposition, this can give a ~30+% performance boost in the computation of the HISQ force. This is due to the large amount of kernel fusion (and the corresponding cache reuse from it), as well as the introduction of gauge reconstruction for the U/W field and a reduction in the depth of the halo that needs to be explicitly looped over as part of the computation.
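A rough back-of-the-envelope for the memory side of gauge reconstruction, as a Python sketch. It assumes 18, 13 and 12 real numbers per link for no reconstruction, recon-13 and recon-12 respectively, and one link per direction in four dimensions; the names are hypothetical. It only counts raw link storage and says nothing about the fusion and halo effects that also contribute to the speedup.

```python
# Assumed number of real numbers stored per link for each reconstruction type.
REALS_PER_LINK = {"recon-18": 18, "recon-13": 13, "recon-12": 12}


def gauge_bytes_per_site(recon, bytes_per_real=8, n_dim=4):
    """Bytes of link storage per lattice site (one link per direction)."""
    return REALS_PER_LINK[recon] * bytes_per_real * n_dim


def relative_saving(baseline, recon):
    """Fractional reduction in link storage when going from baseline to recon."""
    return 1 - REALS_PER_LINK[recon] / REALS_PER_LINK[baseline]


if __name__ == "__main__":
    for name in REALS_PER_LINK:
        print(name, gauge_bytes_per_site(name), "bytes/site in double precision")
    print(f"recon-18 -> recon-13: {relative_saving('recon-18', 'recon-13'):.1%}")  # 27.8%
    print(f"recon-13 -> recon-12: {relative_saving('recon-13', 'recon-12'):.1%}")  # 7.7%
```

Under these assumptions, going from recon-13 to recon-12 trims only one real number in thirteen, which is the "marginal gain" weighed against the extra coding effort.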

maddyscientist commented 1 year ago

~When building, I'm getting the following warning (CUDA 12.0 / GCC 11)~

/home/kate/github/quda-develop-old/tests/host_reference/hisq_force_reference.cpp: In instantiation of ‘su3_matrix* get_su3_matrix(int, su3_matrix*, int, int) [with su3_matrix = fsu3_matrix]’:
/home/kate/github/quda-develop-old/tests/host_reference/hisq_force_reference.cpp:109:37:   required from ‘void computeLinkOrderedOuterProduct(su3_vector*, su3_matrix*, size_t, int) [with su3_vector = fsu3_vector; su3_matrix = fsu3_matrix; size_t = long unsigned int]’
/home/kate/github/quda-develop-old/tests/host_reference/hisq_force_reference.cpp:118:35:   required from here
/home/kate/github/quda-develop-old/tests/host_reference/hisq_force_reference.cpp:87:63: warning: unused parameter ‘gauge_order’ [-Wunused-parameter]
   87 | template <typename su3_matrix> su3_matrix *get_su3_matrix(int gauge_order, su3_matrix *p, int idx, int dir)

Edit: this was caused by me using a stale local copy of the branch. The warning is not present in HEAD.

maddyscientist commented 1 year ago

Something I just noticed from testing is that the hisq_paths_force_test test will wrongly report a failure if --verify false is passed. The correctness check should not be applied in this case.
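The intended behavior amounts to making the pass/fail decision only when verification is enabled. Below is a hypothetical Python sketch of that logic; QUDA's actual test is C++, and `check_outcome` and its default tolerance are made up for illustration.

```python
def check_outcome(verify, max_deviation=None, tolerance=1e-10):
    """Hypothetical pass/fail logic for a force test.

    When verification is disabled no host reference is computed, so the
    test must not report a failure based on a comparison it never made.
    """
    if not verify:
        return "skipped"  # no reference result: neither pass nor fail
    if max_deviation is None:
        raise ValueError("verification requested but no deviation was computed")
    return "pass" if max_deviation <= tolerance else "fail"
```

Returning a distinct "skipped" state, rather than folding it into "pass", keeps a timing-only run from being mistaken for a verified one.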

weinbe2 commented 1 year ago

> Something I just noticed from testing is that the hisq_paths_force_test test will wrongly report a failure if --verify false is passed. The correctness check should not be applied in this case.

https://github.com/lattice/quda/pull/1367/commits/9e9845e5827a35ac61349c89a718f0ad192014e7

maddyscientist commented 1 year ago

@mathiaswagner are you wanting to review this before we merge?

mathiaswagner commented 1 year ago

I guess this has seen enough testing, and I am not sure I'll have cycles this week, so feel free to go ahead with the merge.