lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda
Other
287 stars 94 forks source link

Make unitarized fat7 link special unitarized. #1409

Closed SaltyChiang closed 10 months ago

SaltyChiang commented 11 months ago

We need to unitarise the fat7 links in the HISQ action. In MILC and Chroma, the links are finally projected to SU(3). But QUDA doesn't perform the SU(3) projection. This makes the propagator solved by QUDA slightly different from that of Chroma.

This PR implements the extra projection. This PR is also related to #1408 and now the CPS staggered phase is the same as that in Chroma.

There are some if statements depending on whether an input pointer is nullptr or not (for example, computeKSLinkQuda). I need to use QUDA_QDP_GAUGE_ORDER as my CPU gauge order, and a cpuGaugeField constructor with QUDA_REFERENCE_FIELD_CREATE now fails. This PR sets all four pointers to nullptr if the input is nullptr.

I have checked the propagator solved by QUDA with Chroma and they agree with each other.

mathiaswagner commented 11 months ago

Jenkins: Can one of the admins verify this patch?

mathiaswagner commented 11 months ago

Jenkins: Can one of the admins verify this patch?

weinbe2 commented 10 months ago

Hi @SaltyChiang , this is an interesting observation. I can't speak to Chroma's behavior, so I assume you are doing the correct thing.

For MILC, the function you pointed to, load_W_from_Y, does not reach the su3 projection that you noted---instead, it shortcuts at line 358:

if(aux->WeqY)return;

Where that flag is set at https://github.com/milc-qcd/milc_qcd/blob/e1a97ab6a2c440342f485ad38dc81ca9f2f7ef9c/generic_ks/fermion_links_hisq_load_milc.c#L724 . The HISQ stencil used in su3_rhm[c/d]_hisq as defined only projects to U(3) instead of SU(3). This is a nightmare to confirm, but the logic goes as:

(I will acknowledge that the original HISQ paper, https://arxiv.org/pdf/hep-lat/0610092.pdf , does use an SU(3) projection for measurements---see comments near equation 30---however they note that U(3) works perfectly well. I assume the MILC devs had their reasons to not match Follana et al exactly.)

Anyway, back to the code: load_W_from_Y is roughly a no-op, W aliases Y, and we see in load_Y_from_V that there is only a projection to U(3): https://github.com/milc-qcd/milc_qcd/blob/e1a97ab6a2c440342f485ad38dc81ca9f2f7ef9c/generic_ks/fermion_links_hisq_load_milc.c#L147

To this end, we need to make sure the existing logic in QUDA is used for the MILC phase type, while your logic is used for Chroma---which itself seems to use the SU(3) projection for its own reasons. In your PR, this just means your new code just needs to be wrapped in an if (...QUDA_STAGGERED_PHASE_CPS) appropriately.

SaltyChiang commented 10 months ago

Hi @weinbe2, thank you very much for the detailed reply!

I didn't notice the WeqY flag and apologize sincerely for the misleading information about MILC's behavior.

Just applying the projection in the CPS phase case seems to be a good idea. But maybe I should ask if CPS is using QUDA as their HISQ inverter or not now? The modification here might also affect their code. Just adding a new type QUDA_STAGGERED_PAHSE_CHROMA is another option, which will not affect the present logic. Which one do you prefer?

Also, do you have any comment about the implementation of QUDA_STAGGERED_PHASE_CPS?

PS: @ybyang told me that QUDA uses -Dslash+m while Chroma uses Dslash+m in the HISQ inverter. Maybe a QUDA_STAGGERED_PAHSE_CHROMA flag is a better choice, as we can add an extra phase to cancel the difference.

weinbe2 commented 10 months ago

No problem @SaltyChiang , it was a good exercise in staring at the MILC code---it took me about an hour to convince myself that there hasn't been an issue in the MILC workflow (the comment in Follana et al about using U(3) instead of SU(3) added to the terror)!

Tagging @hummingtree and @maddyscientist about CPS+HISQ --- I'm not sure if it's being used in practice? I do like the idea of having a separate QUDA_STAGGERED_PHASE_CHROMA to avoid breakage, in any case.

Is the -Dslash vs Dslash convention causing you any particular headaches for now? One workaround is having Chroma request the dagger operator from QUDA. We definitely need to solve this more robustly---this is all a bit of a legacy headache---but if it's not causing any issues for you right now, I'm inclined to punt on it (but we can schedule a call to make a real plan for it in the longer term).

One thing I'll note is, if I remember correctly, both conventions give identical even-odd preconditioned operators, just different prepare/reconstructs, which might be why that bit hasn't caused you headaches (yet).

hummingtree commented 10 months ago

@SaltyChiang There is no QUDA usages from CPS with staggered fermion that we are aware of. If Chroma works fine with QUDA_STAGGERED_PHASE_CPS with your changes here, it is good for now. If you like the idea of introducing QUDA_STAGGERED_PHASE_CHROMA, I suggest add an errorQuda when QUDA_STAGGERED_PHASE_CPS is seen to indicate there is currently limited QUDA support for CPS with staggered.

mathiaswagner commented 10 months ago

@Jenkins test this please.

SaltyChiang commented 10 months ago

@weinbe2 You are right, the difference between -Dslash and Dslash can be easily solved by adding an overall -1 factor to the staggered phase. I added the QUDA_STAGGERED_PHASE_CHROMA to implement the extra -1 factor, and the result looks the same as Chroma's without any pre/post process.

As @maddyscientist confirmed in #1408, the CPS phase now has some problems, and I also modified that in this PR.

@hummingtree Do I have to make an errorQuda here? I think the CPS phase will not cause many problems now with the modification.

weinbe2 commented 10 months ago

Just confirming, did you test this with all reconstructs (18/13/9)?

SaltyChiang commented 10 months ago

Just confirming, did you test this with all reconstructs (18/13/9)?

@weinbe2 Do you mean the reconstruction option while loading the long link? For 18 there shouldn't be any problem; for 13, I don't know the theory here but it seems the staggered phase option will not affect the result; for 9 I think only MILC/TIFR/NO phase can be selected here, do I need to implement the CPS/CHROMA situation here?

PS: QUDA_STAGGERED_PHASE_NO works with QUDA_RECONSTRUCT_9 in my example, even though I'm using QUDA_STAGGERED_PHASE_CHROMA as my convention.

weinbe2 commented 10 months ago

Thanks for putting in the changes, @SaltyChiang . I had some offline discussions and I think a good plan is to just get rid of QUDA_STAGGERED_PHASE_CPS and the code that depends on it outright, only keeping the CHROMA enum and phases that you added. This is because we aren't actually aware of anyone using CPS + staggered. Can you please take on deleting the CPS enums and phases in the code?

Once that's done, I can give this a last test.

SaltyChiang commented 10 months ago

I use projectSU3() to implement the Chroma's behavior. This will only affect interface_quda.cpp and should be safer compared with modifying the kernels.

There is some overhead though, we can ignore them as this function will not be a time-consuming one.

weinbe2 commented 10 months ago

This is a really nice, clean approach @SaltyChiang, I highly approve. I'll do some last cursory tests, but I don't forsee any huge issues with this approach.

mathiaswagner commented 10 months ago

@Jenkins test this please