Closed SaltyChiang closed 10 months ago
Jenkins: Can one of the admins verify this patch?
Jenkins: Can one of the admins verify this patch?
Hi @SaltyChiang , this is an interesting observation. I can't speak to Chroma's behavior, so I assume you are doing the correct thing.
For MILC, the function you pointed to, load_W_from_Y
, does not reach the su3
projection that you noted---instead, it shortcuts at line 358:
if(aux->WeqY)return;
Where that flag is set at https://github.com/milc-qcd/milc_qcd/blob/e1a97ab6a2c440342f485ad38dc81ca9f2f7ef9c/generic_ks/fermion_links_hisq_load_milc.c#L724 . The HISQ stencil used in su3_rhm[c/d]_hisq
as defined only projects to U(3) instead of SU(3). This is a nightmare to confirm, but the logic goes as:
QUARK
type is hisq_u3_action.h
, noted here: https://github.com/milc-qcd/milc_qcd/blob/e1a97ab6a2c440342f485ad38dc81ca9f2f7ef9c/ks_imp_rhmc/Make_template#L210UNITARIZE_U3
as opposed to UNITARIZE_SU3
: https://github.com/milc-qcd/milc_qcd/blob/e1a97ab6a2c440342f485ad38dc81ca9f2f7ef9c/generic_ks/imp_actions/hisq/hisq_u3_action.h#L54ugroup
, the flag used to assign WeqY
around line 724 noted above, here: https://github.com/milc-qcd/milc_qcd/blob/e1a97ab6a2c440342f485ad38dc81ca9f2f7ef9c/generic_ks/ks_action_paths_hisq.c#L175(I will acknowledge that the original HISQ paper, https://arxiv.org/pdf/hep-lat/0610092.pdf , does use an SU(3) projection for measurements---see comments near equation 30---however they note that U(3) works perfectly well. I assume the MILC devs had their reasons to not match Follana et al exactly.)
Anyway, back to the code: load_W_from_Y
is roughly a no-op, W
aliases Y
, and we see in load_Y_from_V
that there is only a projection to U(3): https://github.com/milc-qcd/milc_qcd/blob/e1a97ab6a2c440342f485ad38dc81ca9f2f7ef9c/generic_ks/fermion_links_hisq_load_milc.c#L147
To this end, we need to make sure the existing logic in QUDA is used for the MILC phase type, while your logic is used for Chroma---which itself seems to use the SU(3) projection for its own reasons. In your PR, this just means your new code just needs to be wrapped in an if (...QUDA_STAGGERED_PHASE_CPS)
appropriately.
Hi @weinbe2, thank you very much for the detailed reply!
I didn't notice the WeqY
flag and apologize sincerely for the misleading information about MILC's behavior.
Just applying the projection in the CPS phase case seems to be a good idea. But maybe I should ask if CPS is using QUDA as their HISQ inverter or not now? The modification here might also affect their code. Just adding a new type QUDA_STAGGERED_PAHSE_CHROMA
is another option, which will not affect the present logic. Which one do you prefer?
Also, do you have any comment about the implementation of QUDA_STAGGERED_PHASE_CPS
?
PS: @ybyang told me that QUDA uses -Dslash+m while Chroma uses Dslash+m in the HISQ inverter. Maybe a QUDA_STAGGERED_PAHSE_CHROMA
flag is a better choice, as we can add an extra phase to cancel the difference.
No problem @SaltyChiang , it was a good exercise in staring at the MILC code---it took me about an hour to convince myself that there hasn't been an issue in the MILC workflow (the comment in Follana et al about using U(3) instead of SU(3) added to the terror)!
Tagging @hummingtree and @maddyscientist about CPS+HISQ --- I'm not sure if it's being used in practice? I do like the idea of having a separate QUDA_STAGGERED_PHASE_CHROMA
to avoid breakage, in any case.
Is the -Dslash
vs Dslash
convention causing you any particular headaches for now? One workaround is having Chroma request the dagger
operator from QUDA. We definitely need to solve this more robustly---this is all a bit of a legacy headache---but if it's not causing any issues for you right now, I'm inclined to punt on it (but we can schedule a call to make a real plan for it in the longer term).
One thing I'll note is, if I remember correctly, both conventions give identical even-odd preconditioned operators, just different prepare/reconstructs, which might be why that bit hasn't caused you headaches (yet).
@SaltyChiang There is no QUDA usages from CPS with staggered fermion that we are aware of. If Chroma works fine with QUDA_STAGGERED_PHASE_CPS
with your changes here, it is good for now. If you like the idea of introducing QUDA_STAGGERED_PHASE_CHROMA
, I suggest add an errorQuda
when QUDA_STAGGERED_PHASE_CPS
is seen to indicate there is currently limited QUDA support for CPS with staggered.
@Jenkins test this please.
@weinbe2 You are right, the difference between -Dslash and Dslash can be easily solved by adding an overall -1 factor to the staggered phase. I added the QUDA_STAGGERED_PHASE_CHROMA
to implement the extra -1 factor, and the result looks the same as Chroma's without any pre/post process.
As @maddyscientist confirmed in #1408, the CPS phase now has some problems, and I also modified that in this PR.
@hummingtree Do I have to make an errorQuda
here? I think the CPS phase will not cause many problems now with the modification.
Just confirming, did you test this with all reconstructs (18/13/9)?
Just confirming, did you test this with all reconstructs (18/13/9)?
@weinbe2 Do you mean the reconstruction option while loading the long link? For 18 there shouldn't be any problem; for 13, I don't know the theory here but it seems the staggered phase option will not affect the result; for 9 I think only MILC/TIFR/NO phase can be selected here, do I need to implement the CPS/CHROMA situation here?
PS: QUDA_STAGGERED_PHASE_NO
works with QUDA_RECONSTRUCT_9
in my example, even though I'm using QUDA_STAGGERED_PHASE_CHROMA
as my convention.
Thanks for putting in the changes, @SaltyChiang . I had some offline discussions and I think a good plan is to just get rid of QUDA_STAGGERED_PHASE_CPS
and the code that depends on it outright, only keeping the CHROMA
enum and phases that you added. This is because we aren't actually aware of anyone using CPS + staggered. Can you please take on deleting the CPS enums and phases in the code?
Once that's done, I can give this a last test.
I use projectSU3()
to implement the Chroma's behavior. This will only affect interface_quda.cpp and should be safer compared with modifying the kernels.
There is some overhead though, we can ignore them as this function will not be a time-consuming one.
This is a really nice, clean approach @SaltyChiang, I highly approve. I'll do some last cursory tests, but I don't forsee any huge issues with this approach.
@Jenkins test this please
We need to unitarise the fat7 links in the HISQ action. In MILC and Chroma, the links are finally projected to SU(3). But QUDA doesn't perform the SU(3) projection. This makes the propagator solved by QUDA slightly different from that of Chroma.
This PR implements the extra projection. This PR is also related to #1408 and now the CPS staggered phase is the same as that in Chroma.
There are some
if
statements depending on whether an input pointer isnullptr
or not (for example,computeKSLinkQuda
). I need to useQUDA_QDP_GAUGE_ORDER
as my CPU gauge order, and acpuGaugeField
constructor withQUDA_REFERENCE_FIELD_CREATE
now fails. This PR sets all four pointers tonullptr
if the input isnullptr
.I have checked the propagator solved by QUDA with Chroma and they agree with each other.