lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Gaussian smearing in QUDA with the chroma interface #681

Closed ybyang closed 1 year ago

ybyang commented 6 years ago

Hi all,

I found that I just need to call performWuppertalnStep in interface_quda.cpp from Chroma to do the Gaussian smearing I need. So all I have to do is pass the correct parameters to QUDA. With the present parameters, I get the following error after the first PreTune:

Wuppertal smearing done with gaugePrecise
PreTune N4quda15CopyColorSpinorIffLi4ELi3ENS_18CopyColorSpinorArgINS_11colorspinor11FloatNOrderIfLi4ELi3ELi4ELb0EEENS2_21SpaceSpinorColorOrderIfLi4ELi3EEEEEEE
Tuning N4quda15CopyColorSpinorIffLi4ELi3ENS_18CopyColorSpinorArgINS_11colorspinor11FloatNOrderIfLi4ELi3ELi4ELb0EEENS2_21SpaceSpinorColorOrderIfLi4ELi3EEEEEEE with out_stride=2048,in_stride=2048,PreserveBasis at vol=8x8x8x8
About to call tunable.apply block=(32,1,1) grid=(64,2,1) shared_bytes=0 aux=(1,1,1)
ERROR: Failed to clear error state an illegal memory access was encountered
 (rank 0, host nid02349, /ccs/home/ybyang1/src/quda/lib/tune.cpp:619 in tuneLaunch())

I tried 8^4 and 24^3x64 lattices, so it should not be a problem of running out of memory.

Any suggestion?

Yi-Bo

PS: The settings I used in Chroma are listed below:

// set q_gauge_param
           q_gauge_param = newQudaGaugeParam();  

           const multi1d<int>& latdims = Layout::subgridLattSize();
           q_gauge_param.X[0] = latdims[0];   
           q_gauge_param.X[1] = latdims[1];
           q_gauge_param.X[2] = latdims[2];
           q_gauge_param.X[3] = latdims[3];   
           q_gauge_param.type = QUDA_WILSON_LINKS;
           q_gauge_param.gauge_order = QUDA_QDP_GAUGE_ORDER;
           q_gauge_param.t_boundary = QUDA_PERIODIC_T; // smearing is independent of the BC
           q_gauge_param.cpu_prec = cpu_prec;
           q_gauge_param.cuda_prec = gpu_prec;

           switch( params.cudaReconstruct ) {
           case RECONS_NONE:
             q_gauge_param.reconstruct = QUDA_RECONSTRUCT_NO;
             break;
           case RECONS_8:
             q_gauge_param.reconstruct = QUDA_RECONSTRUCT_8;
             break;
           case RECONS_12:
             q_gauge_param.reconstruct = QUDA_RECONSTRUCT_12;
             break;
           default:
             q_gauge_param.reconstruct = QUDA_RECONSTRUCT_12;
             break;
           };
           q_gauge_param.cuda_prec_sloppy = gpu_prec;
           q_gauge_param.reconstruct_sloppy = QUDA_RECONSTRUCT_12;

           q_gauge_param.gauge_fix = QUDA_GAUGE_FIXED_NO;
           q_gauge_param.anisotropy = 1.0; // smearing does not care about this

        // setup padding
           multi1d<int> face_size(4);
           face_size[0] = latdims[1]*latdims[2]*latdims[3]/2;
           face_size[1] = latdims[0]*latdims[2]*latdims[3]/2;
           face_size[2] = latdims[0]*latdims[1]*latdims[3]/2;
           face_size[3] = latdims[0]*latdims[1]*latdims[2]/2;

           int max_face = face_size[0];
           for(int i=1; i <=3; i++) {
             if ( face_size[i] > max_face ) {
               max_face = face_size[i];
             }
           }

           q_gauge_param.ga_pad = max_face;
           q_gauge_param.cuda_prec_precondition = gpu_prec;
           q_gauge_param.reconstruct_precondition = QUDA_RECONSTRUCT_12;

// load gauge;
           void* gauge[4];
           for(int mu=0; mu < Nd; mu++) {
#ifndef BUILD_QUDA_DEVIFACE_GAUGE
               gauge[mu] = (void *)&(links_single[mu].elem(all.start()).elem().elem(0,0).real());
#else
               gauge[mu] = QDPCache::Instance().getDevicePtr( links_single[mu].getId() );
               QDPIO::cout << "MDAGM CUDA gauge[" << mu << "] in = " << gauge[mu] << "\n";
#endif
           }
           QDPIO::cout << "smearing on GPU" << std::endl;
           QDPIO::cout << q_gauge_param.cpu_prec << " " << q_gauge_param.cuda_prec
             << " " << q_gauge_param.cuda_prec_sloppy 
             << " " << q_gauge_param.cuda_prec_precondition << "\n";
           loadGaugeQuda((void *)gauge, &q_gauge_param);

// set quda_inv_param needed by performWuppertalnStep

           quda_inv_param = newQudaInvertParam();
           quda_inv_param.cpu_prec = cpu_prec;
           quda_inv_param.cuda_prec = gpu_prec;
           quda_inv_param.cuda_prec_sloppy = gpu_prec;   
           quda_inv_param.preserve_source = QUDA_PRESERVE_SOURCE_NO; 
           quda_inv_param.dirac_order = QUDA_DIRAC_ORDER;
           quda_inv_param.input_location = QUDA_CUDA_FIELD_LOCATION;
           quda_inv_param.output_location = QUDA_CUDA_FIELD_LOCATION;
           if( params.tuneDslashP ) {
             QDPIO::cout << "Enabling Dslash Autotuning" << std::endl;

             quda_inv_param.tune = QUDA_TUNE_YES;
           }
           else {
             QDPIO::cout << "Disabling Dslash Autotuning" << std::endl;

             quda_inv_param.tune = QUDA_TUNE_NO;
           }  
           quda_inv_param.sp_pad = 0;
           quda_inv_param.cl_pad = 0;
           quda_inv_param.cuda_prec_precondition = QUDA_SINGLE_PRECISION;
           quda_inv_param.clover_cuda_prec_precondition = QUDA_SINGLE_PRECISION;
           if( params.verboseP ) {   
             quda_inv_param.verbosity = QUDA_VERBOSE;
           }
           else {
             quda_inv_param.verbosity = QUDA_SUMMARIZE;  
           }
           quda_inv_param.gamma_basis = QUDA_UKQCD_GAMMA_BASIS;

// setup the LatticeFermion to be passed to QUDA. I am not sure what "rb" is; I just copied this from the Chroma interface to the QUDA inverter.
           LatticeFermion mod_chi,psi_s;
           mod_chi[rb[0]] = zero;
           mod_chi[rb[1]] = quark;
#ifndef BUILD_QUDA_DEVIFACE_SPINOR           
           void* spinorIn =(void *)&(mod_chi.elem(rb[1].start()).elem(0).elem(0).real());
           void* spinorOut =(void *)&(psi_s.elem(rb[1].start()).elem(0).elem(0).real());
#else
           void* spinorIn = GetMemoryPtr( mod_chi.getId() );
           void* spinorOut = GetMemoryPtr( psi_s.getId() );
#endif    

//  setup alpha of the Wuppertal smearing from gaussian smearing kappa;
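           // Presumably the lines below re-express Chroma's Gaussian-smearing
           // width (wvf_param) and hit count (wvfIntPar) in the Wuppertal
           // alpha convention expected by performWuppertalnStep further down.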
           double wvf_param=params.wvf_param.elem().elem().elem().elem();
           double ftmp=-(wvf_param*wvf_param)/(4.0*params.wvfIntPar);
           double alpha=-ftmp/(1+6*ftmp);

// do the smearing.
           performWuppertalnStep(spinorOut,spinorIn,&quda_inv_param,params.wvfIntPar,alpha);
maddyscientist commented 6 years ago

Hi @ybyang

We should be able to make this work. Looking at the interface code, I suspect the issue centers on the use of checkerboarding. Do you need to apply the Wuppertal smearing to a full fermion field, or to a single-parity field?

The rb refers to red-black, i.e., it selects which parity of the field. By default Chroma only passes the black parity to QUDA, i.e., the second half of the field, whereas here we would need to make sure the interface is set up for copying an entire field into QUDA.
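For illustration, a minimal sketch (using the Chroma accessors from the snippet above; the variable names are illustrative, not from the actual interface) of the difference between the single-parity pointer that the inverter path passes and a full-field pointer covering both parities:

    LatticeFermion chi = quark;
    // single-parity (black/odd) pointer, as in the inverter interface above:
    void* spinorOdd  = (void *)&(chi.elem(rb[1].start()).elem(0).elem(0).real());
    // full-field pointer, starting at the first site of the whole lattice:
    void* spinorFull = (void *)&(chi.elem(all.start()).elem(0).elem(0).real());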

ybyang commented 6 years ago

Hi @maddyscientist

 I need to apply the Wuppertal smearing to a full fermion field. I tried to malloc memory of size V*12*8 (for both spinorIn and spinorOut) in Chroma and then call performWuppertalnStep, but that does not work either. So should the problem be with the inv_param or gauge_param I set?
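For reference, a sketch of the allocation I mean (assuming single precision, so each site holds 4 spins x 3 colors = 12 complex numbers at 8 bytes each, with latdims as in the earlier snippet):

    const size_t V = latdims[0] * latdims[1] * latdims[2] * latdims[3];
    const size_t bytes = V * 4 * 3 * 2 * sizeof(float); // = V*12*8 bytes in single precision
    void *spinorIn  = malloc(bytes);
    void *spinorOut = malloc(bytes);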

Regards,

Yi-Bo


ybyang commented 6 years ago

Hi @maddyscientist,

 I tried the following setup for the inv_param, but the problem is still the same. If we have to do the smearing for the 4 Dirac indices individually, that would also be an acceptable choice for me.

       quda_inv_param.dslash_type = QUDA_LAPLACE_DSLASH;
       quda_inv_param.gamma_basis = QUDA_DEGRAND_ROSSI_GAMMA_BASIS;
       quda_inv_param.Ls = 1;
       quda_inv_param.solution_type = QUDA_MAT_SOLUTION ;
       quda_inv_param.solve_type = QUDA_NORMOP_PC_SOLVE;

Regards,

Yi-Bo


maddyscientist commented 6 years ago

Hi Yi-Bo,

Sorry for the slow response time on this.

I think I may need to add support to the Chroma-QUDA interface to allow for the exchange of full quark fields as opposed to single-parity fields. Perhaps the best way to proceed is for you to send me your modified Chroma code that calls the QUDA Wuppertal smearing, together with instructions for testing, and I will work on this directly.



ybyang commented 6 years ago

Hi,

   I found that the attached reply did not appear in issue 681 on the GitHub website. I guess I should reply to the @reply.github address, not the others?

   Besides, if I want to try the QUDA Wuppertal smearing, should I switch to some special branch? It seems that none of the tests call wuppertalStep or performWuppertalnStep, even though staggered_invert_test supports the laplace dslash type.

Thanks,

Yi-Bo

On Mar 23, 2018, at 9:42 AM, yibo.yang yangyibo@pa.msu.edu wrote:

Hi @maddyscientist,

 I tried to pull the newest version of QUDA but the problem is still the same:

————
ERROR: Failed to clear error state an illegal memory access was encountered
 (rank 0, host nid06592, /ccs/home/ybyang1/src/quda/lib/tune.cpp:619 in tuneLaunch())
 last kernel called was (name=N4quda15CopyColorSpinorIffLi1ELi3ENS_18CopyColorSpinorArgINS_11colorspinor11FloatNOrderIfLi1ELi3ELi2ELb0EEENS2_21SpaceSpinorColorOrderIfLi1ELi3EEEEEEE,volume=8x8x8x8,aux=out_stride=2048,in_stride=2048)
 Saving 5 sets of cached parameters to /lustre/atlas1/nph122/scratch/ybyang1/QUDA_RESOURCE/tunecache_error.tsv
————

I used an 8^4 lattice, so the memory should be enough. Should I send the code to mclark@nvidia.com directly?

Thanks,

Yi-Bo


maddyscientist commented 5 years ago

@cpviolator this is probably relevant for your work

ybyang commented 4 years ago

Hi, @maddyscientist @cpviolator

Ok, I figured out what the problem is.

First, quda_inv_param.input_location and quda_inv_param.output_location set from Chroma should be QUDA_CPU_FIELD_LOCATION; otherwise performWuppertalnStep will not copy the input/output data from/to the CPU correctly. This behavior is different from that of the inverter, which is what confused me.
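In Chroma terms the change is just the following (a sketch, reusing the variable name from the snippets above):

    quda_inv_param.input_location  = QUDA_CPU_FIELD_LOCATION;
    quda_inv_param.output_location = QUDA_CPU_FIELD_LOCATION;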

Second, there are two bugs in the function performWuppertalnStep:

  1. profileWuppertal should not be started as the dslash kernel will start it.
  2. comm_override in ApplyLaplace should not be nullptr as the dslash kernel will need it.

The following hack works fine and the result agrees with that on CPU:

    int comm_dim[4] = {};
    // only switch on comms needed for directions with a derivative
    for (int i = 0; i < 4; i++) {
      comm_dim[i] = comm_dim_partitioned(i);
      if (i == 3) comm_dim[i] = 0;
    }
    for (unsigned int i = 0; i < nSteps; i++) {
      if (i) in = out;
      ApplyLaplace(out, in, *precise, 3, a, b, in, parity, false, comm_dim, profileWuppertal);
      // ApplyLaplace(out, in, *precise, 3, a, b, in, parity, false, nullptr, profileWuppertal); // original call
    }

Besides, using quda_inv_param.dslash_type = QUDA_WILSON_DSLASH is fine now.

maddyscientist commented 4 years ago

Great to hear this makes it work ok 😄

@ybyang reopening this bug, since the bug isn't actually fixed in mainline QUDA yet. Can you file a pull request against the develop branch with the fix please? Thx.

maddyscientist commented 1 year ago

Closed by #1381