lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

multiple calls to loadGaugeQuda in minvcg branch #6

Closed bjoo closed 13 years ago

bjoo commented 13 years ago

In the minvcg branch, in multi-GPU mode and without mixed precision, multiple calls to loadGaugeQuda can produce the error:

QUDA error: (CUDA) invalid argument (node 0, gauge_quda.cpp:805)

Background: We fixed issue 5 (https://github.com/lattice/quda/issues/#issue/5) in the minvcg branch by adding freeGaugeQuda and freeCloverQuda calls, which can be called at the end of a solver so that a subsequent call to loadGaugeQuda can re-allocate the gauge field. However, a new issue has arisen: in uniform precision, gauge and gaugeSloppy are actually pointers to the same place, and after multiple calls to loadGaugeQuda one can encounter the above error. This is pernicious in an HMC-like situation, where multiple calls to loadGaugeQuda are necessary as the gauge field evolves.

An additional data point: when using a mixed precision solver (e.g. precision=SINGLE, sloppy precision=HALF), this situation does not arise, which makes me suspect that the underlying cause of this bug is the aliasing of gauge to gaugeSloppy in uniform precision.

Reproducing: configure the minvcg branch with

./configure --enable-os=linux --enable-gpu-arch=sm_20 --disable-staggered-dirac \
    --enable-wilson-dirac --disable-domain-wall-dirac --disable-twisted-mass-dirac \
    --enable-multi-gpu --with-qmp=/home/bjoo/Devel/QCD/install/qmp/qmp2-1-6/openmpi \
    --with-mpi=/home/bjoo/Toolchain/install/openmpi-1.5

Then link chroma against this and run the t_leapfrog test with a QUDA solver in the MD using uniform precision.

NB: producing this error has so far required an external client to make multiple calls to loadGaugeQuda (e.g. chroma calling loadGaugeQuda during the MD evolution in HMC). A small self-contained test within QUDA reproducing this error (without chroma) would be desirable.

gshi commented 13 years ago

Balint,

In freeCloverField(), the calls freeParityClover(&clover->even); freeParityClover(&clover->odd); do not test whether even/odd are NULL, nor are the pointers set to NULL after they are freed. In that case, they could be freed twice if the precision is uniform. Maybe you want to do something similar to freeGaugeField():

void freeGaugeField(FullGauge *cudaGauge) {
  if (cudaGauge->even) cudaFree(cudaGauge->even);
  if (cudaGauge->odd) cudaFree(cudaGauge->odd);
  cudaGauge->even = NULL;
  cudaGauge->odd = NULL;
}

I do not have qmp installed on my machine, so I cannot test it, but I suspect this is what is causing your error.

bjoo commented 13 years ago

I have tracked this down. The source of the bug was that in uniform precision,

cudaGaugePrecise was assigned to cudaGaugeSloppy

via an assignment of the form

sloppy = precise; (e.g. at the end of loadGaugeQuda() )

This is a struct assignment: it copies one struct into the other, including the pointers inside it.

So for example after the above call we have that sloppy->gauge == precise->gauge

When we free precise->gauge and set precise->gauge = NULL; the corresponding operations do not happen to sloppy->gauge.

Consequently sloppy->gauge is NOT NULL after a call to freeGaugeField(&precise);

So subsequently calling freeGaugeField(&sloppy) will result in a double free, even if freeGaugeField checks that it is not freeing a NULL pointer, because the pointer in sloppy is not NULL. This can result in undefined behaviour.
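The sequence described above can be replayed on the host with a small mock. This is only a sketch under stated assumptions: DevPtr, mockMalloc, mockFree and the one-field GaugeField are hypothetical stand-ins invented for illustration, not QUDA or CUDA APIs. Integer handles replace device pointers so that the stale handle can be inspected safely.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Mock device allocator (hypothetical): integer handles stand in for device
// pointers so a stale handle can be examined without undefined behaviour.
using DevPtr = std::uintptr_t;               // 0 plays the role of NULL
static std::set<DevPtr> liveAllocations;
static DevPtr nextHandle = 1;

DevPtr mockMalloc() {
    DevPtr p = nextHandle++;
    liveAllocations.insert(p);
    return p;
}

// Returns false when asked to free a handle that is not live, i.e. the case
// a real allocator would report as an error or leave undefined.
bool mockFree(DevPtr p) {
    return liveAllocations.erase(p) == 1;
}

struct GaugeField {                           // hypothetical stand-in struct
    DevPtr gauge = 0;
};

// Null-checked free that resets the pointer afterwards.
bool freeGaugeField(GaugeField *f) {
    bool ok = true;
    if (f->gauge) ok = mockFree(f->gauge);
    f->gauge = 0;
    return ok;
}

// Replays the failure; returns true if the second free is rejected.
bool secondFreeIsRejected() {
    GaugeField precise;
    precise.gauge = mockMalloc();
    GaugeField sloppy = precise;              // struct copy: same handle in both

    bool firstOk = freeGaugeField(&precise);  // frees and nulls precise.gauge only
    bool stale = (precise.gauge == 0) && (sloppy.gauge != 0);
    bool secondOk = freeGaugeField(&sloppy);  // null check passes on a stale handle
    return firstOk && stale && !secondOk;
}
```

Calling secondFreeIsRejected() from a main() returns true: the null check in freeGaugeField protects a single struct against repeated frees, but not a copy of that struct.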

The tests needed for correct freeing in pseudocode are:

if (sloppy->gauge == precise->gauge) {
  freeGaugeField(precise);
  sloppy->gauge = NULL;
} else {
  freeGaugeField(precise);
  freeGaugeField(sloppy);
}
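A host-side sketch of this check, covering both the aliased (uniform precision) and the separate (mixed precision) case, might look as follows. The mock allocator, the one-field GaugeField and the helper name freeBoth are assumptions made for illustration; the actual freeGaugeQuda() and freeCloverQuda() code in the minvcg branch is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Mock device allocator (hypothetical), using integer handles for pointers.
using DevPtr = std::uintptr_t;               // 0 plays the role of NULL
static std::set<DevPtr> liveAllocations;
static DevPtr nextHandle = 1;

DevPtr mockMalloc() {
    DevPtr p = nextHandle++;
    liveAllocations.insert(p);
    return p;
}

// Returns false if the handle is not live (a double or invalid free).
bool mockFree(DevPtr p) {
    return liveAllocations.erase(p) == 1;
}

struct GaugeField {                           // hypothetical stand-in struct
    DevPtr gauge = 0;
};

bool freeGaugeField(GaugeField *f) {
    bool ok = true;
    if (f->gauge) ok = mockFree(f->gauge);
    f->gauge = 0;
    return ok;
}

// Alias-aware teardown following the pseudocode: compare first, free once.
bool freeBoth(GaugeField *precise, GaugeField *sloppy) {
    if (sloppy->gauge == precise->gauge) {
        bool ok = freeGaugeField(precise);
        sloppy->gauge = 0;                    // drop the alias, do not free again
        return ok;
    }
    bool okPrecise = freeGaugeField(precise);
    bool okSloppy = freeGaugeField(sloppy);
    return okPrecise && okSloppy;
}

// Uniform precision: sloppy is a copy of precise.
bool uniformPrecisionCase() {
    GaugeField precise;
    precise.gauge = mockMalloc();
    GaugeField sloppy = precise;
    return freeBoth(&precise, &sloppy) && liveAllocations.empty()
        && precise.gauge == 0 && sloppy.gauge == 0;
}

// Mixed precision: sloppy owns its own allocation.
bool mixedPrecisionCase() {
    GaugeField precise, sloppy;
    precise.gauge = mockMalloc();
    sloppy.gauge = mockMalloc();
    return freeBoth(&precise, &sloppy) && liveAllocations.empty();
}
```

The comparison has to happen before the first free, since freeGaugeField resets precise->gauge and the aliasing can no longer be detected afterwards.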

I have implemented this in freeGaugeQuda() and freeCloverQuda() in the minvcg branch. One suggestion is to reference count device allocations, using reference-counted smart pointers. In that case the line:

sloppy = precise

would automagically increase the reference count on the memory pointed to by precise. The first free (on precise) would reduce the reference count (but not free the memory), though the pointer in precise would be set to NULL. The second free (on sloppy) would reduce the reference count to 0, free the buffer and set sloppy's pointer to NULL.
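As a rough illustration of the reference-counting idea, the sketch below uses std::shared_ptr with a custom deleter. It is not a proposal for the actual QUDA data structures: the mock allocator, makeDeviceBuffer and the one-field GaugeField are hypothetical, and the deleter is where a real device free would go.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>

// Mock device allocator state (hypothetical), using integer handles.
using DevPtr = std::uintptr_t;
static std::set<DevPtr> liveAllocations;
static DevPtr nextHandle = 1;
static int rejectedFrees = 0;                 // counts double or invalid frees

// Wraps a mock device buffer in a shared_ptr. The deleter runs exactly once,
// when the last owner releases the buffer.
std::shared_ptr<DevPtr> makeDeviceBuffer() {
    DevPtr handle = nextHandle++;
    liveAllocations.insert(handle);
    return std::shared_ptr<DevPtr>(new DevPtr(handle), [](DevPtr *p) {
        if (liveAllocations.erase(*p) != 1) ++rejectedFrees;
        delete p;
    });
}

struct GaugeField {                           // hypothetical stand-in struct
    std::shared_ptr<DevPtr> gauge;
};

// "Free" now only drops this owner's reference and nulls its pointer.
void freeGaugeField(GaugeField *f) {
    f->gauge.reset();
}

// Returns true if the buffer survives the first free and is released,
// exactly once, by the second.
bool refCountedTeardown() {
    GaugeField precise;
    precise.gauge = makeDeviceBuffer();
    GaugeField sloppy = precise;              // copy bumps the count to 2
    bool shared = (precise.gauge.use_count() == 2);

    freeGaugeField(&precise);                 // count 2 -> 1, buffer still live
    bool stillLive = (liveAllocations.size() == 1)
                  && (sloppy.gauge.use_count() == 1);

    freeGaugeField(&sloppy);                  // count 1 -> 0, deleter runs
    bool released = liveAllocations.empty() && (rejectedFrees == 0);
    return shared && stillLive && released;
}
```

With this approach the order of the two frees no longer matters and no aliasing test is needed, at the cost of changing the struct member from a raw pointer to a smart pointer.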

This fix also fixes issue 5.