detar opened this issue 8 years ago
Under what circumstances should this appear in a normal run?
I would suggest we simply return 0 and give a warning then?
Hi Kate,
When we run the spectrum analysis code, we often choose sources with support on only the even sites of a single time slice. The MILC code calls for separate multimass inversions on the even and odd sites, so the odd-site solve then has a zero right-hand side (rhs). Of course, calling QUDA with such a setup is inefficient, because all the data movement is for naught. The alternative is to compute the source norm in the MILC QUDA wrapper, but that is also inefficient: such cases never occur when we are running HMC, so we don't want to waste time checking for a rare event.
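A toy sketch (not MILC or QUDA code) of why the odd-site solve gets a zero right-hand side: with the usual checkerboard parity (x+y+z+t) mod 2, a source supported only on the even sites of one time slice has no odd-site component at all. The lattice size, the unit source values, and the one-number-per-site representation (real sources carry color components) are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical lattice extent (x, y, z, t); any sizes give the same conclusion.
DIMS = (4, 4, 4, 8)


def parity(site):
    """Checkerboard parity: 0 (even) if x+y+z+t is even, 1 (odd) otherwise."""
    return sum(site) % 2


def even_site_source(dims, t0):
    """Sparse source with unit value on the even sites of time slice t0 only."""
    return {
        site: 1.0
        for site in product(*(range(n) for n in dims))
        if site[3] == t0 and parity(site) == 0
    }


def norm2_by_parity(source):
    """Squared norm of the source restricted to even (index 0) and odd (1) sites."""
    norms = [0.0, 0.0]
    for site, value in source.items():
        norms[parity(site)] += abs(value) ** 2
    return norms


even_norm2, odd_norm2 = norm2_by_parity(even_site_source(DIMS, t0=0))
print(even_norm2, odd_norm2)  # prints: 32.0 0.0
```

The even-parity solve sees all of the source, while the odd-parity solve is handed an exactly zero rhs.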
We have not encountered this problem before, because we have not been running such projects on GPUs until now.
Best, Carleton
(In reply to maddyscientist's comment of 11/25/2015, 1:57 PM.)
Carleton DeTar Department of Physics and Astronomy University of Utah
So in QUDA we could check for a zero source before copying to the GPU, avoiding the unnecessary overhead. We can get this into the develop branch in the next few days.
Hi Kate,
Don't make this urgent. I have already hacked the MILC wrapper for now to prevent this, so my jobs can run.
Thanks, Carleton
(In reply to maddyscientist's comment of 11/25/2015, 2:26 PM.)
Closed by #633
Not sure what #633 has to do with this issue? #633 was a fix for a bug I inadvertently introduced a few months ago, where a tolerance of 0 resulted in the solver never exiting until the limit of precision was reached.
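A toy model (not QUDA's solver) of the failure mode fixed in #633. It assumes a standard relative-residual stopping test, r2 <= tol^2 * b2. With tol = 0 that test can only pass for an exactly zero residual, so some other limit has to end the loop. The geometric residual reduction and the fixed `r2_floor` standing in for "the limit of precision" are assumptions, chosen as powers of two so the arithmetic is exact.

```python
def toy_solve(tol, b2=1.0, reduction=0.5, r2_floor=2.0 ** -60, max_iter=1000):
    """Toy iterative solver that returns (iterations, reason for stopping).

    The squared residual r2 shrinks by `reduction` each step until it stalls
    at `r2_floor`, a stand-in for the limit of floating-point precision.
    """
    r2 = b2
    stop = tol * tol * b2  # assumed relative-residual target: r2 <= tol^2 * b2
    for it in range(1, max_iter + 1):
        r2 = max(r2 * reduction, r2_floor)
        if r2 <= stop:
            return it, "converged"
        if r2 == r2_floor:
            return it, "precision limit"
    return max_iter, "max_iter"


print(toy_solve(tol=2.0 ** -10))  # (20, 'converged')
print(toy_solve(tol=0.0))  # (60, 'precision limit'): r2 <= 0 is never true
```

With any positive tolerance the convergence test eventually fires; with tol = 0 only the separate precision-limit check ends the loop.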
Argh, right. I should get a coffee. I reopened the issue.
Aborting is wrong. It should simply return a zero solution.
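A minimal sketch of the behavior the thread settles on: test for an all-zero source on the host, before any transfer to the GPU, and return a zero solution for every shift (with a warning) instead of aborting. It is written in Python for brevity and is not QUDA's code; `multishift_solve` and the `device_solve` callback standing in for the GPU path are hypothetical names.

```python
import warnings


def multishift_solve(source, shifts, device_solve):
    """Solve (A + sigma) x = b for each shift sigma, short-circuiting b == 0.

    `device_solve(source, shifts)` is a hypothetical stand-in for the
    expensive path (copy to the GPU, run the multishift solver, copy back).
    """
    # Exact element-wise test on the host, before any device transfer.
    if all(value == 0 for value in source):
        # x = 0 solves every shifted system exactly, assuming each A + sigma
        # is nonsingular, so return zeros instead of aborting.
        warnings.warn("zero source: returning zero solutions for all shifts")
        return [[0.0] * len(source) for _ in shifts]
    return device_solve(source, shifts)


# An odd-parity rhs from an even-site source: no GPU work is done.
solutions = multishift_solve([0.0, 0.0, 0.0], [0.01, 0.05], device_solve=None)
print(solutions)  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

The exact element-wise zero test is used rather than comparing a computed squared norm with 0.0, because squaring very small nonzero entries can underflow to zero and would wrongly trigger the shortcut.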