lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda
Other
289 stars 97 forks source link

FIX: Adding allocation for evals in MG when deflation is preserved #1377

Closed pittlerf closed 8 months ago

pittlerf commented 1 year ago

In the current develop version I obtained the following error when updating MG with preserve_deflation.

MG level 1 (GPU): ERROR: Requesting 256 eigenvalues with only storage allocated for 0 (rank 0, host nid007567, eigensolve_quda.cpp:582 in void quda::EigenSolver::computeEvals(std::vector<ColorSpinorField> &, std::vector<Complex> &, int)())

Here I provide a small fix. Thank you,

mathiaswagner commented 1 year ago

Jenkins: Can one of the admins verify this patch?

mathiaswagner commented 1 year ago

Jenkins: Can one of the admins verify this patch?

maddyscientist commented 1 year ago

@Jenkins ok to test

maddyscientist commented 1 year ago

@pittlerf thanks for the contribution. Please feel free add your name to the contributor list on the README 😄

maddyscientist commented 1 year ago

@Jenkins ok to test

kostrzewa commented 1 year ago

This does not seem to be enough to make it work for me. I need to additionally resize evals before or when EigenSolver::computeEvals is called.

kostrzewa commented 1 year ago

As a temporary workaround I do the following:

584   void EigenSolver::computeEvals(std::vector<ColorSpinorField> &evecs,
585                                  std::vector<Complex> &evals, int size)
586   {
587     if (size > (int)evecs.size())
588       errorQuda("Requesting %d eigenvectors with only storage allocated for %lu", size, evecs.size());
589     if (size > (int)evals.size())
590       evals.resize(size);                                                                                                              
591       //errorQuda("Requesting %d eigenvalues with only storage allocated for %lu", size, evals.size());
weinbe2 commented 1 year ago

@kostrzewa @pittlerf trying to close the loop on this---I know the PR's been stagnant a bit, can one of you please merge in develop (which should "just work", it seems), add that change in computeEvals, and do a last test with your workflow (since I don't know what exactly it is)?

I can take care of cosmetic things like updating contributors + clang-format if that helps, I just can't confirm that the fixes here still work for you.

kostrzewa commented 11 months ago

@weinbe2 sorry, I completely lost sight of this for a number of reasons, have it on my TODO list now and will get back to you

pittlerf commented 11 months ago

@kostrzewa @pittlerf trying to close the loop on this---I know the PR's been stagnant a bit, can one of you please merge in develop (which should "just work", it seems), add that change in computeEvals, and do a last test with your workflow (since I don't know what exactly it is)?

I can take care of cosmetic things like updating contributors + clang-format if that helps, I just can't confirm that the fixes here still work for you.

Hi, yes, sorry I also lost sight of this, however with @kostrzewa additions still works for our framework.

kostrzewa commented 11 months ago

@weinbe2 I can confirm that this still works correctly with tmLQCD after the merge with develop. I've removed the commented-out error message and added a comment to explain why the evals.resize() is necessary. Not sure if the latter is really helpful.

Should I squash these into a single commit and force-push?

maddyscientist commented 10 months ago

No need to force-push I think, we can I assume just do a "Squash and Merge"?

I've merged in latest develop, which should allow the CI matrix to complete. Once that shows as green, we can just hit the button I think.

maddyscientist commented 8 months ago

@Jenkins test this please