Open wlai0611 opened 1 month ago
We have failed to reproduce this on an Ampere and a Volta card. I managed to get my hands on a GeForce GTX 980 (which is Maxwell) and was just able to reproduce this, so this issue appears to be architecture-specific. More investigation is necessary.
Technically speaking, we only support Volta+ (because that's what we test in CI), but we don't outright refuse to run on older architectures, because for the most part everything is expected to work. We know that some of our kernels require independent thread scheduling (which was introduced with Volta), but that shouldn't cause silent data corruption...
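As a rough illustration of the "Volta+" cutoff, here is a minimal sketch that classifies a card by its CUDA compute capability. Volta corresponds to compute capability 7.0, and the GTX 980 is 5.2. The helper name and the V100/A100 entries are illustrative assumptions (the thread only says "a Volta" and "an Ampere" card), and this is not a check the library itself performs.

```python
# Hypothetical helper, not part of cunumeric: independent thread scheduling
# was introduced with Volta, i.e. compute capability 7.0 and above.
def has_independent_thread_scheduling(major: int, minor: int = 0) -> bool:
    return (major, minor) >= (7, 0)

# Compute capabilities; the V100 and A100 are example Volta/Ampere cards,
# chosen here only for illustration.
cards = {
    "GeForce GTX 980 (Maxwell)": (5, 2),
    "Tesla V100 (Volta)": (7, 0),
    "A100 (Ampere)": (8, 0),
}

for name, (major, minor) in cards.items():
    supported = has_independent_thread_scheduling(major, minor)
    print(f"{name}: compute capability {major}.{minor}, Volta+ = {supported}")
```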
We confirmed that the bug does not reproduce on the same hardware when using the latest top-of-tree, so the underlying issue was fixed at some point between 24.06 and today. We plan to push a new top-of-tree build within the next two weeks (we are currently finalizing another patch release, so it will come after that). We will notify you at that point so you can try out the fix.
I was using cunumeric to get the singular vectors and values of a 4900-row by 100-column matrix in which each column is a flattened 70 by 70 image of 3 particles approaching each other (attached here: lj.csv).
After I obtain the singular vectors and values, I attempt to reconstruct the 4900 x 100 matrix using cunumeric's dot and multiply functions, but the resulting matrix product is a zero matrix, whereas using NumPy's dot and multiply results in a nonzero matrix.
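For reference, a minimal NumPy-only sketch of the reconstruction described above. It is an illustration, not the original script: random data stands in for lj.csv, and the function name is made up for this example.

```python
import numpy as np

def reconstruct_from_svd(a):
    """Decompose `a` with a thin SVD and rebuild it with multiply and dot."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # multiply scales column j of u by singular value s[j] (broadcast over rows);
    # dot then applies the right singular vectors.
    return np.dot(np.multiply(u, s), vt)

# Stand-in for the 4900 x 100 matrix of flattened 70 x 70 images (assumption:
# random values instead of the attached data).
rng = np.random.default_rng(0)
a = rng.random((4900, 100))

a_rec = reconstruct_from_svd(a)
print("reconstruction matches input:", np.allclose(a, a_rec))
print("reconstruction is all zeros:", not np.any(a_rec))
```

With NumPy the reconstruction matches the input up to floating-point error. The behavior reported here is that the equivalent dot and multiply calls in cunumeric return an all-zero matrix instead.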
The process I describe above is coded below:
I run the py file with
My outputs:
My hardware specs are below:
Thank you!