jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Apache License 2.0

RuntimeError: cusolver error: CUSOLVER_STATUS_INVALID_VALUE in torch.linalg.svd #7

Closed: samuelwheeler closed this issue 3 months ago

samuelwheeler commented 3 months ago

The method works great on most layers, but on the final projection in my transformer (1024 x 50k) I get:

RuntimeError: cusolver error: CUSOLVER_STATUS_INVALID_VALUE, when calling `cusolverDnSgesvdj_bufferSize(handle, jobz, econ, m, n, A, lda, S, U, ldu, V, ldv, lwork, params)`

when executing `U, s, Vh = torch.linalg.svd(matrix)`.

The issue is fixed by passing `full_matrices=False`, i.e. `U, s, Vh = torch.linalg.svd(matrix, full_matrices=False)`.
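For context on why the reduced SVD may help here: with `full_matrices=True` (the default), `torch.linalg.svd` returns `Vh` with shape n x n, while with `full_matrices=False` it returns `Vh` with shape min(m, n) x n. The sketch below is plain Python shape arithmetic only, not the GaLore or PyTorch implementation. The helper names `svd_factor_shapes` and `describe_vh` are hypothetical, "50k" is assumed to mean exactly 50,000, and elements are assumed to be 4-byte floats. Whether the 32-bit size limit is the actual trigger for `CUSOLVER_STATUS_INVALID_VALUE` is also an assumption; the thread does not confirm it.

```python
# Hypothetical helpers (not part of GaLore or PyTorch): pure shape arithmetic
# for the factors of an SVD of an m x n matrix.
INT32_MAX = 2**31 - 1


def svd_factor_shapes(m, n, full_matrices=True):
    """Return the shapes of U, S, Vh for an m x n input."""
    k = min(m, n)
    if full_matrices:
        return {"U": (m, m), "S": (k,), "Vh": (n, n)}
    return {"U": (m, k), "S": (k,), "Vh": (k, n)}


def describe_vh(m, n, full_matrices, bytes_per_element=4):
    """Summarize the size of Vh; bytes_per_element=4 assumes float32."""
    rows, cols = svd_factor_shapes(m, n, full_matrices)["Vh"]
    elements = rows * cols
    return {
        "shape": (rows, cols),
        "elements": elements,
        "gib": elements * bytes_per_element / 2**30,
        "exceeds_int32": elements > INT32_MAX,
    }


# Assumption: the "1024 x 50k" projection is exactly 1024 x 50,000.
m, n = 1024, 50_000
full = describe_vh(m, n, full_matrices=True)
reduced = describe_vh(m, n, full_matrices=False)

print("full   :", full)
print("reduced:", reduced)
```

Under these assumptions, the full `Vh` has 2.5 billion elements (roughly 9.3 GiB in float32, and more than a signed 32-bit integer can count), while the reduced `Vh` is 1024 x 50,000 (roughly 0.19 GiB). Since a low-rank projection only needs the leading singular vectors, the reduced factors should be sufficient for this use.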

jiaweizzhao commented 3 months ago

Thanks for the fix!