jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Apache License 2.0

CUDA out of memory in torch.linalg.svd #4

Closed. threewayhandshake closed this issue 4 months ago.

threewayhandshake commented 4 months ago

I tried to use GaLore on nn.Linear(256, 267736) and got the following error at the line U, s, Vh = torch.linalg.svd(matrix):

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 267.04 GiB.

I think passing full_matrices=False to torch.linalg.svd may be required.
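The 267.04 GiB figure can be checked with some quick arithmetic. By default torch.linalg.svd uses full_matrices=True, so for an (m, n) matrix the U factor has shape (m, m); with full_matrices=False it has shape (m, min(m, n)). The sketch below assumes float32 values and that the matrix passed to the SVD has the weight's shape (267736, 256), since nn.Linear stores its weight as (out_features, in_features). The helper svd_u_bytes is hypothetical and is not part of GaLore or PyTorch.

```python
# Estimate the memory needed for the U factor of an SVD of an (m, n) matrix,
# comparing full mode with reduced mode (torch.linalg.svd's full_matrices flag).

GIB = 2 ** 30  # bytes per GiB


def svd_u_bytes(m: int, n: int, full_matrices: bool, bytes_per_elem: int = 4) -> int:
    """Bytes needed to store U from the SVD of an (m, n) matrix.

    full_matrices=True  -> U has shape (m, m)
    full_matrices=False -> U has shape (m, min(m, n))
    bytes_per_elem=4 assumes float32 (an assumption, not taken from the issue).
    """
    cols = m if full_matrices else min(m, n)
    return m * cols * bytes_per_elem


# Assumed shape: nn.Linear(256, 267736) has a (267736, 256) weight.
m, n = 267736, 256

full = svd_u_bytes(m, n, full_matrices=True)
reduced = svd_u_bytes(m, n, full_matrices=False)

print(f"full U:    {full / GIB:.2f} GiB")     # 267.04 GiB
print(f"reduced U: {reduced / GIB:.3f} GiB")  # 0.255 GiB
```

Under these assumptions the full-mode U alone accounts for the 267.04 GiB allocation in the error. With torch.linalg.svd(matrix, full_matrices=False), U shrinks to (267736, 256), roughly a thousand-fold reduction, while s and Vh keep the same shapes, (256,) and (256, 256), for a tall matrix like this one.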