LASER can reduce the memory footprint, but the current code doesn't do that. Here is how it works. Say you have a weight matrix W of size m x n and we take its rank-r approximation. Then we can write the rank-r approximation of W as U S V, where U is of size m x r, S is a diagonal matrix of size r x r, and V is of size r x n. Together these 3 matrices hold r (m + r + n) values, which can be much smaller than mn when r << min{m, n}. E.g., if r = 10 and typical values are m = n = 4096, then we store 10 (4096 + 10 + 4096) ≈ 82K values instead of 4096 x 4096 ≈ 16.8M, i.e., the factorized form takes only about 0.5% of the original memory footprint (roughly a 200x reduction).
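
To make the size comparison concrete, here is a minimal PyTorch sketch (the shapes, rank, and variable names are illustrative, not taken from the LASER codebase) that builds a rank-r factorization with SVD and compares the number of stored values:

```python
import torch

m, n, r = 4096, 4096, 10  # illustrative sizes matching the example above

W = torch.randn(m, n)

# Thin SVD, then keep only the top-r singular triplets.
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
U_r = U[:, :r]    # m x r
S_r = S[:r]       # diagonal of the r x r matrix S
V_r = Vh[:r, :]   # r x n

# Rank-r reconstruction: this is what the current code stores (m x n values).
W_r = U_r @ torch.diag(S_r) @ V_r

full_size = m * n                  # 16,777,216 values
factored_size = r * (m + r + n)    # 81,940 values (counting S as a full r x r block, as above)
print(f"dense: {full_size}, factored: {factored_size}, "
      f"ratio: {factored_size / full_size:.4f}")   # ~0.005
```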
The current code multiplies U S V back into an m x n low-rank approximation, which restores the matrix to its original size. To save memory, you need to keep these 3 matrices separate and apply them to an input x in succession, i.e., compute U (S (V x)) rather than forming U S V first. For fast processing, you can absorb the S matrix into either U S or S V, which still keeps the memory advantage and reduces the number of sequential matrix multiplications from 3 to 2.
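
As a rough sketch of what that could look like (the LowRankLinear name and its interface are hypothetical, not the LASER API), the module below absorbs S into V and applies the two remaining factors in succession, assuming the W above maps an n-dimensional input to an m-dimensional output:

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Hypothetical replacement for a dense layer whose weight W (m x n)
    has been rank-r approximated as U S V. Stores only two small factors."""

    def __init__(self, U_r, S_r, V_r, bias=None):
        super().__init__()
        # Absorb the diagonal S into V, so only two matrices are kept:
        #   U: m x r,  SV: r x n  ->  r (m + n) values instead of m * n.
        self.U = nn.Parameter(U_r)                      # m x r
        self.SV = nn.Parameter(S_r.unsqueeze(1) * V_r)  # r x n
        self.bias = nn.Parameter(bias) if bias is not None else None

    def forward(self, x):
        # x: (..., n). Compute U (S V x) in succession;
        # the m x n matrix is never materialized.
        out = x @ self.SV.t()   # (..., r)
        out = out @ self.U.t()  # (..., m)
        if self.bias is not None:
            out = out + self.bias
        return out
```

With the factors from the earlier snippet, `LowRankLinear(U_r, S_r, V_r)` maps inputs of shape (..., 4096) to outputs of shape (..., 4096) while storing roughly r (m + n) values; absorbing S into U instead would work equally well.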
Adding this is a long-requested feature, but I am sorry that I have not found time to add it. That said, it should be fairly easy to modify the code to do this. I am happy to help answer further questions.
Duplicate of #6
When can this feature be supported? It was mentioned for addition back in January this year, but it doesn't seem to be available yet. I've noticed that you set the restructured weights as learnable parameters and use them for updates, which means the final stored weights don't save any memory. If you have time, could you please support this simple feature as soon as possible? Thank you very much.
Hi @pursure-D. I do apologize for the delay. We had a change in priorities on our end, which delayed further development. I can definitely add that feature this week over the July 4th break. Please check back by Monday of next week.
Marking as active.
Another question: can the saved weights support LoRA fine-tuning?