Open turboderp opened 10 months ago
We don't release them, but one can recover them by taking the standard Hugging Face models and applying the right LASER intervention. The list of optimal interventions is in Table 3 of the paper https://arxiv.org/pdf/2312.13558.pdf.
I suppose it would be a good feature to add: a way to get the LLM with the chosen hyperparameters so that people can reproduce our results. I am making a list of features for the upcoming refactoring. I'll add this to it.
Let me know if you have more questions.
Related to #9
How much can we reduce the memory requirement if we store the weights in rank-reduced format?
If you have an m×n matrix, then storing it takes mn values. If you reduce the rank down to k using SVD, then you store mk + k^2 + kn values instead (the m×k, k×k, and k×n factors). Imagine k=1: this comes down to just m + n + 1. If you reduce it to 1% of the max rank (say m > n, so the max rank is n), then you have k = n/100 and you get mn/100 + (n/100)^2 + n^2/100 <= mn/100 + mn/10000 + mn/100 ~ mn/50, so about 50 times shrinkage.
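To make the arithmetic above concrete, here is a minimal sketch that just counts stored values for the dense and rank-k forms. The function names (`dense_params`, `low_rank_params`, `shrinkage`) and the example sizes are made up for illustration and are not part of the LASER codebase. It assumes the k×k middle factor is stored in full, as in the formula above.

```python
def dense_params(m: int, n: int) -> int:
    """Number of values stored for a full m x n matrix."""
    return m * n


def low_rank_params(m: int, n: int, k: int) -> int:
    """Number of values stored for a rank-k SVD factorization:
    an m x k factor, a k x k factor, and a k x n factor."""
    return m * k + k * k + k * n


def shrinkage(m: int, n: int, k: int) -> float:
    """Ratio of dense storage to rank-k storage (larger means more savings)."""
    return dense_params(m, n) / low_rank_params(m, n, k)


if __name__ == "__main__":
    # Hypothetical square layer with k set to 1% of the max rank.
    m, n = 4000, 4000
    k = n // 100  # k = 40
    print(low_rank_params(m, n, k))      # 321600
    print(round(shrinkage(m, n, k), 2))  # 49.75
```

For m = n the ratio sits just under 50, which matches the mn/50 bound. When m is much larger than n, the kn term is small relative to mk, so the shrinkage is larger than 50.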
Do you publish the rank-reduced models anywhere?