We used the original implementation at https://github.com/artidoro/qlora and rewrote the forward pass along the lines of the snippet in the README.md, similar to the code right after this line: https://github.com/artidoro/qlora/blob/7f4e95a68dc076bea9b3a413d2b512eca6d004e5/qlora.py#L705.
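For intuition, the adapter computation that such a forward rewrite performs can be sketched in plain NumPy. This is a hedged illustration, not the actual QLoRA code: the function and variable names are hypothetical, and the real implementation stores the base weight in 4-bit NF4 and dequantizes it on the fly, which is omitted here.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """LoRA-style forward: frozen base weight W plus a low-rank update B @ A.

    Shapes: x is (batch, d_in), W is (d_out, d_in),
    A is (r, d_in), B is (d_out, r); the update is scaled by alpha / r.
    In QLoRA, W would be a quantized 4-bit tensor; here it is kept in
    full precision purely for illustration.
    """
    r = A.shape[0]
    base = x @ W.T                           # frozen pretrained path
    update = (x @ A.T) @ B.T * (alpha / r)   # trainable low-rank path
    return base + update

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
W = rng.normal(size=(4, 8))
A = rng.normal(size=(2, 8)) * 0.01
B = np.zeros((4, 2))  # B is initialized to zero, so the adapter starts as a no-op

# With B = 0, the output matches the frozen base model exactly.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

Only `A` and `B` would receive gradients during training; `W` stays frozen, which is what makes the method memory-efficient.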
A note on the QLoRA repository: when setting it up initially I had issues with the dependencies. I found that I needed to install the cudatoolkit package and manually replace the CPU build of the bitsandbytes shared library with the corresponding GPU (CUDA) build, very much like the issues reported here: https://github.com/search?q=repo%3ATimDettmers%2Fbitsandbytes+CUDA+Setup+failed+despite+GPU+being+available.&type=issues&p=2.
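As a rough sketch, the workaround looked something like the shell commands below. The CUDA version and library filenames are placeholders for this environment and will differ on other setups; check which `libbitsandbytes_cuda*.so` matches your installed toolkit before copying anything.

```shell
# Install a CUDA toolkit matching your driver (11.8 is only an example).
conda install -y cudatoolkit=11.8

# Locate the installed bitsandbytes package directory.
BNB_DIR="$(python -c 'import bitsandbytes, os; print(os.path.dirname(bitsandbytes.__file__))')"

# Back up the CPU stub, then overwrite it with the matching GPU build.
# The exact filename depends on your bitsandbytes and CUDA versions.
cp "$BNB_DIR/libbitsandbytes_cpu.so" "$BNB_DIR/libbitsandbytes_cpu.so.bak"
cp "$BNB_DIR/libbitsandbytes_cuda118.so" "$BNB_DIR/libbitsandbytes_cpu.so"
```

Newer bitsandbytes releases detect the GPU build automatically, so this manual copy should only be needed on older versions where CUDA setup fails despite a GPU being available.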
Let me know if you have any additional problems.
Hi there, congrats on this interesting work!
In the paper you share results for QLoRA training, which look good. This seems to be omitted from the code implementation, so where should one begin in order to experiment with training adapters using this technique?