Open idoru opened 1 year ago
It might be better to split the QLora stuff from the Peft Lora adapter support.
QLoRA/4-bit requires the latest/git-master versions of transformers, accelerate, and such (and I don't see those listed in the requirements.txt on this PR).
LoRA adapter support should be possible without bleeding-edge versions of transformers, though, so it'd be great to get that merged in first.
Thanks for the review! I'm very new to working on Python codebases, so haven't fully got the hang of the dependency management workflows and gotchas. I'll split them as you suggested, and fix the requirements.
Hugging Face finally released QLoRA-supporting versions of transformers and accelerate, which allows us to add basic 4-bit quantization support in https://github.com/hyperonym/basaran/pull/209.
Maybe you can simplify this PR to include only the PEFT stuff? It would also be easy to add more detailed options for 4-bit quantization if you want, as dependencies are no longer an issue.
Hi, thanks for the feedback. I've updated the PR now. Tested with my very amateur QLoRA model with the following:
MODEL_TRUST_REMOTE_CODE=true \
MODEL_LOAD_IN_4BIT=true \
MODEL_4BIT_QUANT_TYPE=nf4 \
MODEL_4BIT_DOUBLE_QUANT=true \
MODEL_PEFT=true \
MODEL=idoru/falcon-40b-nf4dq-chat-oasst1-2epoch-v2 \
PORT=8080 \
python -m basaran
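For illustration, here's a minimal sketch of how environment variables like those above could be translated into transformers-style 4-bit quantization keyword arguments. The `quantization_kwargs` helper is hypothetical (not basaran's actual loader); only the env-var names from the invocation above and the `bnb_4bit_*` option names from transformers' bitsandbytes integration are taken as given.

```python
import os


def quantization_kwargs(env=os.environ):
    """Hypothetical helper: map MODEL_* environment variables (as in the
    invocation above) onto keyword arguments in the style of transformers'
    bitsandbytes 4-bit options. Not basaran's actual internals."""
    def truthy(v):
        return v.lower() in ("true", "1", "yes")

    kwargs = {}
    if truthy(env.get("MODEL_LOAD_IN_4BIT", "")):
        kwargs["load_in_4bit"] = True
        # nf4 vs fp4 quantization, per the bitsandbytes integration
        kwargs["bnb_4bit_quant_type"] = env.get("MODEL_4BIT_QUANT_TYPE", "nf4")
        kwargs["bnb_4bit_use_double_quant"] = truthy(
            env.get("MODEL_4BIT_DOUBLE_QUANT", "")
        )
    if truthy(env.get("MODEL_TRUST_REMOTE_CODE", "")):
        kwargs["trust_remote_code"] = True
    return kwargs


# Mirrors the env vars in the invocation above:
example = {
    "MODEL_LOAD_IN_4BIT": "true",
    "MODEL_4BIT_QUANT_TYPE": "nf4",
    "MODEL_4BIT_DOUBLE_QUANT": "true",
    "MODEL_TRUST_REMOTE_CODE": "true",
}
print(quantization_kwargs(example))
```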
Patch coverage: 36.84% and project coverage change: -2.61% :warning:
Comparison is base (1677491) 94.29% compared to head (33a37c1) 91.69%.
Also supports loading PEFT LoRA adapters with MODEL_PEFT=true. For details on the 4-bit quantization options, see: https://huggingface.co/blog/4bit-transformers-bitsandbytes

Implements #202