hyperonym / basaran

Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
MIT License

Allow loading model with BitsAndBytes 4bit quantization, PEFT LoRA adapters. #203

Open idoru opened 1 year ago

idoru commented 1 year ago

Also supports loading PEFT LoRA adapters with MODEL_PEFT=true. For details on the 4-bit quantization options, see: https://huggingface.co/blog/4bit-transformers-bitsandbytes

Implements #202
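For readers unfamiliar with PEFT, here is a minimal sketch of what loading a LoRA adapter generally involves. It is not the code from this PR. It assumes the adapter lives in a local directory containing `adapter_config.json`; the helper names `read_base_model_name` and `load_with_adapter` are hypothetical. The `base_model_name_or_path` field, `AutoModelForCausalLM.from_pretrained`, and `PeftModel.from_pretrained` are existing PEFT/Transformers names.

```python
import json
from pathlib import Path


def read_base_model_name(adapter_dir):
    """Return the base model recorded in a PEFT adapter's adapter_config.json."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text())
    return config["base_model_name_or_path"]


def load_with_adapter(adapter_dir, **base_kwargs):
    """Load the base model, then wrap it with the LoRA adapter weights.

    Hypothetical helper: base_kwargs are forwarded unchanged to the base
    model loader (for example device_map or quantization settings).
    """
    # Imported lazily so read_base_model_name works without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base_name = read_base_model_name(adapter_dir)
    base = AutoModelForCausalLM.from_pretrained(base_name, **base_kwargs)
    return PeftModel.from_pretrained(base, adapter_dir)
```

For adapters hosted on the Hugging Face Hub, `PeftConfig.from_pretrained` exposes the same `base_model_name_or_path` value without reading the file by hand.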

LoopControl commented 1 year ago

It might be better to split the QLoRA changes from the PEFT LoRA adapter support.

QLoRA/4-bit requires the latest git-master versions of transformers, accelerate, and related packages (and I don't see those listed in the requirements.txt in this PR).

LoRA adapter support should be possible without bleeding-edge versions of transformers, though, so it would be great to get that merged first.
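One way to make a dependency floor like this explicit at startup is a small version check. The sketch below is only an illustration, not part of this PR: `parse_version` and `missing_requirements` are hypothetical helpers, and the minimum versions are supplied by the caller, so the real values should come from the project's requirements.txt.

```python
from importlib import metadata


def parse_version(text):
    """Turn '4.30.0' or '0.20.0.dev0' into a (major, minor, patch) tuple.

    Only the leading integer of each dotted piece is used; missing pieces
    count as 0 and anything past the third piece is ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple((parts + [0, 0, 0])[:3])


def missing_requirements(minimums, installed=None):
    """Return {package: reason} for packages that are absent or too old.

    `installed` may be a {package: version} dict; when omitted, versions
    are looked up from the current environment.
    """
    problems = {}
    for name, minimum in minimums.items():
        try:
            found = installed[name] if installed is not None else metadata.version(name)
        except (KeyError, metadata.PackageNotFoundError):
            problems[name] = "not installed"
            continue
        if parse_version(found) < parse_version(minimum):
            problems[name] = f"{found} < {minimum}"
    return problems
```

Calling `missing_requirements({"transformers": "<minimum>"})` with the floor taken from requirements.txt returns an empty dict when the environment is new enough.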

idoru commented 1 year ago

Thanks for the review! I'm very new to working on Python codebases, so haven't fully got the hang of the dependency management workflows and gotchas. I'll split them as you suggested, and fix the requirements.

peakji commented 1 year ago

Hugging Face has finally released QLoRA-compatible versions of transformers and accelerate, which allows us to add basic 4-bit quantization support in https://github.com/hyperonym/basaran/pull/209.

Maybe you can simplify this PR to include only the PEFT changes? It should also be easier now if you want to add more detailed options for 4-bit quantization, as dependencies are no longer an issue.

idoru commented 1 year ago

Hi, thanks for the feedback. I've updated the PR now. Tested with my very amateur QLoRA model with the following:

MODEL_TRUST_REMOTE_CODE=true \
MODEL_LOAD_IN_4BIT=true \
MODEL_4BIT_QUANT_TYPE=nf4 \
MODEL_4BIT_DOUBLE_QUANT=true \
MODEL_PEFT=true \
MODEL=idoru/falcon-40b-nf4dq-chat-oasst1-2epoch-v2 \
PORT=8080 \
python -m basaran
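As a rough illustration of how the variables in the command above could map onto bitsandbytes settings, the sketch below translates them into keyword arguments. It is not the PR's implementation: `env_flag` and `quantization_kwargs` are hypothetical helpers, and the mapping of each variable to a parameter is an assumption. The keys `load_in_4bit`, `bnb_4bit_quant_type`, and `bnb_4bit_use_double_quant` are real `BitsAndBytesConfig` parameters in Transformers, and `fp4` is that class's default quantization type.

```python
import os

TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(env, name, default=False):
    """Interpret an environment variable as a boolean flag."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


def quantization_kwargs(env=os.environ):
    """Translate MODEL_* variables into BitsAndBytesConfig-style keyword arguments.

    Returns an empty dict when 4-bit loading is not requested.
    """
    if not env_flag(env, "MODEL_LOAD_IN_4BIT"):
        return {}
    return {
        "load_in_4bit": True,
        # Assumed mapping: nf4 or fp4, falling back to the library default.
        "bnb_4bit_quant_type": env.get("MODEL_4BIT_QUANT_TYPE", "fp4"),
        "bnb_4bit_use_double_quant": env_flag(env, "MODEL_4BIT_DOUBLE_QUANT"),
    }
```

The resulting dict could be passed as `BitsAndBytesConfig(**quantization_kwargs())`, with that config then given to `from_pretrained` through its `quantization_config` argument.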

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 36.84%; project coverage change: -2.61%.

Comparison is base (1677491) 94.29% compared to head (33a37c1) 91.69%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #203      +/-   ##
==========================================
- Coverage   94.29%   91.69%   -2.61%
==========================================
  Files           7        7
  Lines         333      349      +16
==========================================
+ Hits          314      320       +6
- Misses         19       29      +10
```

| [Impacted Files](https://app.codecov.io/gh/hyperonym/basaran/pull/203?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=hyperonym) | Coverage Δ | |
|---|---|---|
| [basaran/model.py](https://app.codecov.io/gh/hyperonym/basaran/pull/203?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=hyperonym#diff-YmFzYXJhbi9tb2RlbC5weQ==) | `83.52% <25.00%> (-5.01%)` | :arrow_down: |
| [basaran/\_\_init\_\_.py](https://app.codecov.io/gh/hyperonym/basaran/pull/203?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=hyperonym#diff-YmFzYXJhbi9fX2luaXRfXy5weQ==) | `96.87% <100.00%> (+0.32%)` | :arrow_up: |
