lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Support for 4-bit quantization from the transformers library #1798

Open harpomaxx opened 1 year ago

harpomaxx commented 1 year ago

Loading Vicuna-13B with 4-bit quantization is possible via the transformers library's `load_in_4bit` option. How difficult would it be for FastChat to support it?
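
For reference, here is a minimal sketch of what 4-bit loading looks like with transformers >= 4.30 and bitsandbytes installed. The model id is just an example checkpoint and the quantization settings are common defaults, not anything FastChat-specific:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization with fp16 compute, the usual bitsandbytes setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "lmsys/vicuna-13b-v1.5"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate shard across available GPUs
)
```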

cidtrips commented 1 year ago

Honestly, it amounts to updating to transformers 4.30, adding one other dependency package, and about eight code changes, if I recall correctly. Plus it works with multiple GPUs.

Unfortunately I lost my changes from my running copy when I updated for the recent API changes, but I think most of the work is already done in my fork.
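
For anyone picking this up, here is a rough sketch of where such a change could plug into a model-loading helper. The function name, signature, and the `load_4bit` flag are hypothetical illustrations, not FastChat's actual API:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def load_model(model_path: str, load_4bit: bool = False):
    """Hypothetical loader showing where a 4-bit option would slot in."""
    kwargs = {"torch_dtype": torch.float16}
    if load_4bit:
        # Requires transformers >= 4.30 plus the bitsandbytes package.
        kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
        )
        # device_map="auto" lets accelerate place the quantized weights
        # across all visible GPUs, matching the multi-GPU note above.
        kwargs["device_map"] = "auto"
    return AutoModelForCausalLM.from_pretrained(model_path, **kwargs)
```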

merrymercy commented 1 year ago

Contributions are welcome

02shanks commented 1 month ago

@merrymercy is this issue still open for contribution?

surak commented 1 month ago

@02shanks absolutely!!!!

02shanks commented 1 month ago

@surak as this is my first code contribution, could you please guide me through the process? Where should I start?

surak commented 1 month ago

Well, the usual:

- Fork the repository and clone your fork.
- Make your changes on a branch.
- Open a pull request against this repo, referencing this issue.

Nothing special, really!

02shanks commented 1 month ago

@surak @merrymercy I have just created the PR. Can you please review it?