lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Any plans for 30B version? #170

Closed davidliudev closed 1 year ago

davidliudev commented 1 year ago

Any plans for fine-tuning a 30B version, either directly or with LoRA?

zhisbug commented 1 year ago

Yes, @ZYHowell is looking into this. But we first need to investigate whether 30B with LoRA can improve chatbot performance compared to 13B without LoRA; otherwise, it does not make sense to spend compute training a worse bot.
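For anyone curious what the LoRA route looks like mechanically, here is a minimal sketch using the Hugging Face peft library, with illustrative hyperparameters and a hypothetical checkpoint path; this is not our actual training code:

# Minimal sketch: attach LoRA adapters to a LLaMA checkpoint with peft.
# The path and hyperparameters are illustrative, not FastChat's settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("path/to/llama-30b")  # hypothetical path

lora_config = LoraConfig(
    r=8,                                   # low-rank dimension
    lora_alpha=16,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # LLaMA attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of the 30B weights train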

davidliudev commented 1 year ago

I guess we should see improvement. @zhisbug

Here is what I get from Alpaca 30B with LoRA (4-bit):

[screenshot: Alpaca 30B LoRA 4-bit response]

And this is what I get currently using Vicuna 13B 4-bit: [screenshot: Vicuna 13B 4-bit response]

Apparently the 30B does much better at logical deduction.

Both are one-shot answers.

Originalimoc commented 1 year ago

@davidliudev Is this the MetaIX one, a.k.a. Chansung's 30B Alpaca LoRA + GPTQ? @zhisbug I recently read a comparison (30B 4-bit vs. 13B 16-bit?) concluding that, at the same RAM requirement, you should almost always prefer a higher parameter count over a lower parameter count at higher precision.
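For concreteness, the weight-memory side of that comparison is simple arithmetic. A rough sketch (weights only; it ignores the KV cache, activations, and quantization metadata):

# Approximate weight memory: billions of params * bits_per_weight / 8
# gives GB, since 1e9 params * (bytes per param) = GB of weights.
def weight_gb(params_billions: float, bits: int) -> float:
    return params_billions * bits / 8

print(f"30B @ 4-bit : {weight_gb(30, 4):.1f} GB")   # ~15 GB
print(f"13B @ 16-bit: {weight_gb(13, 16):.1f} GB")  # ~26 GB
print(f"13B @ 4-bit : {weight_gb(13, 4):.1f} GB")   # ~6.5 GB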

Originalimoc commented 1 year ago

I guess we should directly fine-tune LLaMA to get a base model, then test whether LoRA can further improve things, then apply GPTQ. ML magic ╰(°▽°)╯

davidliudev commented 1 year ago

@Originalimoc No, I used this one: https://huggingface.co/elinas/alpaca-30b-lora-int4, which is based on this LoRA: https://huggingface.co/baseten/alpaca-30b. Loaded via Text Generation WebUI and GPTQ.

ehartford commented 1 year ago

Can I please have the training data so I can fine-tune my own Vicuna 30B and 65B?

Boehmer86 commented 1 year ago

> Can I please have the training data so I can fine-tune my own Vicuna 30B and 65B?

Second this, we would like to try the same :-)

Thireus commented 1 year ago

@davidliudev, I just tried https://huggingface.co/elinas/alpaca-30b-lora-int4 with alpaca-30b-4bit-128g.safetensors

This is what I get:

[screenshot: alpaca-30b-4bit-128g hallucinating]

And similar hallucinations with vicuna.

Could you share your parameters please?

Edit: Unchecking "do_sample" helps get rid of the hallucinations, but alpaca-30b-4bit-128g is still not good at logic:

[screenshot: response without hallucination, but with flawed logic]
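(For anyone reproducing this with Hugging Face transformers instead of the WebUI: unchecking "do_sample" corresponds to do_sample=False, i.e. greedy decoding. A self-contained sketch using a small stand-in model, since the flags behave the same for any causal LM:)

# do_sample=False -> greedy decoding: deterministic argmax at each step.
# do_sample=True  -> tokens drawn from the filtered distribution: more
# diverse, but also more prone to derailing into hallucination.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # small stand-in checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("If I have a basket of potatoes", return_tensors="pt").input_ids

greedy = model.generate(ids, max_new_tokens=50, do_sample=False)
sampled = model.generate(ids, max_new_tokens=50, do_sample=True,
                         temperature=0.7, top_p=0.9)

print(tok.decode(greedy[0]))
print(tok.decode(sampled[0]))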

gundy commented 1 year ago

> And this is what I get currently using Vicuna 13B 4-bit

I'm trying Vicuna 13B with 8-bit quantization, and I got the answers below using the FastChat CLI, which makes me think the Vicuna 13B model can handle some deductive reasoning too. The answers I get are a lot more verbose than the others posted above:

Human: If I have a basket of potatoes and a box of cake on top of it, how do I get 1 potato?

Assistant: If you have a basket of potatoes and a box of cake on top of it, and you want to get one potato, you can follow these steps:

  1. First, carefully remove the cake from the basket and place it on a surface nearby.
  2. Then, reach into the basket and carefully remove one potato, being mindful not to disturb the other potatoes in the basket.
  3. If you want to make sure you get the potato you want, you can take the time to identify it before you remove it from the basket.
  4. Once you have the potato in your hand, you can either return the other potatoes to the basket or place them somewhere else where they will not be disturbed.
  5. Finally, you can either eat the potato, store it for later use, or use it in a recipe.

... and then the follow-up question asking for the rationale behind its sequencing:

Human: Why do I need to remove the cake first?

Assistant: You don't necessarily need to remove the cake from the basket before getting a potato, but it may be necessary if you want to get a specific potato or if you want to avoid damaging the potatoes or the cake. Here are a few reasons why you might want to remove the cake before getting a potato:

  1. If you want to get a specific potato: If you know which potato you want to get, you might want to remove any other potatoes or objects that are in the way so that you can easily reach the one you want.
  2. To avoid damaging the potatoes or the cake: If you are not careful when reaching into the basket, you could accidentally damage the potatoes or the cake. By removing the cake first, you can avoid accidentally squishing it or dislodging any potatoes while you are trying to get the one you want.
  3. To keep the basket organized: If you want to keep the basket organized, you might want to remove any objects that are not potatoes before getting the one you want. This can help you to easily find the potato you want and prevent any confusion or frustration.

CRD716 commented 1 year ago

Watching this in case any developments regarding fine-tuning 65B arise...

mdawid commented 1 year ago

@davidliudev, I think it's still possible to do logical reasoning at the current model size. This is what I get with the 4-bit model from here: https://huggingface.co/eachadea/ggml-vicuna-13b-4bit using llama.cpp.

> If I have a basket of potatoes and a box of cake on top of it, how do I get 1 potato?

To get one potato from the basket with a box of cake on top, you would need to remove the box of cake first. This can be done by carefully lifting the box off the potatoes using your hands or a tool, such as a spatula or a fork. Once the box is removed, you can then access the potatoes and remove one from the basket.
### Human:
> Why can't I get one directly?
It is not possible to get one potato directly from the basket with a box of cake on top without first removing the box of cake. This is because the box is blocking access to the potatoes, and you would need to move or remove the box in order to reach the potatoes. It is not safe or feasible to try to get one potato directly from the basket while it is still covered by the box.

davidliudev commented 1 year ago

Thanks for the input from the other tests. I am not using a scientific method for this comparison, so treat it as a pretty rough test; I may be wrong. Maybe someone can do a more rigorous comparison, or we can just wait for a true 30B version and see how it performs.

On my side, the 30B Alpaca seems far more stable and has a higher chance of giving a good result. On the other hand, Vicuna is better at following instructions but occasionally hallucinates.

My parameters:

max_new_tokens: 200
prompt size in tokens: 2048
temperature: 0.95
repetition penalty: 1.1
top_p: 1
top_k: 40

I sometimes get good results from Vicuna on tricky questions, but on my setup it is not one-shot: I need to click regenerate a few times before it answers logical-deduction questions well.
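(For reproducibility, those settings map onto a transformers generate() call roughly like this; a sketch assuming the WebUI passes them through unchanged, with gpt2 as a small stand-in checkpoint:)

# The reported sampling settings, expressed as generate() kwargs.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in for the 13B/30B model
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("If I have a basket of potatoes", return_tensors="pt").input_ids

out = model.generate(
    ids,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.95,
    top_p=1.0,             # 1.0 effectively disables nucleus filtering
    top_k=40,
    repetition_penalty=1.1,
)
print(tok.decode(out[0], skip_special_tokens=True))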

davidliudev commented 1 year ago

Some more tests using the above parameters:

Alpaca 30B: [screenshot: Alpaca 30B response]

Vicuna 13B (this time 8-bit): [screenshot: Vicuna 13B 8-bit response]

mdawid commented 1 year ago

Here's what my local 4-bit model printed:

> If my Bluetooth earphone is broken, shall I see otologist or dentist?
If your Bluetooth earphone is broken, you should seek help from an electronic repair technician or a specialist who can fix or diagnose problems with electronic devices. An otolaryngologist, also known as an ear, nose, and throat (ENT) specialist, is a medical doctor who treats conditions related to the ear, nose, and throat, but they would not be able to help you with a broken Bluetooth earphone. A dentist, on the other hand, would not be able to help with a broken Bluetooth earphone either.

Not as good and neat as 30B Alpaca, but there is still good logical reasoning.

And here are my parameters:

main -i --color --interactive-first -t 8 -b 256 --top_k 12 --top_p 1 --temp 0.36 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.05 --instruct -m models\vicuna\ggml-vicuna-13b-4bit-rev1.bin -r "### Human:" -f prompts\vicuna.txt

sampling: temp = 0.360000, top_k = 12, top_p = 1.000000, repeat_last_n = 64, repeat_penalty = 1.050000
generate: n_ctx = 2048, n_batch = 256, n_predict = -1, n_keep = 97

main: seed = 1681297316
llama.cpp: loading model from models\vicuna\ggml-vicuna-13b-4bit-rev1.bin
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  73.73 KB
llama_model_load_internal: mem required  = 9807.47 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size  = 1600.00 MB

system_info: n_threads = 8 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Human:'

mrsipan commented 1 year ago

Interesting. This is what I get with Vicuna 13B 4-bit (llama.cpp):

> If I have a basket of potatoes and a box of cake on top of it, how do I get 1 potato?

To get one potato from the basket of potatoes with a box of cake on top, you can follow these steps:

 1. Remove the box of cake from the basket.
 2. Lift the basket and remove the potatoes from the basket, one at a time.
 3. If there are other items in the basket, you can either remove them first or carefully lift the basket while holding the potato, avoiding any other items that may be inside.
 4. Once you have removed all the potatoes from the basket, you can put the basket back in its original place or set it aside if you need to access it again.

Alternatively, you can gently lift the box of cake and the basket together, carefully taking the potato out from underneath the cake without disturbing it. However, this may be more challenging depending on the size and weight of the box of cake and the basket.

mdawid commented 1 year ago

@mrsipan Very nice output. Could you share your llama.cpp parameters?

mrsipan commented 1 year ago

@mdawid This is it:

./main --color --threads 7 --batch_size 256 --n_predict -1 --top_k 12 --top_p 1 --temp 0.36 \
  --repeat_penalty 1.05 --ctx_size 2048 --instruct --reverse-prompt "### Human:" \
  --model models/13b/ggml-vicuna-13b-4bit-rev1.bin \
 -f prompts/vicuna.txt

where vicuna.txt contains:

A chat between a curious human and an artificial intelligence assistant.
The assistant gives helpful, detailed, and polite answers to the human's questions.

Thireus commented 1 year ago

@ZYHowell would you have any update for us about a potential 30B version? :)

ZYHowell commented 1 year ago

Hopefully we'll have a 30b-lora version by this week

sunshineyg2018 commented 1 year ago

> Hopefully we'll have a 30b-lora version by this week

cool

jeffwadsworth commented 1 year ago

> @mdawid This is it:
>
> ./main --color --threads 7 --batch_size 256 --n_predict -1 --top_k 12 --top_p 1 --temp 0.36 \
>   --repeat_penalty 1.05 --ctx_size 2048 --instruct --reverse-prompt "### Human:" \
>   --model models/13b/ggml-vicuna-13b-4bit-rev1.bin \
>  -f prompts/vicuna.txt
>
> where vicuna.txt contains:
>
> A chat between a curious human and an artificial intelligence assistant.
> The assistant gives helpful, detailed, and polite answers to the human's questions.

Thanks for actually giving your full command-line input. That is essential for reproducing similar results in testing.

CRD716 commented 1 year ago

> Hopefully we'll have a 30b-lora version by this week

@ZYHowell any updates?

mirek190 commented 1 year ago

bump ;)

davidliudev commented 1 year ago

Hihi @ZYHowell do you have any updates so far?

musabgultekin commented 1 year ago

Hi everyone, I understand the excitement for the 30B version. While we eagerly await the update, let's try to be patient and supportive of the maintainers. Remember, they're working hard and don't have any obligation to meet specific deadlines. If you're able, consider offering help to the community instead of posting follow-up messages. Thanks for your understanding! 😊

EDIT: You don't have to wait for this. Check out WizardLM-30B or GPT4-X-Alpasta-30b.

mrsipan commented 1 year ago

@musabgultekin Thank you for replying. I wonder if there is a way for us to follow the fine-tuning process, perhaps in a dev channel. This is so interesting. Great work on the 13B version!

Interpause commented 1 year ago

Would it be possible to benchmark Open Assistant's 30B LLaMA oasst fine-tune (https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor)? I have yet to run anything quantitative like perplexity, but it feels smarter. 4-bit quantized, it is able to run in 22GB of VRAM. In general, I think the newer Open Assistant models should probably be evaluated; they are on their 8th fine-tune of Pythia-12B now.
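(If anyone wants to run the perplexity check, a minimal sketch: perplexity is just the exponential of the mean token-level negative log-likelihood. This assumes a transformers-compatible checkpoint and text short enough for one context window; gpt2 is a stand-in:)

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "If I have a basket of potatoes and a box of cake on top of it..."
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    # With labels=input_ids, a causal LM returns the mean cross-entropy
    # over the predicted tokens (labels are shifted internally).
    loss = model(ids, labels=ids).loss

print(f"perplexity = {math.exp(loss.item()):.2f}")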

xiyuanyang45 commented 1 year ago

So is there any release plan for Vicuna 30B yet?

mirek190 commented 1 year ago

I'm also curious....

rsteubs-hst commented 1 year ago

bump

CRD716 commented 1 year ago

no reason to bump this every other day, let it rest, either they release it or something better will come along later anyways.

Boehmer86 commented 1 year ago

I would still like to give it a bump, though. For us it would be quite interesting to have all three versions up and running on our clusters for internal testing, and 30B might, as discussed, really show better reasoning. Maybe we can give a donation to support this effort? :-)

zhisbug commented 1 year ago

We're working on it. Please stay tuned; the LMSYS org always delivers the best model :)

Matthias-Johannes-Mack commented 1 year ago

The best way to pass the time is to have a look at the WizardLM 30B model. It also gives amazing results (it can be found on Hugging Face).

Thireus commented 1 year ago

Alpasta 30B 🍝🤌

Pb-207 commented 1 year ago

We have Vicuna-30B now, and it works well even at GPTQ 4-bit; it can run on a single 4090 at more than 10 tokens/s: https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ/discussions
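(Tokens/s is easy to measure yourself; a rough sketch with a stand-in checkpoint, timing only the generate() call:)

import time
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in for the 30B GPTQ model
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("Hello", return_tensors="pt").input_ids

start = time.time()
out = model.generate(ids, max_new_tokens=128, do_sample=False)
elapsed = time.time() - start

new_tokens = out.shape[1] - ids.shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s")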

Pb-207 commented 1 year ago

[screenshot: Text Generation Web UI running the model, 2023-06-03]

zhisbug commented 1 year ago

see https://huggingface.co/lmsys/vicuna-33b-v1.3