tloen / alpaca-lora

Instruct-tune LLaMA on consumer hardware
Apache License 2.0

sharing 13B and 30B Alpaca-LoRA ckpt #68

Open deep-diver opened 1 year ago

deep-diver commented 1 year ago

I have put links to both checkpoints in this repository: https://github.com/deep-diver/Alpaca-LoRA-Serve

I used an A100 40GB to train both. I didn't change the script provided in this repository, just adjusted the batch size to maximize VRAM utilization. Here is the report: https://wandb.ai/chansung18/huggingface/overview?workspace=user-chansung18

If anyone is interested, give them a try.
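For reference, loading one of these LoRA checkpoints should look roughly like this with the usual transformers + peft stack (a minimal sketch; the base model ID and adapter path below are placeholders, not necessarily the exact ones used here):

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base_model = "decapoda-research/llama-13b-hf"       # placeholder: any HF-converted LLaMA-13B
lora_weights = "path/or/hub-id/of/alpaca-lora-13b"  # placeholder: the shared adapter

tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,         # keeps the 13B base model within a single-GPU memory budget
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, lora_weights)  # attach the LoRA adapter weights
model.eval()
```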

DanielWe2 commented 1 year ago

Thanks. You have used the latest cleaned dataset?

deep-diver commented 1 year ago

Yeap

DanielWe2 commented 1 year ago

You seem to have experience with fine tuning the model in a different language.

I was only able to test the bare 7B model, and it is not good at German and makes a lot of grammar errors. I suspect it will be similar for Korean?

The 30B model was trained on a bigger dataset. Is that one better at Korean (and, let's hope, also German)?

How much does or can fine-tuning on a dataset in another language help if the base model is not good in that language?

What I want to find out is whether it is even worth trying to fine-tune it with a German dataset.

Maybe a full fine-tuning instead of just LoRA could help?

deep-diver commented 1 year ago

I have checked that the 30B model fine-tuned with the cleaned dataset hosted in this repo seems to have a much better ability to answer in different languages. But I have also seen cases showing good results when models are fine-tuned on data in their own language.

DanielWe2 commented 1 year ago

Interesting. Auto-translating using the OpenAI API?

deep-diver commented 1 year ago

Yeah, the gpt-3.5-turbo one.


deep-diver commented 1 year ago

with 30B model, I have experienced the following conversations:

  1. continuing generation when the output gets cut off
  2. code refactoring
  3. reformatting text into Markdown (e.g., turning a simple list into bullet points)
  4. understanding languages other than English (somewhat limited, though)
  5. getting advice of sorts (planning an x-day trip, relationship advice between husband and wife), ...

The problem is the inference speed with the larger model. I am experimenting with different setups in GenerationConfig (e.g., the larger model seems to work OK even with only a single beam); see the sketch below.


example output
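Regarding the GenerationConfig experiments mentioned above, here is a rough sketch of that kind of setup (the parameter values are illustrative, not the settings actually used):

```python
from transformers import GenerationConfig

# Illustrative values only; the point is that num_beams=1 already seems to
# give acceptable output from the larger model while keeping inference fast.
generation_config = GenerationConfig(
    temperature=0.7,
    top_p=0.75,
    num_beams=1,
    max_new_tokens=256,
)
# output = model.generate(input_ids=input_ids, generation_config=generation_config)
```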

josemlopez commented 1 year ago

I'm working on including more Spanish samples in the dataset to improve performance in Spanish. Any rule of thumb for the number of samples needed to have a noticeable effect?

DaveScream commented 1 year ago

Can you please share a q4 version of the Alpaca-LoRA 30B? Or maybe ggml_alpaca30b_q4?

johnsmith0031 commented 1 year ago

Tested the LoRA with the 4-bit model and it works. Just put these 2 files into the peft path and the GPTQ-for-LLaMa path for the adaptation:
https://github.com/johnsmith0031/alpaca_lora_4bit/blob/main/peft/tuners/lora.py
https://github.com/johnsmith0031/alpaca_lora_4bit/blob/main/GPTQ-for-LLaMa/autograd_4bit.py

xieydd commented 1 year ago

What devices are used for inference with the 13B and 30B models respectively, and what is the VRAM usage? :)

DanielWe2 commented 1 year ago

@johnsmith0031 You mean for inference? Then you can also just use the export-to-HF scripts from this repo, quantize the result with GPTQ-for-LLaMa, and then, for example, use text-generation-webui in chat mode as a UI. That last part works, but not perfectly, because it doesn't use the prompt style that was used for Alpaca-LoRA training.
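For completeness, the Alpaca-style prompt template that the instruction-tuning data follows, which is worth reproducing when chatting with the exported model (a sketch; as far as I know this matches the template used by this repo's generate script):

```python
def make_prompt(instruction: str, input_text: str = "") -> str:
    """Build a prompt in the Stanford Alpaca format used for this fine-tuning."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```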

DanielWe2 commented 1 year ago

@xieydd For GPTQ LLaMA in 4-bit it would be about 5 GB for 7B, 8.4 GB for 13B (8 GB is not enough), and 20.5 GB for 30B.
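A back-of-the-envelope check of those numbers (rough arithmetic only, assuming 4-bit weights plus overhead for activations and the KV cache):

```python
# 4-bit quantized weights are ~0.5 bytes per parameter; the rest of the
# reported usage is activations, KV cache, and framework overhead.
for name, params in [("7B", 6.7e9), ("13B", 13.0e9), ("30B", 32.5e9)]:
    weights_gb = params * 0.5 / 1e9
    print(f"{name}: ~{weights_gb:.1f} GB for weights alone")
# 7B  -> ~3.4 GB  (reported ~5 GB total)
# 13B -> ~6.5 GB  (reported ~8.4 GB total)
# 30B -> ~16.3 GB (reported ~20.5 GB total)
```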

UranusSeven commented 1 year ago

@deep-diver Hi, could you share how you generated the data for fine-tuning koalpaca?

It would be even more helpful if you could share some samples of your data.

Thanks!

xieydd commented 1 year ago

@DanielWe2 Thank you for the data.

lurenlym commented 1 year ago

> I have put links to both checkpoints in this repository: https://github.com/deep-diver/Alpaca-LoRA-Serve
>
> I used an A100 40GB to train both. I didn't change the script provided in this repository, just adjusted the batch size to maximize VRAM utilization. Here is the report: https://wandb.ai/chansung18/huggingface/overview?workspace=user-chansung18
>
> If anyone is interested, give them a try.

Are you using A100-40GB to train the 30B model? Wandb shows that A100-80GB is being used.

deep-diver commented 1 year ago

My bad.

I am using too many VMs, and I got confused.

better629 commented 1 year ago

@deep-diver Did you run into OOM when running LLaMA-30B with LoRA on an A100-80G, when hitting eval_steps or model.save_pretrained(output_dir)? GPU memory jumps from 53 GB to 80+ GB rapidly and then causes OOM.

Or, what command did you use?

better629 commented 1 year ago

I have found the problem: it was due to the bitsandbytes version. I downgraded it from 0.38.1 to 0.37.0, and now it works fine without causing OOM.
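For reference, a quick sanity check that the downgraded version is the one actually being loaded (assuming a standard pip install):

```python
# Verify the installed bitsandbytes version; per the comment above, 0.37.0
# avoided the OOM at eval_steps / save_pretrained, while 0.38.1 triggered it.
import importlib.metadata
print(importlib.metadata.version("bitsandbytes"))  # expect "0.37.0"
```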