deep-diver opened 1 year ago
Thanks. You have used the latest cleaned dataset?
Yeap
You seem to have experience with fine tuning the model in a different language.
I was only able to test the 7B bare model, and it is not good at German and makes a lot of grammar errors. I suspect it will be similar for Korean?
The 30B model was trained on a bigger dataset; is it better at Korean (and hopefully also German)?
How much does, or can, fine-tuning with a dataset in another language help if the base model is not good at that language?
What I want to find out is whether it is even worth trying to fine-tune it with a German dataset.
Maybe full fine-tuning instead of just LoRA could help?
I have checked that the 30B model fine-tuned with the cleaned dataset hosted in this repo seems to be much better at answering in different languages. But I have seen some cases showing good results when models were fine-tuned on their own language.
Interesting. Auto-translating using the OpenAI API?
Yeah, the gpt-3.5-turbo one.
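For anyone curious, here is a minimal sketch of what that auto-translation step can look like with gpt-3.5-turbo; the prompt wording and the sample fields are assumptions, not the exact setup used for the dataset.

```python
# Sketch: translate Alpaca-style samples with gpt-3.5-turbo.
# Prompt text and field names are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def translate(text: str, target_lang: str = "Korean") -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": f"Translate the user's text into {target_lang}. Return only the translation."},
            {"role": "user", "content": text},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

sample = {"instruction": "Give three tips for staying healthy.", "input": "", "output": "..."}
translated = {k: translate(v) if v else v for k, v in sample.items()}
```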
With the 30B model, I have experienced the following conversations:
`continue` when the output is omitted. The problem is the inference speed with the larger model. I am experimenting with different setups within GenerationConfig
(i.e., the larger model seems to work OK even with only a single beam).
(example output screenshot)
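A minimal sketch of the kind of GenerationConfig experiment described above; the exact values are assumptions, not the settings actually used in the report.

```python
# Sketch: single-beam generation settings for the larger model (values are illustrative).
from transformers import GenerationConfig

generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    num_beams=1,          # a single beam: the 30B model still answers reasonably well
    max_new_tokens=256,   # a smaller budget helps with the slow inference of the larger model
)
# output = model.generate(**inputs, generation_config=generation_config)
```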
I'm working on including more Spanish samples in the dataset to improve performance in Spanish. Any rule of thumb for the number of samples needed to have an effect?
Can you please share a q4 version of alpaca_lora30b? Or maybe ggml_alpaca30b_q4?
Tested the LoRA with the 4-bit model and it works. Just put those 2 files in the peft path and the GPTQ-for-LLaMa path for the adaptation: https://github.com/johnsmith0031/alpaca_lora_4bit/blob/main/peft/tuners/lora.py https://github.com/johnsmith0031/alpaca_lora_4bit/blob/main/GPTQ-for-LLaMa/autograd_4bit.py
What devices are used in the inference phase for the 13B and 30B models respectively, and what is the VRAM usage? :)
@johnsmith0031 You mean for inference? Then you can also just use the export-HF scripts from this repo, quantize the result with GPTQ-for-LLaMa, and then, for example, use text-generation-webui in chat mode as a UI. The last part works, but not perfectly, because it doesn't use the prompt style used in Alpaca-LoRA training.
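Roughly, the export step boils down to merging the LoRA weights back into the base model and saving a plain HF checkpoint that GPTQ-for-LLaMa can quantize. A minimal sketch of that merge, with model and adapter names as assumptions for illustration:

```python
# Sketch: fold LoRA deltas into the base weights and save an HF checkpoint.
# Model/adapter names and paths are illustrative assumptions.
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b")
model = model.merge_and_unload()             # merge the LoRA weights into the base model
model.save_pretrained("./alpaca-merged-hf")  # quantize this folder with GPTQ-for-LLaMa
```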
@xieydd For GPTQ LLaMA in 4-bit it is about 5 GB for 7B, 8.4 GB for 13B (8 GB is not enough), and 20.5 GB for 30B.
@deep-diver Hi, could you share how you generated the data for fine-tuning KoAlpaca?
It would be even more helpful if you could share some samples of your data.
Thanks!
@DanielWe2 Thank you for the data.
I have put the links to both of them in this repository: https://github.com/deep-diver/Alpaca-LoRA-Serve
I used an A100 40GB to train both of them. I didn't change the script provided in this repository; I just adjusted the batch size to make maximum use of the VRAM. Here is the report: https://wandb.ai/chansung18/huggingface/overview?workspace=user-chansung18
If anyone is interested, try them.
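For context, the adjustment is roughly this kind of change to the hyperparameter constants near the top of this repo's finetune.py; the numbers below are illustrative assumptions, not the exact values from the report.

```python
# Sketch: batch-size knobs in finetune.py (illustrative values).
MICRO_BATCH_SIZE = 4                 # raise until the A100's VRAM is fully utilized
BATCH_SIZE = 128                     # effective batch size stays the same
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3
LEARNING_RATE = 3e-4
CUTOFF_LEN = 256
```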
Are you using A100-40GB to train the 30B model? Wandb shows that A100-80GB is being used.
My bad.
I am using too many VMs, so I got confused.
@deep-diver
Did you also hit OOM when running LLaMA-30B with LoRA at eval_steps or at model.save_pretrained(output_dir) on the A100-80G?
GPU memory jumps rapidly from 53 GB to 80+ GB and then causes an OOM.
Or what command did you use?
I have found the problem: it was due to the bitsandbytes version. I downgraded it from 0.38.1 to 0.37.0, and now it works fine without OOM.
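A quick sanity check for that version pin, based only on the versions mentioned above:

```python
# 0.38.1 triggered the OOM at eval/save; 0.37.0 did not.
import bitsandbytes

assert bitsandbytes.__version__ == "0.37.0", "run: pip install bitsandbytes==0.37.0"
```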