tloen / alpaca-lora

Instruct-tune LLaMA on consumer hardware
Apache License 2.0

This does not work: training an adapter and using it does not change the output of the model. #357

Open Oxi84 opened 1 year ago

Oxi84 commented 1 year ago

I trained the PEFT model on my dataset using finetune.py.

There is no difference between using and not using the PEFT adapter in the inference interface, so training with finetune.py does not seem to work.

I am very surprised, because I did everything the same way as other people here, and I checked the input manually and it looks normal; it is just as if the PEFT adapter does not load, since there is no difference in the output.

Is there an error in the script now?
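
One way to check the symptom described above is to load the base model, attach the trained adapter with peft, and compare generations with the adapter enabled and disabled: identical outputs mean the adapter is not being applied (or its weights are still zero). Below is a minimal sketch assuming the standard transformers/peft APIs; the model name and adapter path are placeholders matching the commands used later in this thread.

    # Sketch: compare generations with and without the LoRA adapter applied.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
    base = AutoModelForCausalLM.from_pretrained(
        "huggyllama/llama-7b", torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, "./lora-alpaca")  # finetune.py output_dir

    prompt = "Tell me about alpacas."  # any test prompt
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with_adapter = model.generate(**inputs, max_new_tokens=64)
    with model.disable_adapter():  # context manager in recent peft releases
        without_adapter = model.generate(**inputs, max_new_tokens=64)

    # If the two decoded strings are identical, the adapter has no effect.
    print(tokenizer.decode(with_adapter[0], skip_special_tokens=True))
    print(tokenizer.decode(without_adapter[0], skip_special_tokens=True))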

lywinged commented 1 year ago

Is your dataset in the same format as the official one?

Oxi84 commented 1 year ago

Yes, I tried a part of the official dataset, and then tried another one.

One strange thing is that the Hugging Face Trainer does not show any loss, and it should, I guess?

Anyway, I will try on different hardware. I was using RunPod; I will try vast.ai. Maybe it is hardware related, even though there was no hardware error.

Oxi84 commented 1 year ago

Maybe even Google Colab can be used.

lywinged commented 1 year ago

Just use Colab to train the 7B model on 10 instances: set batch_size=2, micro_batch_size=1, logging step=1, val step=10, warmup=10, 40 epochs, and use the train set as the val dataset. At minimum, the model should then be able to answer (or answer only) your 10 instances.
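
A hedged sketch of that overfit test, using the command-line flags that alpaca-lora's finetune.py exposes (batch size, micro batch size, epochs); the logging, warmup, and eval step counts mentioned above are set inside the script's TrainingArguments and may need to be edited there. The dataset path is a placeholder for a ~10-example file in the official instruction/input/output format.

    # Sketch only: fine-tune on a tiny dataset so the adapter should clearly memorize it.
    import subprocess

    subprocess.run([
        "python", "alpaca-lora/finetune.py",   # adjust to wherever the repo is cloned
        "--base_model", "huggyllama/llama-7b",
        "--data_path", "tiny_dataset.json",    # placeholder: ~10 training records
        "--output_dir", "./lora-overfit-test",
        "--batch_size", "2",
        "--micro_batch_size", "1",
        "--num_epochs", "40",
    ], check=True)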

Oxi84 commented 1 year ago

I tried Colab, but it does not have enough memory (just 16GB). I also tried vast.ai and it worked the same as before: no change with PEFT. Where do you train your models, if it's not a secret?

Next, I will try an A100 with 80 GB without any PEFT.

lywinged commented 1 year ago

I just tried Colab's 16GB GPU; for the 7B model it only used up to 8.7GB of VRAM, but you need to use bitsandbytes==0.37.2.

{'eval_loss': 0.0012775100767612457, 'eval_runtime': 0.2768, 'eval_samples_per_second': 3.613, 'eval_steps_per_second': 3.613, 'epoch': 40.0}
{'train_runtime': 447.425, 'train_samples_per_second': 0.089, 'train_steps_per_second': 0.089, 'train_loss': 0.5826013974583475, 'epoch': 40.0}
100%|█████████████████████████████████████████████| 40/40 [01:20<00:00, 2.01s/it]

Oxi84 commented 1 year ago

Awesome. On the other hand, I wasn't able to train even on 4 A100s with 160 GB in total :)

OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 1; 39.39 GiB total capacity; 37.92 GiB already allocated; 25.81 MiB free; 38.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I even set a memory map so that each GPU would stay under 40 GB, but one of them still failed.
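
For reference, the per-GPU cap described here can be expressed with the max_memory argument that transformers accepts alongside device_map when sharding a model across several cards. A sketch assuming four 40GB A100s; the 38GiB figure is only an example chosen to leave some headroom, not a value from this thread.

    # Sketch: shard the base model across 4 GPUs while capping each one below 40 GB.
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "huggyllama/llama-7b",
        torch_dtype=torch.float16,
        device_map="auto",
        max_memory={i: "38GiB" for i in range(4)},  # add "cpu": "64GiB" to allow CPU offload
    )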

lywinged commented 1 year ago

Don't waste money before your test version passes. Even for multi-GPU runs, you can rent 4x1080s from vast.ai to test it first.

Oxi84 commented 1 year ago

Yes, that would be smarter: four smaller cards to debug on first.

So you used this exact file on Colab? https://github.com/tloen/alpaca-lora/blob/main/finetune.py

lywinged commented 1 year ago

Change the hyperparameters for your test data, use the latest version of peft, and read other people's issues.

Oxi84 commented 1 year ago

I will check other people's issues, but hyperparameters won't help; the adapter has zero effect on the model in this case. I will rerun it from the command line as below.

I did this on Google Colab (and on the other platforms as well):

 !pip install transformers
 !pip install fire
 !pip install peft
 !pip install gradio
 !pip install SentencePiece
 !pip install datasets
 !pip install accelerate
 !pip install bitsandbytes

Then I did: !git clone https://github.com/tloen/alpaca-lora

and I ran it with something like this: !python /content/alpaca-lora/finetune.py --base_model='huggyllama/llama-7b' --data_path 'dataset.json' --output_dir '/content/lora-alpaca'

I just need to change a few paths in the files.

lywinged commented 1 year ago

I will add a 100% working version today that trains the model to answer a basic question with about 1 minute of training time.

Oxi84 commented 1 year ago

Thanks for the help.

Is this loss bad?

 trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
 {'train_runtime': 153.0811, 'train_samples_per_second': 5.487, 'train_steps_per_second': 0.039, 'train_loss': 2.6163388888041177, 'epoch': 2.74}
 100%|█████████████████████████████████████████████| 6/6 [02:33<00:00, 25.51s/it]

So far I have only been training T5; with T5-large, around 200 examples are enough to get the loss down to 0.06.

Oxi84 commented 1 year ago

I managed to get it to work; it seems the problem was that the learning rate was too small. Now it performs quite well, and the loss at the end is much lower.

TingchenFu commented 1 year ago

@Oxi84 Hi, so what learning rate did you use? I use 1e-5 and hit a similar problem: the output of the tuned model is exactly the same as the original untuned model.

Casi11as commented 1 year ago

I found that after training, all of the lora_B weights in q_proj and v_proj are 0, so the weights of the model before and after training are exactly the same, and the output is also the same. But I don't know why this is happening.
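
A quick way to check for this is to inspect the saved adapter weights directly: peft initializes lora_B to zero, so if those matrices are still all zeros after training, the adapter contributes nothing and the model's output is identical to the base model's. A minimal sketch, assuming the older adapter_model.bin layout (newer peft versions save adapter_model.safetensors instead).

    # Sketch: print the norm of every lora_B tensor in a saved adapter checkpoint.
    import torch

    state = torch.load("lora-alpaca/adapter_model.bin", map_location="cpu")
    for name, tensor in state.items():
        if "lora_B" in name:
            # A norm of 0.0 means this matrix was never updated during training.
            print(name, tensor.float().norm().item())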

chloefresh commented 1 year ago

@Casi11as Hi, have you solved the problem? I ran into the same one.

Casi11as commented 1 year ago

No, I haven't solved it. I tried other base models (StarCoder), but it is the same.

Rinatum commented 7 months ago

@Casi11as @chloefresh Running the training with the default parameters of accelerate solved this problem for me.