Oxi84 opened this issue 1 year ago
Is your dataset format the same as the official one?
Yes, I tried a part of the official dataset, and then tried another one.
One strange thing is that the Hugging Face Trainer does not show any loss, and it should, I guess?
Anyway, I will try on different hardware. I was using RunPod; I will try vast.ai next. Maybe it is hardware related, even though there was no hardware error.
Maybe even Google Colab can be used.
Just use Colab to train a 7B model on 10 instances: set bs=2, mbs=1, log step=1, val step=10, warmup=10, 40 epochs, and val dataset = train set. At the very least, the model should be able to answer (or only answer) your 10 instances.
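A rough sketch of how those settings could map onto transformers.TrainingArguments (finetune.py constructs something similar internally; the exact command-line flag names it exposes may differ, so treat this as illustrative rather than copy-paste flags):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="lora-alpaca-test",   # illustrative output path
    per_device_train_batch_size=1,   # mbs = 1
    gradient_accumulation_steps=2,   # effective batch size = 2
    num_train_epochs=40,
    warmup_steps=10,
    logging_steps=1,                 # print the loss at every step
    evaluation_strategy="steps",
    eval_steps=10,
    save_strategy="no",
    learning_rate=3e-4,              # the repo's default; not part of the suggestion above
    fp16=True,
)
# Pass the same 10-example dataset as both train_dataset and eval_dataset;
# if the LoRA adapter is learning at all, eval_loss should collapse.
```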
I tried Colab, but it does not have enough memory (just 16 GB). I also tried vast.ai and it behaved the same as before: no change with PEFT. Where do you train your models, if it's not a secret?
I will now try on an A100 with 80 GB without any PEFT.
I just tried Colab (16 GB). For the 7B model it only took up to 8.7 GB of VRAM, but you need to use bitsandbytes==0.37.2.
{'eval_loss': 0.0012775100767612457, 'eval_runtime': 0.2768, 'eval_samples_per_second': 3.613, 'eval_steps_per_second': 3.613, 'epoch': 40.0}
{'train_runtime': 447.425, 'train_samples_per_second': 0.089, 'train_steps_per_second': 0.089, 'train_loss': 0.5826013974583475, 'epoch': 40.0}
100%|█████████████████████████████████████████████| 40/40 [01:20<00:00, 2.01s/it]
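For reference, the small footprint comes from loading the base model in 8-bit and only training the LoRA adapters. A minimal sketch of roughly what finetune.py sets up (assuming bitsandbytes 0.37.2 and an older peft release that still exposes prepare_model_for_int8_training):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Load the 7B base model quantized to 8-bit so it fits in well under 16 GB of VRAM.
model = LlamaForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")

model = prepare_model_for_int8_training(model)

# Only the LoRA adapters on q_proj/v_proj are trained (~4M parameters).
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```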
Awesome. On the other hand, I wasn't able to train even on 4 A100s with 160 GB total :)
OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 1; 39.39 GiB total capacity; 37.92 GiB already allocated; 25.81 MiB free; 38.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I even set a memory map so every GPU would stay under 40 GB, but one of them still failed.
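For anyone trying to reproduce that setup, capping per-GPU usage is usually done through max_memory when loading with a device map. A hedged sketch (the model name and limits are illustrative, not taken from this thread):

```python
import torch
from transformers import AutoModelForCausalLM

# Cap each of the 4 GPUs below 40 GB so activations and optimizer state
# still have headroom; anything that does not fit can spill to CPU RAM.
max_memory = {i: "35GiB" for i in range(4)}
max_memory["cpu"] = "100GiB"

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",   # illustrative; substitute your base model
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory=max_memory,
)
# The OOM message above also suggests setting
# PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 to reduce fragmentation.
```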
Don't waste money before your test version passes; even for multi-GPU, you can rent 4x1080s from vast.ai to test it first.
Yes, that would be smarter: 4 smaller cards to debug on first.
So you used this exact file on Colab? https://github.com/tloen/alpaca-lora/blob/main/finetune.py
Change the hyperparameters for your test data, and use the latest version of peft. Read other people's issues.
I will check other people's issues, but hyperparameters won't help; the adapter has zero effect on the model in this case. I will rerun from the command line like below.
I did this on Google Colab (and other platforms as well):
!pip install transformers
!pip install fire
!pip install peft
!pip install gradio
!pip install SentencePiece
!pip install datasets
!pip install accelerate
!pip install bitsandbytes
Then I did: !git clone https://github.com/tloen/alpaca-lora
and I ran it with something like this: !python /content/alpaca-lora/finetune.py --base_model='huggyllama/llama-7b' --data_path 'dataset.json' --output_dir '/content/lora-alpaca'
I just needed to patch a few small things in the files.
I will add a 100% working version today for training the model to answer a basic question, with about 1 minute of training time.
Thanks for the help.
Is this loss bad?
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
{'train_runtime': 153.0811, 'train_samples_per_second': 5.487, 'train_steps_per_second': 0.039, 'train_loss': 2.6163388888041177, 'epoch': 2.74}
100%|█████████████████████████████████████████████| 6/6 [02:33<00:00, 25.51s/it]
So far I have only been training T5, and with T5-large around 200 examples are enough to get the loss down to 0.06.
I managed to get it to work; it seems the problem was a very small learning rate. Now it performs quite well, and the loss is much lower at the end.
@Oxi84 Hi, so what learning rate did you use? I used 1e-5 and hit a similar problem: the output of the tuned model is exactly the same as the original untuned model.
I found that after training, all the lora_B weights in q_proj and v_proj are 0, so the weights of the model before and after training are exactly the same, and therefore the output is also the same. But I don't know why this is happening.
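One way to confirm this (a quick diagnostic, not from the repo; `model` below is assumed to be the trained PeftModel): lora_B is initialized to zero by design in PEFT, so if it is still all zeros after training, the update B·A is zero and the merged weights are identical to the base model.

```python
# model is assumed to be the trained PeftModel.
# lora_A is randomly initialized and lora_B starts at zero, so the product
# B @ A (and therefore the weight delta) stays zero until training updates B.
for name, param in model.named_parameters():
    if "lora_B" in name:
        print(name, "max |w| =", param.detach().abs().max().item())
```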
@Casi11as hi, have you solved the problem? I met the same one.
No, I haven't solved it. I tried other base models (StarCoder), but the result is the same.
@Casi11as @chloefresh Running the training with accelerate's default parameters solved this problem for me.
I trained the PEFT model on my dataset using the finetune.py file.
There is no difference between using and not using PEFT at inference time, so training with finetune.py does not seem to work.
I am very surprised, because I used exactly the same setup as other people here, and I checked the input manually and it looks normal; the PEFT adapter just does not load and makes no difference.
Is there an error in the script now?
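If someone else hits this, one more thing worth checking (a hedged sketch; the path assumes the /content/lora-alpaca output directory used earlier and PEFT's default adapter_model.bin filename): inspect the saved adapter file directly to see whether non-zero LoRA weights were actually written out.

```python
import torch

# Load the saved adapter weights and print their shapes and magnitudes.
# A tiny file, or lora_B tensors that are all zeros, would explain why
# loading the adapter changes nothing at inference time.
state_dict = torch.load("/content/lora-alpaca/adapter_model.bin", map_location="cpu")
for key, tensor in state_dict.items():
    print(key, tuple(tensor.shape), "max |w| =", tensor.abs().max().item())
```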