GreenTeaBD opened this issue 1 year ago
Can you show the code you are using to load the fine-tuned model? As far as I understand, the model file generated by the finetune script is an adapter file that you have to load on top of (or merge with) the base model for inference. That said, I have also noticed poor performance after fine-tuning.
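For comparison, this is roughly how I'd expect the adapter to be loaded for inference (a minimal sketch along the lines of the repo's generate.py; peft and transformers are assumed, and the paths and the 8-bit flag are placeholders for your setup):

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base_model = "decapoda-research/llama-7b-hf"
lora_weights = "./test"  # the --output_dir you passed to finetune.py

tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,        # assumes bitsandbytes is working; drop if not
    torch_dtype=torch.float16,
    device_map="auto",
)
# Attach the LoRA adapter produced by finetune.py on top of the base model.
model = PeftModel.from_pretrained(model, lora_weights, torch_dtype=torch.float16)
model.eval()

prompt = "daosays:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```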
I'm using https://github.com/lxe/simple-llama-finetuner, which has a simple web UI for inference with the adapter file. It has its own way to finetune, but I'm not using that here; I'm just using it for inference.
With the finetuning UI there, though, if I just dump text in and create a LoRA, it behaves how I'd expect: it takes about 30 minutes to finetune, and then, using the adapter file it produces, I put in daosays: and get some very daoist/stoic responses.
But finetuned here, like I was saying, it's just a very quick nothing. I put in daosays: and get nothing related to anything I (supposedly) finetuned it on.
All hyperparameters have to match your dataset size.
The default params, including warmup, are tuned for large datasets (e.g., a warmup of 100 steps). If you use a dataset of a sufficiently different size, you have several things to adjust.
Also, with so little data, you'll need many more epochs to "move" the model anywhere. There is just not enough data relative to the LoRA weights for them to learn anything...
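To make that concrete, here's a rough back-of-the-envelope check (the 100-example training set below is just an assumed number; plug in your own dataset size minus val_set_size):

```python
# Rough sanity check: how many optimizer steps does a run actually get?
# With batch_size 128 and only a few hundred examples, the whole run can
# finish inside the warmup window, so the learning rate never ramps up.
def optimizer_steps(num_train_examples, batch_size, num_epochs):
    steps_per_epoch = max(1, num_train_examples // batch_size)
    return steps_per_epoch * num_epochs

warmup_steps = 100  # the default warmup mentioned above
steps = optimizer_steps(num_train_examples=100, batch_size=128, num_epochs=4)
print(steps)                 # -> 4, which matches the "4/4" in your log below
print(steps < warmup_steps)  # -> True: training ends before warmup finishes
```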
> All hyperparameters have to match your dataset size.
> The default params, including warmup, are tuned for large datasets (e.g., a warmup of 100 steps). If you use a dataset of a sufficiently different size, you have several things to adjust.
Is there any documentation for them anywhere? Some of them don't have an obvious equivalent among normal finetuning parameters, so I'm not really sure what they should be set to.
I'm thinking maybe this is a problem with my dataset, but I'm not exactly sure what could be wrong. I've attached my training data: daostoaformatted.zip. The training data is the daoist classics and the stoic classics mixed together. The idea is to get it to output text like them when the user inputs the trigger "daosays:".
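For reference, the log further down reports prompt template: alpaca, so I assume each record is supposed to have the instruction/input/output shape, roughly like this (placeholder content, not my actual data; daosays: is just the trigger I'm using):

```python
# Illustrative shape of one alpaca-style record; the text is made up.
import json

example = [
    {
        "instruction": "daosays:",
        "input": "",
        "output": "The sage acts without striving and teaches without words.",
    }
]

print(json.dumps(example, indent=2))
```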
WSL2 Ubuntu on Windows 11, CUDA 11.7. Output of nvcc -V:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
```
Hardware: 5900x, 32GB ram, 4090
I finetune with the following:
```
python3 finetune.py \
    --base_model 'decapoda-research/llama-7b-hf' \
    --data_path './daostoaformatted.json' \
    --output_dir './test' \
    --batch_size 128 \
    --micro_batch_size 4 \
    --num_epochs 4 \
    --learning_rate 1e-4 \
    --cutoff_len 512 \
    --val_set_size 530 \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules '[q_proj,v_proj]' \
    --train_on_inputs \
    --group_by_length
```
The output I get:

```
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
/home/ckg/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home/ckg/anaconda3/envs/alpacalora/lib')}
  warn(msg)
/home/ckg/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/ckg/anaconda3/envs/alpacalora did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA exception! Error code: no CUDA-capable device is detected
CUDA exception! Error code: initialization error
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so
/home/ckg/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
  warn(msg)
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/ckg/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
Training Alpaca-LoRA model with params:
base_model: decapoda-research/llama-7b-hf
data_path: ./daostoaformatted.json
output_dir: ./test
batch_size: 128
micro_batch_size: 4
num_epochs: 4
learning_rate: 0.0001
cutoff_len: 512
val_set_size: 530
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
group_by_length: True
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: alpaca
Loading checkpoint shards: 100%|████████████████████████████████████████████| 33/33 [00:35<00:00,  1.09s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Downloading and preparing dataset json/default to /home/ckg/.cache/huggingface/datasets/json/default-834b74985ca18b59/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...
Downloading data files: 100%|██████████████████████████████████████████████| 1/1 [00:00<00:00, 11008.67it/s]
Extracting data files: 100%|████████████████████████████████████████████████| 1/1 [00:00<00:00, 1048.05it/s]
Dataset json downloaded and prepared to /home/ckg/.cache/huggingface/datasets/json/default-834b74985ca18b59/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 388.69it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
{'train_runtime': 5.2858, 'train_samples_per_second': 3.027, 'train_steps_per_second': 0.757, 'train_loss': 0.07745875418186188, 'epoch': 4.0}
100%|█████████████████████████████████████████████████████████████████████████| 4/4 [00:05<00:00,  1.32s/it]
If there's a warning about missing keys above, please disregard :)
```
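One thing that does look right: the trainable-parameter count in the log matches what LoRA with r=8 on q_proj and v_proj should give for 7B (quick check below, assuming 32 layers and a hidden size of 4096), and 4,194,304 params at 4 bytes each is roughly the 16MB adapter file I end up with.

```python
# Sanity check on "trainable params: 4194304" for LoRA r=8 on q_proj/v_proj.
hidden, r, layers, modules = 4096, 8, 32, 2
per_module = hidden * r + r * hidden       # lora_A + lora_B matrices
total = per_module * modules * layers
print(total)                               # -> 4194304
print(total * 4 / 2**20, "MiB in fp32")    # -> 16.0 MiB
```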
bitsandbytes complaining about the CPU-only library is weird, but it seems to work. I did replace libbitsandbytes_cpu.so with libbitsandbytes_cuda117.so to fix another problem, so that might be why it says it's going CPU-only and then seems to work anyway.
But anyway, the problem is that it does its 30-second finetuning, I get a 16MB file, and when I use it for inference it doesn't behave how it's supposed to at all (at least how I think it should). I suspect it's not actually finetuning at all; other methods of making a LoRA have worked and taken about 30 minutes to finetune. Something's very wrong with the way I'm doing it, but I have no idea what.
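For what it's worth, one thing I can check is whether the saved adapter weights moved at all (a quick sketch; I'm assuming the adapter ends up as ./test/adapter_model.bin, which I believe is peft's default filename, and that the lora_B matrices start out at zero, so all-zero lora_B tensors would mean nothing was learned):

```python
# Inspect the saved LoRA weights to see whether they look trained at all.
import torch

state = torch.load("./test/adapter_model.bin", map_location="cpu")
for name, tensor in state.items():
    mean_abs = tensor.float().abs().mean().item()
    print(f"{name}: shape={tuple(tensor.shape)}, abs-mean={mean_abs:.6f}")
```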
Any ideas?