Use a transformers model with load_in_4bit instead. LoRA training with GPTQ-for-LLaMa is implemented but outdated, and AutoGPTQ integration is not implemented yet.
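For anyone unsure what that route looks like outside the UI, here is a minimal sketch of 4-bit (bitsandbytes) loading plus a LoRA setup done directly with transformers and peft. The model id and LoRA hyperparameters are placeholders, not a recommendation from this thread:

```python
# Sketch: load a 16-bit transformers checkpoint in 4-bit and attach a LoRA adapter.
# Model name and LoRA settings are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "lmsys/vicuna-33b-v1.3"  # any 16-bit transformers checkpoint; this is the example linked below

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

model = prepare_model_for_kbit_training(model)  # enables gradient checkpointing, input grads, etc.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # the same (q, v) projections the webui trainer targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```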
@oobabooga Is it possible to apply a Lora generated with transformers to a GGML model? I'm getting a similar error when trying to do so. I can apply it to GPTQ though. Thanks.
Use a transformers model with load_in_4bit instead. LoRA training with GPTQ-for-LLaMa is implemented but outdated, and AutoGPTQ integration is not implemented yet.
What would a transformers model be, a GGML one?
From what I've seen, at least on my PC, GGML models are much MUCH slower than their GPTQ counterparts (GGML: 0.5 t/s, GPTQ: 35 t/s), which is what made me go "all-in" on GPTQ models. Or is there a way to make GGML models as fast as GPTQ ones?
Cheers!
No, GGML is something else. Transformers models look like this:
https://huggingface.co/lmsys/vicuna-33b-v1.3/tree/main
They have files named pytorch_model-00001-of-00007.bin or similar, and they do not have 4bit, GGML, or GPTQ in their names.
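If it helps, one way to pull such a sharded checkpoint locally is huggingface_hub's snapshot_download; a small sketch, where the repo id is the example above and the local_dir just follows the webui's models/ folder naming:

```python
# Sketch: download a full 16-bit transformers checkpoint (sharded pytorch_model-*.bin files).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lmsys/vicuna-33b-v1.3",
    local_dir="models/lmsys_vicuna-33b-v1.3",  # illustrative target directory
)
```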
Is it possible to apply a Lora generated with transformers to a GGML model? I'm getting a similar error when trying to do so. I can apply it to GPTQ though. Thanks.
Be me, want to use 70b ggml loras. Convert the adapters to ggml format. They double in size. Write code to use them through the ui. The moment is finally here. The lora is loading.
"offloading of layers to GPU with lora is only supported for the F16 model type"
Use a transformers model with load_in_4bit instead. LoRA training with GPTQ-for-LLaMa is implemented but outdated, and AutoGPTQ integration is not implemented yet.
@oobabooga There's a bug that had me confused. If monkey_patch is on and you try to train using a Transformers model, you get this message on the console: ModuleNotFoundError: No module named 'monkeypatch'
Perhaps a check would prevent this? Thanks again
@hypersniper05 If you read here there's more to it than just monkeypatch:
You also need to install alpaca_lora_4bit in repositories and install a patched version of GPTQ-For-Llama from @sterlind
However, when I do all that, I still get an error, so either it's broken or there's something else I'm missing.
The monkey patch was created for GPTQ-for-LLaMa ages ago. AutoGPTQ training integration is not implemented yet (if someone wants to do it and submit a PR, that would be helpful); once implemented, it will not require the monkey patch.
For now, I recommend downloading a 16-bit transformers model and training it with --load-in-4bit instead of using a GPTQ model.
@hypersniper05 If you read here there's more to it than just monkeypatch:
You also need to install alpaca_lora_4bit in repositories and install a patched version of GPTQ-For-Llama from @sterlind
However, when I do all that, I still get an error, so either it's broken or there's something else I'm missing.
@tqman Thanks, I have it installed already from source; it's just that the new update broke it. I am training 16-bit models like @oobabooga stated, but we are limited in how deep we can train the model because it requires more VRAM than GPTQ models.
Use the repo directly: https://github.com/johnsmith0031/alpaca_lora_4bit/tree/winglian-setup_pip
Finetune.py is back and simple to set up. Unless you are using a Pascal GPU, it is much better and quicker.
Really, load-in-4bit uses a similar amount of VRAM but it is slower. There is also https://github.com/OpenAccess-AI-Collective/axolotl
@oobabooga Looks like AutoGPTQ is now part of Transformers https://huggingface.co/blog/gptq-integration
If they did good then the code can just work as is, same as for BnB
Okay, that's quite impressive: it just works. Just load a GPTQ model with --loader transformers --auto-devices, or in the UI by selecting the "Transformers" loader and checking the auto-devices checkbox.
I did that and managed to train a tiny LoRA without errors and without changing anything else. The important bit was to activate "auto-devices", otherwise the model doesn't load.
There you go, GPTQ LoRA training is now implemented lol. That saves me so much trouble.
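In script form, this is roughly the equivalent of what was just described; a sketch, assuming the optimum and auto-gptq packages are installed (the model id is the one used later in this thread):

```python
# Sketch: load a GPTQ checkpoint directly through transformers.
# device_map="auto" is the script-level counterpart of the "auto-devices" checkbox.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7b-Chat-GPTQ"  # example GPTQ repo from this thread

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
# The quantization settings are picked up from the quantization config shipped
# with the repo, so nothing GPTQ-specific needs to be passed just to load it.
```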
I tried just now and I get this message:
ImportError: Loading GPTQ quantized model requires optimum library : pip install optimum and auto-gptq library ‘pip install auto-gptq’
I am going to install optimum, but it does specify this in the blog: https://huggingface.co/blog/gptq-integration#native-support-of-gptq-models-in-%F0%9F%A4%97-transformers
@oobabooga There's a note about the exllama kernels when fine-tuning: "Note that only 4-bit models are supported for now. Furthermore, it is recommended to deactivate the exllama kernels if you are finetuning a quantized model with peft."
Maybe an option to disable it like in AutoGPTQ, but in the Transformers loader UI?
Strangely though, it works for some models but not others. I tried Karen_theEditor-13B-4bit-128g-GPTQ by @FartyPants and got this error:
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory models/FPHam_Karen_theEditor-13B-4bit-128g-GPTQ.
This was the command:
python server.py --model models/FPHam_Karen_theEditor-13B-4bit-128g-GPTQ/ --auto-devices --loader transformers --listen
For Llama-2-7b-Chat-GPTQ it works:
python server.py --model models/TheBloke_Llama-2-7b-Chat-GPTQ --auto-devices --loader transformers --listen
@hypersniper05 yes, optimum is required. I have added it to the requirements: https://github.com/oobabooga/text-generation-webui/commit/0576691538da4f93d50b9ee28c252eb91c4f0da3
Strangely though, it works for some models but not others. I tried Karen_theEditor-13B-4bit-128g-GPTQ by @FartyPants and got this error:
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory models/FPHam_Karen_theEditor-13B-4bit-128g-GPTQ.
This was the command:
python server.py --model models/FPHam_Karen_theEditor-13B-4bit-128g-GPTQ/ --auto-devices --loader transformers --listen
For Llama-2-7b-Chat-GPTQ it works:
python server.py --model models/TheBloke_Llama-2-7b-Chat-GPTQ --auto-devices --loader transformers --listen
@hypersniper05 yes, optimum is required. I have added it to the requirements: 0576691
@oobabooga That's because the quantization_config.json has to be present in the model folder, see the link: https://huggingface.co/docs/transformers/main/en/main_classes/quantization#load-a-quantized-model-from-the-hub
Here are the files for the Llama 2 7B link you posted above; it has it, but the Karen one does not
Karen does have a quantize_config.json though: https://huggingface.co/FPHam/Karen_theEditor-13B-4bit-128g-GPTQ/blob/main/quantize_config.json
The one in TheBloke: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ/blob/main/quantize_config.json
There were two missing keys, namely:
"model_name_or_path": null,
"model_file_base_name": "model"
I have tried manually adding them and it didn't work.
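For reference, this is roughly what that manual patch amounts to; a sketch only, using the model folder from the error above, and, as noted, it did not resolve the load error:

```python
# Sketch: add the two keys that TheBloke's quantize_config.json has and the
# Karen repo's lacks. As stated above, this alone did not fix the OSError.
import json
from pathlib import Path

path = Path("models/FPHam_Karen_theEditor-13B-4bit-128g-GPTQ/quantize_config.json")
config = json.loads(path.read_text())
config["model_name_or_path"] = None           # serialized as null
config["model_file_base_name"] = "model"
path.write_text(json.dumps(config, indent=2))
```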
@oobabooga I'll look into it. I added a pull request that adds disable_exllama to the GPTQConfig arg #3772
I am going to help you on the training side. Are you open to having a Tools tab? I was thinking of adding a feature to quantize 16-bit models using the transformers library, and maybe also a feature to merge LoRA weights and push to Hugging Face.
Edit: By the way, I am training now using a GPTQ model and it works great: TheBloke/WizardCoder-Python-34B-V1.0-GPTQ. Let's see what happens when I add the LoRA
Doesn't seem to be working for me. I get the following error after about 15 steps of training:
File "T:\code\python\oobabooga\installer_files\env\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 406, in update
found_inf_combined = found_infs[0]
IndexError: list index out of range
Doesn't seem to be working for me. I get the following error after about 15 steps of training:
File "T:\code\python\oobabooga\installer_files\env\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 406, in update found_inf_combined = found_infs[0] IndexError: list index out of range
Same here, it happens as soon as the console prints the loss. I'll take a look tonight and see if I can fix it
Error
2023-08-31 16:39:05 INFO:Loading TheBloke_WizardCoder-Python-34B-V1.0-GPTQ...
2023-08-31 16:39:52 WARNING:models\TheBloke_WizardCoder-Python-34B-V1.0-GPTQ\tokenizer_config.json is different from the original LlamaTokenizer file. It is either customized or outdated.
2023-08-31 16:39:52 WARNING:models\TheBloke_WizardCoder-Python-34B-V1.0-GPTQ\special_tokens_map.json is different from the original LlamaTokenizer file. It is either customized or outdated.
2023-08-31 16:39:52 INFO:Loaded the model in 47.04 seconds.
Model reloaded OK, continue with training.
2023-08-31 16:39:52 INFO:Getting model ready...
2023-08-31 16:39:52 INFO:Preparing for training...
2023-08-31 16:39:52 INFO:Creating LoRA model...
2023-08-31 16:39:52 INFO:Starting training...
Training 'llama' model using (q, v) projections
Trainable params: 39,321,600 (6.9667 %), All params: 564,420,608 (Model: 525,099,008)
2023-08-31 16:39:52 INFO:Log file 'train_dataset_sample.json' created in the 'logs' directory.
Exception in thread Thread-19 (threaded_run):
Traceback (most recent call last):
File "D:\AI\oobabooga_windows\installer_files\env\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "D:\AI\oobabooga_windows\installer_files\env\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "D:\AI\oobabooga_windows\text-generation-webui\modules\training.py", line 655, in threaded_run
trainer.train()
File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\transformers\trainer.py", line 1555, in train
return inner_training_loop(
File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\transformers\trainer.py", line 1916, in _inner_training_loop
self.optimizer.step()
File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\accelerate\optimizer.py", line 132, in step
self.scaler.step(self.optimizer, closure)
File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 372, in step
assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.
2023-08-31 16:48:17 INFO:Training complete, saving...
2023-08-31 16:48:17 INFO:Training complete!
Well.. windows :(
@hypersniper05 Any luck on resolving that? I'm also consistently getting the "AssertionError: No inf checks were recorded for this optimizer" issue, even when using https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ as linked above by @oobabooga
It continued to work again when I upgraded the version!
pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu118
you need torch-2.2.0.dev20230916+cu118-cp310-cp310-win_amd64.whl
Use the latest version of
@tqman @Fusseldieb @hypersniper05 @Ph0rk0z @Vinno97
Has anyone successfully trained TheBloke/falcon-40b-instruct-GPTQ?
Loading it with AutoGPTQ does not give the ERROR:Failed to load the model. error, but any attempt to train ends up with an error similar to the original issue: ValueError: Target modules ['q_proj', 'v_proj'] not found in the base model. Please check the target modules and try again.
Loading it with the Transformers loader also avoids the ERROR:Failed to load the model. error. However, inference no longer works as it did when loaded via AutoGPTQ. I get the following error:
File "/home/mrroll/.cache/huggingface/modules/transformers_modules/TheBloke_falcon-40B-instruct-GPTQ/modelling_RW.py", line 290, in forward
attn_output = F.scaled_dot_product_attention(
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.
In addition, attempting to train yields the same error as if loaded with AutoGPTQ:
ValueError: Target modules [‘q_proj’, ‘v_proj’] not found in the base model. Please check the target modules and try again.
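One way to see why those default target modules fail is to list the linear-layer names the loaded model actually exposes; a sketch, assuming optimum/auto-gptq are installed and trust_remote_code is acceptable for this repo:

```python
# Sketch: list candidate LoRA target_modules names for a falcon-style GPTQ model.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/falcon-40b-instruct-GPTQ",
    device_map="auto",
    trust_remote_code=True,  # the RW/falcon modelling code lives in the repo
)

candidates = sorted({
    name.split(".")[-1]
    for name, module in model.named_modules()
    if "Linear" in type(module).__name__  # catches quantized linear layers too
})
print(candidates)
# Falcon-style (RW) models typically expose names like "query_key_value" and "dense"
# rather than the llama-style "q_proj"/"v_proj" that the trainer targets by default.
```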
I have added the disable_exllama option for Transformers here: https://github.com/oobabooga/text-generation-webui/commit/36c38d756137ee335b37430c4a14eada1e7ece2e
LoRA training with GPTQ models should work now. Make sure to load the model using the Transformers loader with both auto-devices and disable_exllama checked.
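For anyone doing this from a script instead of the UI, the same combination can be expressed roughly as follows; a sketch, assuming transformers/optimum/auto-gptq versions from around the time of this thread, where GPTQConfig still accepts disable_exllama:

```python
# Sketch: script equivalent of "Transformers loader + auto-devices + disable_exllama",
# followed by a LoRA setup for training. Newer transformers versions renamed
# disable_exllama to use_exllama.
from transformers import AutoModelForCausalLM, GPTQConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "TheBloke/Llama-2-7b-Chat-GPTQ"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                                              # "auto-devices"
    quantization_config=GPTQConfig(bits=4, disable_exllama=True),   # "disable_exllama"
)

model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
))
model.print_trainable_parameters()
```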
I have added the disable_exllama option for Transformers here: 36c38d7
LoRA training with GPTQ models should work now. Make sure to load the model using the Transformers loader with both auto-devices and disable_exllama checked.
Thanks for this! When trying to do inference on the most up-to-date version of the repository, I am still getting this error:
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.
I'm using TheBloke/falcon-40b-instruct-GPTQ, specifically.
Any chance you or anyone else has ideas on how to get this going?
I have added the disable_exllama option for Transformers here: 36c38d7
LoRA training with GPTQ models should work now. Make sure to load the model using the Transformers loader with both auto-devices and disable_exllama checked.
I got LoRA training to start for the first time by using what onexixi linked. After updating and doing ^, the LoRA training now returns the following shortened error:
Traceback (most recent call last):
File "E:\one-click-installers-main\text-generation-webui\modules\training.py", line 510, in do_train
set_peft_model_state_dict(lora_model, state_dict_peft)
File "E:\one-click-installers-main\installer_files\env\lib\site-packages\peft\utils\save_and_load.py", line 135, in set_peft_model_state_dict
load_result = model.load_state_dict(peft_model_state_dict, strict=False)
File "E:\one-click-installers-main\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
…
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 4096]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
size mismatch for base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([4096, 8]) from checkpoint, the shape in current model is torch.Size([4096, 64]).
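As a side note on those shapes: the 8 vs 64 in the mismatch comes directly from the LoRA rank used when the checkpoint was saved versus the rank configured now. A tiny illustration, using the 4096 hidden size from the error:

```python
# Illustration only: LoRA adapter tensor shapes depend on the rank r, so a
# checkpoint saved with r=8 cannot be loaded into a model configured with r=64.
import torch

hidden_size = 4096  # as in the error message
for r in (8, 64):
    lora_A = torch.zeros(r, hidden_size)   # q_proj.lora_A weight: (r, hidden)
    lora_B = torch.zeros(hidden_size, r)   # v_proj.lora_B weight: (hidden, r)
    print(f"r={r}: lora_A {tuple(lora_A.shape)}, lora_B {tuple(lora_B.shape)}")
```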
It continued to work again when I upgraded the version!
pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu118
you need torch-2.2.0.dev20230916+cu118-cp310-cp310-win_amd64.whl Use the latest version of
This might have worked for me. I was getting the lora to begin training, but I don't know if it ever fully completed. I ran into an error that "completed" the lora every time it tried to run an eval. None of the loras seem to do anything though. I tried one other LoRA-specific UI from github, and that never worked.
Should I make a new issue for this? Anyone else have these issues/know any fixes?
Edit: After doing onexixi's suggestion again, the LoRA training now begins again after updating and with both exllama kernel disabled and auto devices turned on. I don't know if it's going to actually complete or not. I also noticed the 4096 is the value I had set for the max tokens or context length with a SuperHot model since the full 8k didn't fit on my gpu. I did not have the context changed for the model I was trying to train (the model ooba suggested above) and I don't even see the options when loading with transformers so I don't know how or why that is there. 64 was the value I had set to the LoRA Rank for that attempt. These numbers could be unrelated, but it seems like more than a coincidence. I still have issues and idk if any of the issues I mentioned have been solved as of this edit 29-9.
@VegaStarlake your lora rank is too high
@hypersniper05 Could you explain or provide a source? I haven't seen any documentation about limits on the lora rank other than the mention that it can increase vram usage but my error doesn't seem to be a memory issue (I could be wrong though). But more importantly, when I force re-installed cuda the training started with all the training settings exactly the same as they were in the first error. i.e. doing what onexixi linked "fixed" that error with no changes to LoRA Rank.
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
I am training with TheBloke's Llama 2 7B, using the following model load configuration on the training tab:
My machine has these specs: M.2 Samsung EVO 2TB SSD, i5-12400, Nvidia 4060 Ti 16GB, 16GB DDR4 RAM.
All the other model loaders I have tried have failed with the same error as in the main thread. I have only succeeded with GPTQ; with GGUF I have not been able to. Does anyone know if it is even possible?
GGUF has no support for training in this UI. You can do it through native llama.cpp
Describe the bug
Whenever I try to train a LoRA model, I get the following error:
I've tried ExLlama_HF and GPTQ-for-Llama. Neither works, and both show the exact same error.
I've tried with this model: "https://huggingface.co/TheBloke/Llama-2-7B-32K-Instruct-GPTQ"
Is there an existing issue for this?
Reproduction
Screenshot
No response
Logs
System Info