oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

error training #6093

Open pro9code opened 5 months ago

pro9code commented 5 months ago

Describe the bug

RuntimeError: expected mat1 and mat2 to have the same dtype, but got: struct c10::Half != float

Is there an existing issue for this?

Reproduction

Load TinyDolphin in 8-bit, then try to train a LoRA.

Screenshot

No response

Logs

Traceback (most recent call last):
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\utils\hub.py", line 398, in cached_file
    resolved_file = hf_hub_download(
                    ^^^^^^^^^^^^^^^^
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\huggingface_hub\utils\_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\huggingface_hub\utils\_validators.py", line 160, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'models\None'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\AI\text-generation-webui-main\modules\training.py", line 508, in do_train
    reload_model()
  File "D:\AI\text-generation-webui-main\modules\models.py", line 439, in reload_model
    shared.model, shared.tokenizer = load_model(shared.model_name)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\text-generation-webui-main\modules\models.py", line 94, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\text-generation-webui-main\modules\models.py", line 149, in huggingface_loader
    config = AutoConfig.from_pretrained(path_to_model, trust_remote_code=shared.args.trust_remote_code)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\models\auto\configuration_auto.py", line 928, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\configuration_utils.py", line 631, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\configuration_utils.py", line 686, in _get_config_dict
    resolved_config_file = cached_file(
                           ^^^^^^^^^^^^
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\utils\hub.py", line 462, in cached_file
    raise EnvironmentError(
OSError: Incorrect path_or_model_id: 'models\None'. Please provide either the path to a local folder or the repo_id of a model on the Hub.

18:39:01-714429 INFO     Loading "cognitivecomputations_TinyDolphin-2.8-1.1b"
18:39:01-718433 INFO     TRANSFORMERS_PARAMS=
{   'low_cpu_mem_usage': True,
    'torch_dtype': torch.float16,
    'device_map': 'auto',
    'quantization_config': BitsAndBytesConfig {
  "_load_in_4bit": false,
  "_load_in_8bit": true,
  "bnb_4bit_compute_dtype": "float32",
  "bnb_4bit_quant_storage": "uint8",
  "bnb_4bit_quant_type": "fp4",
  "bnb_4bit_use_double_quant": false,
  "llm_int8_enable_fp32_cpu_offload": false,
  "llm_int8_has_fp16_weight": false,
  "llm_int8_skip_modules": null,
  "llm_int8_threshold": 6.0,
  "load_in_4bit": false,
  "load_in_8bit": true,
  "quant_method": "bitsandbytes"
}
}

18:39:04-583136 INFO     Loaded "cognitivecomputations_TinyDolphin-2.8-1.1b" in 2.87 seconds.
18:39:04-585136 INFO     LOADER: "Transformers"
18:39:04-586135 INFO     TRUNCATION LENGTH: 4096
18:39:04-586642 INFO     INSTRUCTION TEMPLATE: "Alpaca"
18:39:05-418538 INFO     Loading raw text file dataset
18:40:16-754125 INFO     Getting model ready
18:40:16-765123 INFO     Preparing for training
18:40:16-768121 INFO     Creating LoRA model
18:40:17-236766 INFO     Starting training
Training 'llama' model using (q, v, k) projections
Trainable params: 24,510,464 (2.1795 %), All params: 1,124,567,040 (Model: 1,100,056,576)
Monitoring loss (Auto-Stop at: 1.8)
18:40:17-260766 INFO     Log file 'train_dataset_sample.json' created in the 'logs' directory.
Exception in thread Thread-17 (threaded_run):
Traceback (most recent call last):
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "D:\AI\text-generation-webui-main\modules\training.py", line 705, in threaded_run
    trainer.train()
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\trainer.py", line 1859, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\transformers\trainer.py", line 3147, in training_step
    self.accelerator.backward(loss)
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\accelerate\accelerator.py", line 1964, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\autograd\__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\autograd\function.py", line 289, in apply
    return user_fn(self, *args)
           ^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\utils\checkpoint.py", line 319, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\autograd\__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\autograd\function.py", line 289, in apply
    return user_fn(self, *args)
           ^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\bitsandbytes\autograd\_functions.py", line 474, in backward
    grad_A = torch.matmul(grad_output, CB).view(ctx.grad_shape).to(ctx.dtype_A)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: struct c10::Half != float
18:40:22-747869 INFO     Training complete, saving
18:40:22-882405 INFO     Training complete!

System Info

Windows 11, Quadro K2200

hypersniper05 commented 5 months ago

You have a space in the model folder name. You can't do that; the error message says it all.

Edit: never mind, I see that error was at startup. Try using 4-bit quantization, and if that doesn't work, try bfloat16.
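
For anyone who wants to try that suggestion outside the UI, here is a rough sketch of the two quantization setups as keyword arguments for transformers' BitsAndBytesConfig. The helper name `build_quant_kwargs` and its mode labels are made up for illustration, the "nf4" quant type is my own choice (the log above shows "fp4"), and the model path in the comment is only a guess based on the log output. bfloat16 generally needs hardware support, so it may not be usable on an older GPU.

```python
from typing import Any, Dict


def build_quant_kwargs(mode: str) -> Dict[str, Any]:
    """Return keyword arguments for transformers.BitsAndBytesConfig.

    `mode` is a hypothetical label used only in this sketch:
      "8bit"      -> the setup shown in the logs (the one that failed)
      "4bit"      -> 4-bit quantization with float16 compute
      "4bit-bf16" -> 4-bit quantization with bfloat16 compute
    """
    if mode == "8bit":
        return {"load_in_8bit": True}
    if mode in ("4bit", "4bit-bf16"):
        compute = "bfloat16" if mode == "4bit-bf16" else "float16"
        return {
            "load_in_4bit": True,
            # dtype used for computation in the 4-bit layers
            "bnb_4bit_compute_dtype": compute,
            # assumption: "nf4" picked for this sketch, not taken from the logs
            "bnb_4bit_quant_type": "nf4",
        }
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    kwargs = build_quant_kwargs("4bit-bf16")
    print(kwargs)
    # With torch, transformers and bitsandbytes installed, roughly:
    #   from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    #   model = AutoModelForCausalLM.from_pretrained(
    #       "models/cognitivecomputations_TinyDolphin-2.8-1.1b",  # assumed path
    #       quantization_config=BitsAndBytesConfig(**kwargs),
    #       device_map="auto",
    #   )
```

Start with "4bit", and switch to "4bit-bf16" only if the dtype mismatch still shows up during the backward pass.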