oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Multi-GPU cannot load transformers on a single card #6003

Open Urammar opened 5 months ago

Urammar commented 5 months ago

Describe the bug

This is a reproduction of #4193. It appears this was never adequately fixed, or has since regressed.

Is there an existing issue for this?

Reproduction

As above

Screenshot

No response

Logs

00:52:58-495385 INFO     Loading "TheBloke_TinyLlama-1.1B-1T-OpenOrca-GPTQ"
00:52:58-501386 INFO     Loading with disable_exllama=True and disable_exllamav2=False.
00:52:58-503387 INFO     TRANSFORMERS_PARAMS=
{   'low_cpu_mem_usage': True,
    'torch_dtype': torch.float16,
    'device_map': 'auto',
    'max_memory': {0: '20000MiB', 1: '0MiB', 'cpu': '15000MiB'},
    'quantization_config': GPTQConfig(quant_method=<QuantizationMethod.GPTQ: 'gptq'>)}

00:52:58-694430 ERROR    Failed to load the model.
Traceback (most recent call last):
  File "C:\MyShit\AI\oobabooga_windows\text-generation-webui2\modules\ui_model_menu.py", line 247, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\MyShit\AI\oobabooga_windows\text-generation-webui2\modules\models.py", line 94, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\MyShit\AI\oobabooga_windows\text-generation-webui2\modules\models.py", line 256, in huggingface_loader
    model = LoaderClass.from_pretrained(path_to_model, **params)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\MyShit\AI\oobabooga_windows\text-generation-webui2\installer_files\env\Lib\site-packages\transformers\models\auto\auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\MyShit\AI\oobabooga_windows\text-generation-webui2\installer_files\env\Lib\site-packages\transformers\modeling_utils.py", line 3618, in from_pretrained
    max_memory = get_max_memory(max_memory)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\MyShit\AI\oobabooga_windows\text-generation-webui2\installer_files\env\Lib\site-packages\accelerate\utils\modeling.py", line 791, in get_max_memory
    max_memory[key] = convert_file_size_to_int(max_memory[key])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\MyShit\AI\oobabooga_windows\text-generation-webui2\installer_files\env\Lib\site-packages\accelerate\utils\modeling.py", line 129, in convert_file_size_to_int
    raise ValueError(err_msg)
ValueError: `size` 0MiB is not in a valid format. Use an integer for bytes, or a string with an unit (like '5.0GB').

System Info

1080 Ti, 3090 Ti
Urammar commented 5 months ago

I found and fixed the error in \installer_files\env\Lib\site-packages\accelerate\utils\modeling.py

Line 128 currently reads:

if mem_size <= 0:
    raise ValueError(err_msg)
return mem_size
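
For context, the failing conversion can be reproduced on its own (a minimal sketch, assuming the same accelerate build as in the traceback above): "0MiB" parses to 0 bytes, and the <= 0 comparison then rejects it.

# Minimal reproduction, assuming the accelerate version shown in the traceback above
from accelerate.utils.modeling import convert_file_size_to_int

# "0MiB" parses to 0 bytes; the mem_size <= 0 check on line 128 then raises:
# ValueError: `size` 0MiB is not in a valid format. Use an integer for bytes, or a string with an unit (like '5.0GB').
convert_file_size_to_int("0MiB")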

For multi-GPU setups, this should be:

if mem_size < 0:
    raise ValueError(err_msg)
return mem_size

This fix is not entirely correct, as it also permits setting 0 for every field, which would attempt to load a model with no memory allocated at all and give no error message. As a temporary fix, however, it lets multi-GPU setups load models with the transformers loader at all, so I'll take the janky win.

But this behavior needs to be updated to check for multiple GPUs and only complain if no VRAM is set across all of the detected cards.
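
One possible shape for that check, as a hypothetical sketch rather than actual accelerate or webui code (it assumes the <= 0 comparison has already been relaxed to < 0 as above, so that "0MiB" converts to 0 instead of raising): validate the whole max_memory map and only raise when every device comes out to zero.

from accelerate.utils.modeling import convert_file_size_to_int

def validate_max_memory(max_memory):
    # Convert each entry to bytes; individual devices may legitimately be set to 0 to exclude them.
    sizes = {device: convert_file_size_to_int(size) for device, size in max_memory.items()}
    # Only complain when no device (GPU or CPU) has any memory allocated at all.
    if all(size == 0 for size in sizes.values()):
        raise ValueError("max_memory assigns 0 bytes to every device; the model cannot be loaded.")
    return sizes

# Example with the settings from the log above: GPU 1 is excluded,
# but GPU 0 and the CPU still have memory, so loading can proceed.
validate_max_memory({0: "20000MiB", 1: "0MiB", "cpu": "15000MiB"})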