lxe / simple-llm-finetuner

Simple UI for LLM Model Finetuning
MIT License

Question: Native Windows support #23

Closed: Paillat-dev closed this issue 7 months ago

Paillat-dev commented 1 year ago

Followed the instructions in the README, but I'm getting "AssertionError: Torch not compiled with CUDA enabled". Running on an NVIDIA A4500, native Windows (not WSL).

Traceback

(llama-finetuner) D:\simple-llama-finetuner>python main.py

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\bitsandbytes\cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Loading base model...
Traceback (most recent call last):
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\gradio\routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\gradio\blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\gradio\blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\gradio\helpers.py", line 587, in tracked_fn
    response = fn(*args)
  File "D:\simple-llama-finetuner\main.py", line 128, in tokenize_and_train
    if (model is None): load_base_model()
  File "D:\simple-llama-finetuner\main.py", line 18, in load_base_model
    model = transformers.LlamaForCausalLM.from_pretrained(
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\transformers\modeling_utils.py", line 2643, in from_pretrained
    ) = cls._load_pretrained_model(
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\transformers\modeling_utils.py", line 2966, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\transformers\modeling_utils.py", line 673, in _load_state_dict_into_meta_model
    set_module_8bit_tensor_to_device(model, param_name, param_device, value=param)
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\transformers\utils\bitsandbytes.py", line 70, in set_module_8bit_tensor_to_device
    new_value = bnb.nn.Int8Params(new_value, requires_grad=False, has_fp16_weights=has_fp16_weights).to(device)
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\bitsandbytes\nn\modules.py", line 196, in to
    return self.cuda(device)
  File "C:\Users\jerem\.conda\envs\llama-finetuner\lib\site-packages\bitsandbytes\nn\modules.py", line 159, in cuda
    B = self.data.contiguous().half().cuda(device)
  File "C:\Users\jerem\AppData\Roaming\Python\Python310\site-packages\torch\cuda\__init__.py", line 221, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
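
The root cause in the trace is that the installed PyTorch wheel is a CPU-only build, so bitsandbytes cannot move the 8-bit weights to a GPU. A minimal way to confirm this, as a sketch assuming a standard pip/conda setup (the CUDA index URL below is an example and must match your installed driver):

import torch

# A CPU-only wheel reports a version like "2.0.0+cpu" and no CUDA support,
# which is exactly what triggers "Torch not compiled with CUDA enabled".
print(torch.__version__)
print(torch.cuda.is_available())

# A CUDA-enabled build can then be installed from the PyTorch wheel index, e.g.:
#   pip install torch --index-url https://download.pytorch.org/whl/cu117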
Paillat-dev commented 1 year ago

OK, just noticed it needs WSL or Linux. Shame. Is native Windows going to be supported?

lxe commented 1 year ago

You can run this on Windows natively. You'll need to hack bitsandbytes to load a precompiled DLL though.

https://github.com/james-things/bitsandbytes-prebuilt-all_arch/tree/main
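
For anyone following along, the workaround generally looks like the sketch below; the DLL name and the patched file are illustrative assumptions based on the linked repo and the bitsandbytes versions from around that time, not exact instructions.

# Sketch: copy a prebuilt Windows DLL (e.g. libbitsandbytes_cuda116.dll from the
# repo above -- pick the build matching your CUDA version) into the installed
# bitsandbytes package directory.
import pathlib
import shutil

import bitsandbytes

bnb_dir = pathlib.Path(bitsandbytes.__file__).parent
shutil.copy("libbitsandbytes_cuda116.dll", bnb_dir)  # DLL downloaded beforehand

# Then patch bitsandbytes' CUDA setup (cuda_setup/main.py in 0.37.x-era releases)
# so that on Windows it loads this DLL instead of the libbitsandbytes_*.so names
# it searches for by default.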

Paillat-dev commented 7 months ago

No need anymore.