kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Apache License 2.0

Native Windows support #4

Closed · DotNetDevlll closed this 3 months ago

DotNetDevlll commented 4 months ago

First of all, thanks for your contribution to the AI community; it's a huge leap forward that lets consumer-grade users with limited VRAM run MoE models. Would it be possible to add native Windows support to KTransformers? I'd love to see the project become accessible to Windows users as well.

Thanks!

azywait commented 4 months ago

> Would it be possible to add native Windows support to KTransformers? I'd love to see the project become accessible to Windows users as well.
>
> Thanks!

Thanks for your interest. Native Windows support is in our plans, but it may take some time. 😊

whisper-bye commented 3 months ago

Has anyone tried to run it under Windows?

Atream commented 3 months ago

You can try to install from source by running install.bat. Pre-built wheels will be released soon.
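
For reference, a from-source install on Windows looks roughly like this (assuming a standard git checkout; run git submodule update --init --recursive first if the checkout uses submodules):

git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
.\install.bat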

whisper-bye commented 3 months ago

.\install.bat

Installing ktransformers
Processing c:\users\pc\ktransformers\ktransformers\ktransformers
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "C:\Users\pc\miniconda3\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
          main()
        File "C:\Users\pc\miniconda3\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\pc\miniconda3\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 149, in prepare_metadata_for_build_wheel
          return hook(metadata_directory, config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\pc\miniconda3\Lib\site-packages\setuptools\build_meta.py", line 368, in prepare_metadata_for_build_wheel
          self.run_setup()
        File "C:\Users\pc\miniconda3\Lib\site-packages\setuptools\build_meta.py", line 313, in run_setup
          exec(code, locals())
        File "<string>", line 294, in <module>
        File "<string>", line 132, in get_package_version
        File "<string>", line 54, in get_cuda_bare_metal_version
      TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Installation completed successfully
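
(Note: the trailing "Installation completed successfully" is printed by install.bat even though pip failed. The traceback points at get_cuda_bare_metal_version concatenating CUDA_HOME with a string; below is a minimal sketch of that failure mode, with the function body assumed from the traceback rather than taken from the actual setup.py.)

import subprocess
from torch.utils.cpp_extension import CUDA_HOME  # None when torch cannot locate a CUDA toolkit

def get_cuda_bare_metal_version(cuda_dir):
    # With cuda_dir = None, the concatenation below raises
    # TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
    raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
    release = raw_output.split("release ")[1].split(",")[0]  # e.g. "12.5" (parsing assumed)
    return raw_output, release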
UnicornChan commented 3 months ago

There might be a few possibilities here.

  1. CUDA is not installed on your machine.
  2. The CUDA environment variables are not active.
  3. The installed PyTorch is not for GPU, but for CPU.

There is a simple env check code

import torch
import subprocess
from torch.utils.cpp_extension import CUDA_HOME

# The torch build string should end in +cuXXX; a +cpu suffix means the CPU-only wheel is installed.
print("torch version is: " + str(torch.__version__))
# CUDA_HOME is None when torch cannot locate a CUDA toolkit (missing install or env vars).
print("CUDA HOME is: " + str(CUDA_HOME))
# Invoke nvcc -V from the detected toolkit to confirm the compiler is actually present.
raw_output = subprocess.check_output([str(CUDA_HOME) + "/bin/nvcc", "-V"], universal_newlines=True)
print("nvcc version is : " + raw_output)

The output of my computer is:

torch version is: 2.4.0+cu124
CUDA HOME is: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA\v12.5
nvcc version is : nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:36:51_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0

Could you show me what the output is on your computer?

whisper-bye commented 3 months ago

torch version is: 2.4.0+cpu
CUDA HOME is: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6
nvcc version is : nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Fri_Jun_14_16:44:19_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.20
Build cuda_12.6.r12.6/compiler.34431801_0

Thanks for your quick reply. After fixing up some torch things (the install had pulled the CPU-only +cpu build) ...
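
For reference, the usual fix for a +cpu build is reinstalling a CUDA wheel of PyTorch; the exact index URL depends on your CUDA version, but it looks something like:

pip uninstall -y torch torchvision torchaudio
pip install torch --index-url https://download.pytorch.org/whl/cu124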

python -m ktransformers.local_chat --model_name Qwen/Qwen2-57B-A14B-Instruct --gguf_path ./Qwen2-57B-GGUF
ERROR: The function received no value for the required argument: model_path
Usage: local_chat.py MODEL_PATH <flags>
  optional flags:        --optimize_rule_path | --gguf_path |
                         --max_new_tokens | --cpu_infer

For detailed information on this command, run:
  local_chat.py --help
UnicornChan commented 3 months ago

Perhaps you can pass --model_path instead of --model_name:

python -m ktransformers.local_chat --model_path Qwen/Qwen2-57B-A14B-Instruct --gguf_path ./Qwen2-57B-GGUF
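
The other flags from the usage message can be appended the same way; for example (the values here are illustrative, not defaults):

python -m ktransformers.local_chat --model_path Qwen/Qwen2-57B-A14B-Instruct --gguf_path ./Qwen2-57B-GGUF --max_new_tokens 512 --cpu_infer 8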
whisper-bye commented 3 months ago

Great, it works! i3900k + MSI 4090 + 96 GB RAM; GPU memory ~6.3/24 GB, RAM maybe ~35 GB.

Chat: Which is larger, 9.11 or 9.9?
Between the numbers 9.11 and 9.9, 9.11 is smaller than 9.9. This can be seen more clearly by converting both numbers to fractions: 9.11 is 911/1000 and 9.9 is 99/10. Comparing the two fractions, 911/1000 is less than 99/10, so 9.11 is less than 9.9.
prompt eval count:    31 token(s)
prompt eval duration: 0.757000207901001s
prompt eval rate:     40.95111160663528 tokens/s
eval count:           90 token(s)
eval duration:        6.651063442230225s
eval rate:            13.531670654132467 tokens/s
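
As a quick sanity check, the reported rates are simply token count divided by duration (a minimal Python check using the numbers above):

# rate = token count / duration
print(31 / 0.757000207901001)  # ≈ 40.95 tokens/s (prompt eval rate)
print(90 / 6.651063442230225)  # ≈ 13.53 tokens/s (eval rate)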