kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Apache License 2.0

Installation requirements #89

Open arthurv opened 5 days ago

arthurv commented 5 days ago

Hi,

I tried to install ktransformers on a clean install of Linux Mint 22 (based on Ubuntu 24.04), and there are a few things that I had to add:

pip install numpy
pip install cpufeature
pip install flash_attn
conda install -c conda-forge libstdcxx-ng

Please update the pip dependencies.

Are there any plans to increase the number of quants supported for Deepseek-Coder-V2-Instruct-0724?

Azure-Tang commented 4 days ago

Okay, we will update these packages in our next release.

Regarding quantizations, we now support multiple quantization methods (Qx_k and IQ4_XS). What else would you like?

arthurv commented 2 days ago

I have a system with 192GB DRAM and 48GB VRAM (2x 3090). Would it be able to handle 128k context with these specs? Would it be able to handle Q5_K_M or Q6_K_M?

Also, I can only set max_new_tokens in local_chat, not in the ktransformers server, and I can't set the total context size anywhere.
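The sizing question above can be sanity-checked with a rough back-of-envelope estimate. The figures below are assumptions, not official numbers: ~236B total parameters for DeepSeek-Coder-V2, and typical llama.cpp average bits-per-weight of roughly 5.5 for Q5_K_M and roughly 6.56 for Q6_K; real GGUF files vary, and the 128k KV cache plus activations come on top of the weights.

```python
# Rough estimate of quantized weight size vs. a 192GB DRAM + 48GB VRAM budget.
# All constants are assumptions for illustration, not official figures.
PARAMS = 236e9  # assumed total parameter count (~236B)
BPW = {"Q5_K_M": 5.5, "Q6_K": 6.56}  # approx. average bits per weight

GIB = 1024**3

def weights_gib(bits_per_weight: float) -> float:
    """Approximate in-memory size of the quantized weights in GiB."""
    return PARAMS * bits_per_weight / 8 / GIB

budget_gib = (192 + 48) * 1e9 / GIB  # DRAM + VRAM, vendor GB -> GiB

for name, bpw in BPW.items():
    size = weights_gib(bpw)
    print(f"{name}: ~{size:.0f} GiB of weights "
          f"({'fits' if size < budget_gib else 'exceeds'} ~{budget_gib:.0f} GiB budget)")
```

By this estimate both quants leave some headroom for weights alone, but a 128k context adds a substantial KV cache on top, so the practical limit depends on how ktransformers places the cache and experts across DRAM and VRAM.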