kvcache-ai / ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Apache License 2.0 · 741 stars · 39 forks
Issues
#60 · fix(docs): fix broken link · sammcj · closed · 2 months ago · 0 comments
#58 · [fix] Fix readme datas · Azure-Tang · closed · 2 months ago · 0 comments
#57 · [feature] release 0.1.3 · UnicornChan · closed · 2 months ago · 0 comments
#56 · Update README.md · hyx1999 · closed · 2 months ago · 0 comments
#54 · Add a instruction for configuring CUDA_HOME and CUDA_PATH to the install section of README.md. · hyx1999 · closed · 2 months ago · 2 comments
#53 · Support for Mistral-Large-Instruct-2407-GGUF ? · LIUKAI0815 · closed · 2 months ago · 2 comments
#52 · Fix: None for load config · UnicornChan · closed · 3 months ago · 0 comments
#51 · [fix] f16 dequantize device ignored · molamooo · closed · 3 months ago · 0 comments
#50 · How to properly disable offloading MoE layers to CPU? · molamooo · closed · 3 months ago · 5 comments
#49 · More Efficient Layer Distribution for DeepSeek Coder v2 on Multiple GPUs and CPUs · BGFGB · open · 3 months ago · 4 comments
#48 · [fix] Fix bugs about static cache and server param; · Azure-Tang · closed · 3 months ago · 0 comments
#47 · Can I run llama3.1 70b with rtx4090+64g ddr5 ram? · codeMonkey-shin · open · 3 months ago · 1 comment
#46 · [ENHANCEMENT] improve GPU utilization for multi-GPU · ELigoP · closed · 3 months ago · 1 comment
#45 · Cannot run DeepSeek V2 Chat in server mode on 2 GPUs · ELigoP · closed · 3 months ago · 1 comment
#44 · CUDA error: No kernel image is available for execution on the device · Forsworns · closed · 3 months ago · 2 comments
#43 · Update install.sh · RealLittleXian · closed · 3 months ago · 4 comments
#42 · Mixtral-8x7B-v0.1 GGUF file error · RealLittleXian · closed · 3 months ago · 1 comment
#41 · [Update] Update README · Azure-Tang · closed · 3 months ago · 0 comments
#40 · [fix] fix broken link · Azure-Tang · closed · 3 months ago · 0 comments
#39 · [fix] fix broken link · Azure-Tang · closed · 3 months ago · 0 comments
#38 · [update] README · Azure-Tang · closed · 3 months ago · 0 comments
#37 · Ubuntu 24.04 GLIBCXX version fail · ELigoP · closed · 3 months ago · 3 comments
#36 · Release v0.1.2 · UnicornChan · closed · 3 months ago · 0 comments
#35 · [update] Update readme; Add tutorial · Azure-Tang · closed · 3 months ago · 0 comments
#34 · [fix] format classes and files name · Azure-Tang · closed · 3 months ago · 0 comments
#33 · Unable to use the web interface · xldistance · closed · 3 months ago · 0 comments
#32 · ollama chat not realised · xldistance · open · 3 months ago · 2 comments
#31 · using docker start api server can't set max_new_tokens · goldenquant · open · 3 months ago · 1 comment
#30 · [feature] support q2_k & q3_k dequantize on gpu · BITcyman · closed · 3 months ago · 0 comments
#29 · Update task_queue.h · Atream · closed · 3 months ago · 0 comments
#28 · using docker got errors · goldenquant · closed · 3 months ago · 3 comments
#27 · [Feature] towards 0.1.2 · chenht2022 · closed · 3 months ago · 0 comments
#26 · [fix] linux and windows can all find CPUInfer in current Directory · Atream · closed · 3 months ago · 0 comments
#25 · Windows · Atream · closed · 3 months ago · 0 comments
#24 · Windows · Atream · closed · 3 months ago · 0 comments
#23 · Failed to build wheels by myself · RoacherM · closed · 3 months ago · 1 comment
#22 · error with ffn_down · Eutenacity · closed · 3 months ago · 2 comments
#21 · Add support to switch main GPU · firmanmm · closed · 3 months ago · 7 comments
#20 · If I want to run a linear layer with q4_k_m on cpu using lamafile, how to do it with your implement · Eutenacity · closed · 3 months ago · 1 comment
#19 · confused about the cpu memory. · Eutenacity · closed · 3 months ago · 0 comments
#18 · update docker.md to support docker pull image · UnicornChan · closed · 3 months ago · 0 comments
#17 · Feature support multi instruct and docker building · UnicornChan · closed · 3 months ago · 0 comments
#16 · q5_k_m is not supported? · keyonzeng · closed · 3 months ago · 2 comments
#15 · GPU support without fp16. Multi gpu support · AlexBefest · closed · 3 months ago · 1 comment
#14 · [feature] support for pypi install · UnicornChan · closed · 3 months ago · 0 comments
#13 · About 1M ctx models · choyakawa · closed · 2 months ago · 1 comment
#12 · Awesome project · LysandreJik · open · 3 months ago · 3 comments
#11 · Is Flash Attention 2 Necessary for Qwen2Moe? · cherrymorning · closed · 3 months ago · 1 comment
#10 · [model support] Requesting support for Gemma 2 · sand-bit · closed · 3 months ago · 1 comment
#9 · Could you provide a Docker image? · goldenquant · closed · 3 months ago · 3 comments