kvcache-ai / ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Apache License 2.0 · 741 stars · 39 forks
Issues
#60 · fix(docs): fix broken link · sammcj · closed · 2 months ago · 0 comments
#58 · [fix] Fix readme datas · Azure-Tang · closed · 2 months ago · 0 comments
#57 · [feature] release 0.1.3 · UnicornChan · closed · 2 months ago · 0 comments
#56 · Update README.md · hyx1999 · closed · 2 months ago · 0 comments
#54 · Add a instruction for configuring CUDA_HOME and CUDA_PATH to the install section of README.md. · hyx1999 · closed · 2 months ago · 2 comments
#53 · Support for Mistral-Large-Instruct-2407-GGUF ? · LIUKAI0815 · closed · 2 months ago · 2 comments
#52 · Fix: None for load config · UnicornChan · closed · 3 months ago · 0 comments
#51 · [fix] f16 dequantize device ignored · molamooo · closed · 3 months ago · 0 comments
#50 · How to properly disable offloading MoE layers to CPU? · molamooo · closed · 3 months ago · 5 comments
#49 · More Efficient Layer Distribution for DeepSeek Coder v2 on Multiple GPUs and CPUs · BGFGB · open · 3 months ago · 4 comments
#48 · [fix] Fix bugs about static cache and server param; · Azure-Tang · closed · 3 months ago · 0 comments
#47 · Can I run llama3.1 70b with rtx4090+64g ddr5 ram? · codeMonkey-shin · open · 3 months ago · 1 comment
#46 · [ENHANCEMENT] improve GPU utilization for multi-GPU · ELigoP · closed · 3 months ago · 1 comment
#45 · Cannot run DeepSeek V2 Chat in server mode on 2 GPUs · ELigoP · closed · 3 months ago · 1 comment
#44 · CUDA error: No kernel image is available for execution on the device · Forsworns · closed · 3 months ago · 2 comments
#43 · Update install.sh · RealLittleXian · closed · 3 months ago · 4 comments
#42 · Mixtral-8x7B-v0.1 GGUF file error · RealLittleXian · closed · 3 months ago · 1 comment
#41 · [Update] Update README · Azure-Tang · closed · 3 months ago · 0 comments
#40 · [fix] fix broken link · Azure-Tang · closed · 3 months ago · 0 comments
#39 · [fix] fix broken link · Azure-Tang · closed · 3 months ago · 0 comments
#38 · [update] README · Azure-Tang · closed · 3 months ago · 0 comments
#37 · Ubuntu 24.04 GLIBCXX version fail · ELigoP · closed · 3 months ago · 3 comments
#36 · Release v0.1.2 · UnicornChan · closed · 3 months ago · 0 comments
#35 · [update] Update readme; Add tutorial · Azure-Tang · closed · 3 months ago · 0 comments
#34 · [fix] format classes and files name · Azure-Tang · closed · 3 months ago · 0 comments
#33 · Unable to use the web interface · xldistance · closed · 3 months ago · 0 comments
#32 · ollama chat not realised · xldistance · open · 3 months ago · 2 comments
#31 · using docker start api server can't set max_new_tokens · goldenquant · open · 3 months ago · 1 comment
#30 · [feature] support q2_k & q3_k dequantize on gpu · BITcyman · closed · 3 months ago · 0 comments
#29 · Update task_queue.h · Atream · closed · 3 months ago · 0 comments
#28 · using docker got errors · goldenquant · closed · 3 months ago · 3 comments
#27 · [Feature] towards 0.1.2 · chenht2022 · closed · 3 months ago · 0 comments
#26 · [fix] linux and windows can all find CPUInfer in current Directory · Atream · closed · 3 months ago · 0 comments
#25 · Windows · Atream · closed · 3 months ago · 0 comments
#24 · Windows · Atream · closed · 3 months ago · 0 comments
#23 · Failed to build wheels by myself · RoacherM · closed · 3 months ago · 1 comment
#22 · error with ffn_down · Eutenacity · closed · 3 months ago · 2 comments
#21 · Add support to switch main GPU · firmanmm · closed · 3 months ago · 7 comments
#20 · If I want to run a linear layer with q4_k_m on cpu using lamafile, how to do it with your implement · Eutenacity · closed · 3 months ago · 1 comment
#19 · confused about the cpu memory. · Eutenacity · closed · 3 months ago · 0 comments
#18 · update docker.md to support docker pull image · UnicornChan · closed · 3 months ago · 0 comments
#17 · Feature support multi instruct and docker building · UnicornChan · closed · 3 months ago · 0 comments
#16 · q5_k_m is not supported? · keyonzeng · closed · 3 months ago · 2 comments
#15 · GPU support without fp16. Multi gpu support · AlexBefest · closed · 3 months ago · 1 comment
#14 · [feature] support for pypi install · UnicornChan · closed · 3 months ago · 0 comments
#13 · About 1M ctx models · choyakawa · closed · 2 months ago · 1 comment
#12 · Awesome project · LysandreJik · open · 3 months ago · 3 comments
#11 · Is Flash Attention 2 Necessary for Qwen2Moe? · cherrymorning · closed · 3 months ago · 1 comment
#10 · [model support] Requesting support for Gemma 2 · sand-bit · closed · 3 months ago · 1 comment
#9 · Could you provide a Docker image? · goldenquant · closed · 3 months ago · 3 comments