ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
MIT License · 89 stars · 6 forks

Issues (newest first)
Title | # | Author | State | Updated | Comments
--- | --- | --- | --- | --- | ---
Faster MoE inference | #112 | ikawrakow | closed | 1 week ago | 0
Use fused mul - unary op also for MoE models | #111 | ikawrakow | closed | 2 weeks ago | 0
Bitnet: use the fused mul-silu in the FFN network | #110 | ikawrakow | closed | 2 weeks ago | 0
Bitnet CUDA improvements | #109 | ikawrakow | closed | 2 weeks ago | 0
Another Bitnet performance improvement on Metal | #108 | ikawrakow | closed | 2 weeks ago | 0
Faster IQ1_BN Metal implementation | #107 | ikawrakow | closed | 2 weeks ago | 0
Bitnet changes | #106 | ikawrakow | closed | 2 weeks ago | 0
Fix quantized k-cache without FA | #105 | ikawrakow | closed | 2 weeks ago | 0
Bug: K cache without FA | #103 | Nexesenex | closed | 2 weeks ago | 10
Add support for Granite and GraniteMoE models | #102 | ikawrakow | closed | 3 weeks ago | 0
Enable q6_0 in flash attention | #101 | ikawrakow | closed | 3 weeks ago | 1
Enable IQ4_NL for KV-cache in token generation using Flash Attention | #99 | ikawrakow | closed | 3 weeks ago | 3
Avoid rebuild of GGML graph for each token | #98 | agray3 | closed | 3 weeks ago | 1
Bitnet: make the scale tensors optional | #97 | ikawrakow | closed | 3 weeks ago | 0
Quant strategies: attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S | #96 | Nexesenex | closed | 3 weeks ago | 3
Adding @agray3's graph caching approach | #94 | ikawrakow | closed | 3 weeks ago | 6
Attempt to blindly fix Windows build failure | #93 | ikawrakow | closed | 3 weeks ago | 1
Bug: Quantized KV cache produces garbage in situation where llama.cpp does not | #92 | saood06 | open | 3 weeks ago | 22
CLI - Specify GGML_TYPE to quantize for the main tensors. | #91 | Nexesenex | closed | 3 weeks ago | 0
iq4_ks: faster dot product on Metal | #90 | ikawrakow | closed | 3 weeks ago | 0
Adding IQ4_KSS: 4.0 bpw quants | #89 | ikawrakow | closed | 3 weeks ago | 2
Bug: Won't compile on MSVC | #88 | saood06 | closed | 3 weeks ago | 3
iq3_k: fix and optimize Metal dot product | #87 | ikawrakow | closed | 4 weeks ago | 0
Fix and optimize iq2k Metal implementation | #86 | ikawrakow | closed | 1 month ago | 0
IQ2_KS: 2.1875 bpw non-linear quantization | #85 | ikawrakow | closed | 1 month ago | 0
Better model info | #84 | ikawrakow | closed | 1 month ago | 0
New SOTA quantization: 4.25 bpw IQ4_KS | #83 | ikawrakow | closed | 1 month ago | 0
Cleanup scale fudge factors | #81 | ikawrakow | closed | 1 month ago | 0
Move to c++17 projectwide | #80 | ikawrakow | closed | 1 month ago | 0
Do not quantize activations if not necessary | #79 | ikawrakow | closed | 1 month ago | 0
q6_0: Slightly faster Zen4/AVX2 | #78 | ikawrakow | closed | 1 month ago | 0
Adding Q6_0 | #77 | ikawrakow | closed | 1 month ago | 1
iq4_nl: faster quantization | #76 | ikawrakow | closed | 1 month ago | 0
Fix Q5_0 flash attention | #75 | ikawrakow | closed | 1 month ago | 0
IQ4_NL kv-cache on the CPU (Zen4/AVX2/ARM_NEON) | #74 | ikawrakow | closed | 1 month ago | 0
CUDA: faster float -> iq4_nl conversion | #73 | ikawrakow | closed | 1 month ago | 0
iqk_mul_mat: better iq4_nl implementation on Zen4/AVX2 | #72 | ikawrakow | closed | 1 month ago | 0
iqk_mul_mat: better srategy when nrc_y not divisible by ny | #71 | ikawrakow | closed | 1 month ago | 0
Fused unary(x)*y | #70 | ikawrakow | closed | 1 month ago | 0
Allow bf16 kv-cache | #69 | ikawrakow | closed | 1 month ago | 0
It is time to fix replace_all | #68 | ikawrakow | closed | 1 month ago | 0
Feature Request: Elliminate/reduce unnecessary copies | #67 | ikawrakow | open | 1 month ago | 0
CUDA non-contiguous RoPE | #66 | ikawrakow | closed | 1 month ago | 1
Adding SWIGLU unary op | #65 | ikawrakow | closed | 1 month ago | 1
Better sub-3-bit quantization mixes with a qkv tensor | #64 | ikawrakow | closed | 1 month ago | 0
Use fp32 for K*Q in Metal FA implementation | #62 | ikawrakow | closed | 1 month ago | 0
Adding ability to have meta data per tensor row | #61 | ikawrakow | closed | 1 month ago | 0
Bug: Illegal instruction on NEON and Q4_0_4_4 | #60 | whoreson | closed | 1 month ago | 1
Bug: GGML Compilation Error: undefined references to `iqk_mul_mat' | #59 | ndavidson19 | closed | 1 month ago | 4
Fix compiler warnings | #58 | ikawrakow | closed | 1 month ago | 0