ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
MIT License · 89 stars · 6 forks

Issues (newest first)
Title | # | Author | State | Updated | Comments
--- | --- | --- | --- | --- | ---
Faster MoE inference | #112 | ikawrakow | closed | 1 week ago | 0
Use fused mul - unary op also for MoE models | #111 | ikawrakow | closed | 2 weeks ago | 0
Bitnet: use the fused mul-silu in the FFN network | #110 | ikawrakow | closed | 2 weeks ago | 0
Bitnet CUDA improvements | #109 | ikawrakow | closed | 2 weeks ago | 0
Another Bitnet performance improvement on Metal | #108 | ikawrakow | closed | 2 weeks ago | 0
Faster IQ1_BN Metal implementation | #107 | ikawrakow | closed | 2 weeks ago | 0
Bitnet changes | #106 | ikawrakow | closed | 2 weeks ago | 0
Fix quantized k-cache without FA | #105 | ikawrakow | closed | 2 weeks ago | 0
Bug: K cache without FA | #103 | Nexesenex | closed | 2 weeks ago | 10
Add support for Granite and GraniteMoE models | #102 | ikawrakow | closed | 3 weeks ago | 0
Enable q6_0 in flash attention | #101 | ikawrakow | closed | 3 weeks ago | 1
Enable IQ4_NL for KV-cache in token generation using Flash Attention | #99 | ikawrakow | closed | 3 weeks ago | 3
Avoid rebuild of GGML graph for each token | #98 | agray3 | closed | 3 weeks ago | 1
Bitnet: make the scale tensors optional | #97 | ikawrakow | closed | 3 weeks ago | 0
Quant strategies: attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S | #96 | Nexesenex | closed | 3 weeks ago | 3
Adding @agray3's graph caching approach | #94 | ikawrakow | closed | 3 weeks ago | 6
Attempt to blindly fix Windows build failure | #93 | ikawrakow | closed | 3 weeks ago | 1
Bug: Quantized KV cache produces garbage in situation where llama.cpp does not | #92 | saood06 | open | 3 weeks ago | 22
CLI - Specify GGML_TYPE to quantize for the main tensors. | #91 | Nexesenex | closed | 3 weeks ago | 0
iq4_ks: faster dot product on Metal | #90 | ikawrakow | closed | 3 weeks ago | 0
Adding IQ4_KSS: 4.0 bpw quants | #89 | ikawrakow | closed | 3 weeks ago | 2
Bug: Won't compile on MSVC | #88 | saood06 | closed | 3 weeks ago | 3
iq3_k: fix and optimize Metal dot product | #87 | ikawrakow | closed | 4 weeks ago | 0
Fix and optimize iq2k Metal implementation | #86 | ikawrakow | closed | 1 month ago | 0
IQ2_KS: 2.1875 bpw non-linear quantization | #85 | ikawrakow | closed | 1 month ago | 0
Better model info | #84 | ikawrakow | closed | 1 month ago | 0
New SOTA quantization: 4.25 bpw IQ4_KS | #83 | ikawrakow | closed | 1 month ago | 0
Cleanup scale fudge factors | #81 | ikawrakow | closed | 1 month ago | 0
Move to c++17 projectwide | #80 | ikawrakow | closed | 1 month ago | 0
Do not quantize activations if not necessary | #79 | ikawrakow | closed | 1 month ago | 0
q6_0: Slightly faster Zen4/AVX2 | #78 | ikawrakow | closed | 1 month ago | 0
Adding Q6_0 | #77 | ikawrakow | closed | 1 month ago | 1
iq4_nl: faster quantization | #76 | ikawrakow | closed | 1 month ago | 0
Fix Q5_0 flash attention | #75 | ikawrakow | closed | 1 month ago | 0
IQ4_NL kv-cache on the CPU (Zen4/AVX2/ARM_NEON) | #74 | ikawrakow | closed | 1 month ago | 0
CUDA: faster float -> iq4_nl conversion | #73 | ikawrakow | closed | 1 month ago | 0
iqk_mul_mat: better iq4_nl implementation on Zen4/AVX2 | #72 | ikawrakow | closed | 1 month ago | 0
iqk_mul_mat: better srategy when nrc_y not divisible by ny | #71 | ikawrakow | closed | 1 month ago | 0
Fused unary(x)*y | #70 | ikawrakow | closed | 1 month ago | 0
Allow bf16 kv-cache | #69 | ikawrakow | closed | 1 month ago | 0
It is time to fix replace_all | #68 | ikawrakow | closed | 1 month ago | 0
Feature Request: Elliminate/reduce unnecessary copies | #67 | ikawrakow | open | 1 month ago | 0
CUDA non-contiguous RoPE | #66 | ikawrakow | closed | 1 month ago | 1
Adding SWIGLU unary op | #65 | ikawrakow | closed | 1 month ago | 1
Better sub-3-bit quantization mixes with a qkv tensor | #64 | ikawrakow | closed | 1 month ago | 0
Use fp32 for K*Q in Metal FA implementation | #62 | ikawrakow | closed | 1 month ago | 0
Adding ability to have meta data per tensor row | #61 | ikawrakow | closed | 1 month ago | 0
Bug: Illegal instruction on NEON and Q4_0_4_4 | #60 | whoreson | closed | 1 month ago | 1
Bug: GGML Compilation Error: undefined references to `iqk_mul_mat' | #59 | ndavidson19 | closed | 1 month ago | 4
Fix compiler warnings | #58 | ikawrakow | closed | 1 month ago | 0