ikawrakow/ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
MIT License · 89 stars · 6 forks
Issues
#57 · AVX2/Zen4 horizontal sums · ikawrakow · opened 1 month ago · 0 comments
#56 · BF16 support on Metal · ikawrakow · closed 1 month ago · 0 comments
#55 · Improve Q5_0 performance on AVX2 · ikawrakow · closed 1 month ago · 0 comments
#54 · Improve Q4_0 and Q8_0 performance on AVX2/Zen4 · ikawrakow · closed 1 month ago · 0 comments
#53 · Quantization mixes tweaks · ikawrakow · closed 1 month ago · 0 comments
#52 · Fix bug and D < 128 case for Q8_0 k-cache · ikawrakow · closed 1 month ago · 0 comments
#51 · Quantized Flash Attention for all supported CPU platforms · ikawrakow · closed 1 month ago · 0 comments
#50 · AVX2 Flash Attention 2 · ikawrakow · closed 1 month ago · 0 comments
#49 · ARM_NEON Flash Attention · ikawrakow · closed 2 months ago · 0 comments
#48 · AVX2 Flash Attention · ikawrakow · closed 2 months ago · 0 comments
#47 · iq2_tn: slightly better performance on AVX2 · ikawrakow · closed 2 months ago · 0 comments
#46 · IQ1_TN Metal implementation · ikawrakow · closed 2 months ago · 0 comments
#45 · Add CUDA support for IQ1_TN · ikawrakow · closed 2 months ago · 0 comments
#44 · Adding IQ1_TN - 1.6875 bpw for TriLM ternary models · ikawrakow · closed 2 months ago · 1 comment
#43 · iq2_tn: slightly faster PP on Zen4 · ikawrakow · closed 2 months ago · 0 comments
#42 · Adding fused rms_norm · ikawrakow · closed 2 months ago · 0 comments
#41 · iqk_mul_mat(ARM_NEON): adding bf16 support · ikawrakow · closed 1 month ago · 0 comments
#40 · Adding bf16 support to CUDA · ikawrakow · closed 1 month ago · 0 comments
#39 · Add support for bf16 to iqk_mul_mat · ikawrakow · closed 2 months ago · 0 comments
#38 · Zen4 Flash Attention - bf16 support · ikawrakow · closed 2 months ago · 0 comments
#37 · Performance improvements for legacy quants on ARM_NEON · ikawrakow · closed 2 months ago · 0 comments
#36 · Zen4 Flash Attention 2 · ikawrakow · closed 2 months ago · 0 comments
#35 · Fix Zen4 Flash Attention · ikawrakow · closed 2 months ago · 0 comments
#34 · Bug: FA fails when processing prompt lengths that are not a multiple of 8 · ikawrakow · closed 2 months ago · 0 comments
#33 · Do not process prompts containing binary data for escapes · ikawrakow · closed 2 months ago · 0 comments
#32 · Zen4 Flash Attention · ikawrakow · closed 2 months ago · 0 comments
#31 · Fix build when iqk_mul_mat is disabled · ikawrakow · closed 2 months ago · 0 comments
#30 · Bug: Appcrash on Windows 7 with GGML_USE_IQK_MULMAT · whoreson · closed 1 month ago · 24 comments
#29 · Bug: some ifdefs missing in ggml/src/iqk/iqk_quantize.cpp · whoreson · closed 2 months ago · 2 comments
#28 · Binary KQ mask · ikawrakow · opened 2 months ago · 0 comments
#27 · Faster Gemma2 · ikawrakow · closed 2 months ago · 0 comments
#26 · Feature Request: Improve CPU processing speed for large contexts · ikawrakow · opened 2 months ago · 0 comments
#24 · softcap: minor improvement · ikawrakow · closed 2 months ago · 0 comments
#23 · iq4_k tweak · ikawrakow · closed 2 months ago · 0 comments
#22 · AVX2 quantization for Q8_K · ikawrakow · closed 2 months ago · 0 comments
#21 · quantize_stats: print rmse and max error as fraction of <x> · ikawrakow · closed 2 months ago · 0 comments
#20 · iq2_k: slightly better bpw - accuracy compromise · ikawrakow · closed 2 months ago · 0 comments
#19 · Skip barriers of noops · ikawrakow · closed 2 months ago · 0 comments
#17 · Merge mainline - Aug 12 2024 · ikawrakow · closed 2 months ago · 0 comments
#16 · Fix Makefile · ikawrakow · closed 3 months ago · 0 comments
#14 · Adding IQ6_K · ikawrakow · closed 3 months ago · 0 comments
#13 · Adding IQ2_TN for use with ternary models · ikawrakow · closed 3 months ago · 2 comments
#12 · q2_K: allow it to detect ternary nets and quantize accordingly · ikawrakow · closed 3 months ago · 0 comments
#11 · Faster iq3_k and iq5_k quantization · ikawrakow · closed 3 months ago · 0 comments
#10 · iq4_k: speedup quantization by a factor of ~2 · ikawrakow · closed 3 months ago · 0 comments
#9 · Fused soft cap and SIMD-ified GeLU · ikawrakow · closed 2 months ago · 0 comments
#7 · Adding IQ2_K, IQ3_K and IQ5_K · ikawrakow · closed 3 months ago · 0 comments
#6 · IQ4_K: SOTA 4-bit quantization · ikawrakow · closed 3 months ago · 0 comments
#5 · Fusing a mat mul op followed by a scale op on the CPU · ikawrakow · opened 3 months ago · 0 comments
#4 · Simdify and multi-thread tanh · ikawrakow · closed 3 months ago · 0 comments