Closed jrudolph closed 1 year ago
Combination of q8 quantization (#7) and AVX2 optimizations (#9).
Runs llama-7b with ~2 tokens per second (q8 alone: 0.45 tokens per second)
Combination of q8 quantization (#7) and AVX2 optimizations (#9).
Runs llama-7b with ~2 tokens per second (q8 alone: 0.45 tokens per second)