The 2:4 sparse quantized run is waiting on the GPTQ UX changes to merge; currently the original sparsity is not respected by GPTQ. So far I've only confirmed that the script completes with a 1.1b model.
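For context, a minimal sketch of the property the GPTQ fix needs to preserve, in plain PyTorch (`is_24_sparse` is a hypothetical helper for illustration, not part of the codebase): every contiguous group of four weights should keep at least two zeros after quantization.

```python
import torch

def is_24_sparse(weight: torch.Tensor) -> bool:
    """Return True if every contiguous group of 4 values along the last
    dimension contains at least 2 zeros (the 2:4 sparsity pattern)."""
    groups = weight.reshape(-1, 4)          # last dim must be divisible by 4
    zeros_per_group = (groups == 0).sum(dim=1)
    return bool((zeros_per_group >= 2).all())

# prune a random weight to 2:4 by keeping the two largest-magnitude
# entries in each group of four
w = torch.randn(8, 16)
mask = torch.zeros_like(w, dtype=torch.bool).reshape(-1, 4)
idx = w.abs().reshape(-1, 4).topk(2, dim=1).indices
mask.scatter_(1, idx, True)
w_pruned = w * mask.reshape(w.shape)

assert is_24_sparse(w_pruned)  # holds before GPTQ; should still hold after
```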
Waiting on the group quantization correctness fixes before validating a 7b grouped example.
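For reference, a minimal sketch of what group quantization does, written as symmetric group-wise fake-quantization in plain PyTorch; the function, the group size of 128, and 4-bit width are illustrative assumptions, not the library's implementation:

```python
import torch

def quantize_grouped(w: torch.Tensor, group_size: int = 128, num_bits: int = 4):
    """Symmetric group-wise fake-quantization: one scale per group of
    `group_size` consecutive weights along the input dimension."""
    orig_shape = w.shape
    groups = w.reshape(-1, group_size)        # requires numel % group_size == 0
    qmax = 2 ** (num_bits - 1) - 1            # 7 for 4-bit
    scales = groups.abs().amax(dim=1, keepdim=True) / qmax
    q = torch.clamp(torch.round(groups / scales), -qmax - 1, qmax)
    return (q * scales).reshape(orig_shape), scales

w = torch.randn(4096, 4096)
w_dq, scales = quantize_grouped(w)       # dequantized weights, per-group scales
print((w - w_dq).abs().max().item())     # worst-case elementwise error
```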
Creating a new examples folder with initial examples for llama7b using ultrachat200k.
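As a rough sketch of what such an example could start from (the checkpoint id and the naive preprocessing below are placeholder assumptions, not the final example contents):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# placeholder checkpoint; the real examples use the project's own models
model_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# ultrachat200k as published on the Hugging Face Hub
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

def preprocess(sample):
    # naive flattening of the chat turns into one string; a real example
    # would apply a proper chat template instead
    text = "\n".join(m["content"] for m in sample["messages"])
    return tokenizer(text, truncation=True, max_length=2048)

ds = ds.map(preprocess, remove_columns=ds.column_names)
```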
Results
Models
Storing the model outputs on the network under:
/network/sadkins
Eval Results
zoo:llama2-7b-ultrachat200k_llama2_pretrain-base (baseline): 10.10
llama7b_w4a16_channel_compressed: 11.09
llama7b_w8a8_channel_dynamic_compressed: 10.17
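Assuming the numbers above are perplexity scores (lower is better), a minimal sketch of how a comparable eval could be computed with plain transformers; the non-overlapping window approach and window size are assumptions, not the exact eval harness used:

```python
import math
import torch

@torch.no_grad()
def perplexity(model, tokenizer, text: str, window: int = 2048) -> float:
    """Perplexity of a causal LM over `text`, scored in fixed windows."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    total_nll, n_tokens = 0.0, 0
    for start in range(0, ids.size(1), window):
        chunk = ids[:, start:start + window]
        if chunk.size(1) < 2:                 # need at least one prediction
            break
        out = model(chunk, labels=chunk)      # HF shifts labels internally
        total_nll += out.loss.item() * (chunk.size(1) - 1)
        n_tokens += chunk.size(1) - 1
    return math.exp(total_nll / n_tokens)
```

Running this over the same held-out text for each checkpoint would reproduce a comparison like the list above.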
Missing