the-crypt-keeper closed this 2 months ago
3B and 8B evaluations at FP16 and NF4 completed
Something might be wrong with the 20B: the FP16 version throws a CUDA illegal memory access error when I load it across 4 GPUs, and NF4 performance is worse than the 8B.
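For reference, this is roughly how I'd attempt an NF4 multi-GPU load with transformers + bitsandbytes; a minimal sketch, assuming the standard Hugging Face quantization path (the model id is a placeholder, not the actual repo):

```python
# Hypothetical sketch: NF4 4-bit load sharded across available GPUs.
# Requires: transformers, accelerate, bitsandbytes. Model id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-20b-model",   # placeholder, swap in the real repo id
    quantization_config=nf4_config,
    device_map="auto",           # shards layers across the visible GPUs
)
```

With `device_map="auto"`, accelerate decides the per-GPU layer split, which is also where an illegal memory access from a mismatched architecture would typically surface.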
Going to stop here and not bother with the 34B. If you want to try this model, use the 8B.
Update: the 20B and 34B models use a different architecture than the 3B and 8B, which likely explains the differences I'm seeing.
3b, 8b, 20b and 34b instruction-following models just released