I've been working with that model for a bit, and it appears to be a broken conversion, though I can't figure out what's broken about it yet. FP16 inference on wikitext already gives very high perplexity, even before quantizing.
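For reference, this is roughly the kind of check I mean; a minimal sketch assuming the FP16 checkpoint loads with transformers and that wikitext-2 comes from `datasets` (the model name and window size here are just placeholders):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alpindale/miqu-1-70b-fp16"  # or a local path to the converted checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
model.eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

max_len = 2048          # evaluate in non-overlapping windows, llama.cpp-style
nlls, n_tokens = [], 0
for start in range(0, ids.size(1) - 1, max_len):
    chunk = ids[:, start : start + max_len].to(model.device)
    with torch.no_grad():
        # labels == input_ids -> HF shifts internally and returns mean NLL over the chunk
        out = model(chunk, labels=chunk)
    nlls.append(out.loss * (chunk.size(1) - 1))
    n_tokens += chunk.size(1) - 1

print("perplexity:", torch.exp(torch.stack(nlls).sum() / n_tokens).item())
```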
The error during quantization occurs because the first layer catastrophically fails to quantize and then reconstruct, which suggests out-of-range values somewhere, possibly in the RMS norm weights or the embeddings. I'm looking into it.
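One quick way to test that hypothesis is to scan every tensor for non-finite or extreme values before quantizing. A rough sketch, assuming the checkpoint is stored as local safetensors shards (the path is hypothetical):

```python
import glob
import torch
from safetensors.torch import load_file

# hypothetical local directory holding the FP16 shards
for shard in sorted(glob.glob("miqu-1-70b-fp16/*.safetensors")):
    for name, t in load_file(shard).items():
        t = t.float()
        bad = (~torch.isfinite(t)).sum().item()
        amax = t.abs().max().item()
        # norm weights and embeddings should sit in a sane range; flag anything suspicious
        if bad or amax > 1e4:
            print(f"{name}: non-finite={bad}, absmax={amax:.3g}")
```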
My suspicion is that the FP16 GGUF -> FP16 PyTorch conversion is broken specifically for the attention weights, due to unaddressed permutations from the original PyTorch -> GGUF conversion.
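For context: llama.cpp's convert script permutes the Q/K projection weights on the way from the HF layout to GGUF, so a GGUF -> PyTorch conversion has to apply the inverse permutation to those tensors or attention silently breaks. A sketch of that permutation and its inverse as I understand them (head count and shapes below are just illustrative):

```python
import torch

def permute(w: torch.Tensor, n_head: int) -> torch.Tensor:
    # HF -> GGUF layout for wq/wk: interleave the two rotary halves of each head
    return (w.reshape(n_head, 2, w.shape[0] // n_head // 2, *w.shape[1:])
             .swapaxes(1, 2)
             .reshape(w.shape))

def unpermute(w: torch.Tensor, n_head: int) -> torch.Tensor:
    # inverse: GGUF -> HF layout; this is what a GGUF -> PyTorch converter must apply
    return (w.reshape(n_head, w.shape[0] // n_head // 2, 2, *w.shape[1:])
             .swapaxes(1, 2)
             .reshape(w.shape))

w = torch.randn(8192, 8192)  # e.g. a 70B q_proj weight (64 heads * 128 head_dim)
assert torch.equal(unpermute(permute(w, 64), 64), w)
```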
I don't think the rest of the weights can be bad. In particular, the embeddings must be shaped correctly, because the PyTorch model still generates semi-relevant text given an input prompt.
It turns out the weights were indeed broken, and there are now correct FP16 (and EXL2) versions on HF.
Observed when attempting to quantize alpindale/miqu-1-70b-fp16
The output directory looks like this:
When retrying to resume the quantization:
This is caused by this line of exllamav2. I'm not sure why it happens.