I ran your script and got the following results, which seem to be fine.
```
(base) chenht@sapphire2:~/ktransformers-dev$ python test.py
[151 3 227 ... 8 169 135]
33030144
torch.Size([33030144])
torch.Size([4096, 14336])
tensor([[ 0.2041, -3.1406, -0.9922, ..., -1.1406, 0.3828, 0.4023],
        [-1.7188, 0.3164, 0.5352, ..., 1.5000, 1.5234, 1.0547]],
       dtype=torch.bfloat16)
tensor([[ 0.2070, -3.1562, -0.9883, ..., -1.1484, 0.3828, 0.3984],
        [-1.7109, 0.3262, 0.5469, ..., 1.5000, 1.5156, 1.0547]],
       dtype=torch.bfloat16)
```
Maybe you should check whether your GGUF file is intact and whether the ffn_down weights are indeed q4_k-quantized. If both of those check out, please provide more information (e.g., the output of lscpu, which shows your CPU instruction set extensions) to help us fix it.
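For reference, one way to run that check is with the `gguf` Python package that ships with llama.cpp (a minimal sketch, assuming `pip install gguf`; the file path is a placeholder):

```python
# Minimal sketch: print the quantization type of every FFN tensor in a GGUF file.
# Requires the `gguf` package; "model.gguf" is a placeholder path.
from gguf import GGUFReader

reader = GGUFReader("model.gguf")
for tensor in reader.tensors:
    if "ffn" in tensor.name:
        # tensor_type is a GGMLQuantizationType enum member, e.g. Q4_K
        print(f"{tensor.name}: {tensor.tensor_type.name}, shape={list(tensor.shape)}")
```

If the ffn_down tensors report a type other than Q4_K while ffn_up and ffn_gate report Q4_K, that is exactly the mismatch described below.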
Many thanks for your quick response. I found the cause: the GGUF I downloaded uses a different ggml_type for ffn_down than for ffn_up.
I had tried to load the FFN weights and run them with cpuinfer_ext.linear; ffn_up and ffn_gate worked fine, but ffn_down produced NaN. The code that goes wrong is below.
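(The snippet itself is not reproduced above.) For anyone hitting the same NaN, the failure mode is consistent with dequantizing ffn_down under the wrong assumed type; a minimal defensive sketch, again using the `gguf` package with a hypothetical tensor name and expected type, is to read the actual ggml_type from the file before configuring the kernel:

```python
# Minimal sketch: assert a specific tensor's ggml_type before wiring it into a
# CPU kernel. The tensor name and expected type here are illustrative assumptions.
from gguf import GGUFReader, GGMLQuantizationType

reader = GGUFReader("model.gguf")  # placeholder path
name = "blk.0.ffn_down.weight"     # hypothetical tensor name
tensor = next(t for t in reader.tensors if t.name == name)

expected = GGMLQuantizationType.Q4_K
if tensor.tensor_type != expected:
    raise ValueError(
        f"{name} is stored as {tensor.tensor_type.name}, not {expected.name}; "
        "configure the kernel with the actual type instead of hard-coding it"
    )
```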