kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Apache License 2.0

error with ffn_down #22

Closed Eutenacity closed 3 months ago

Eutenacity commented 3 months ago

I tried to load the FFN weights and run them with cpuinfer_ext.linear. ffn_up and ffn_gate work fine, but ffn_down produces NaN. What is wrong? The code is below.


from ktransformers.util.custom_gguf import GGUFLoader
import cpuinfer_ext
import torch
import torch.nn.functional as F
import ctypes

gguf_path = "/VM/share/models/Mixtral-8x7b-q4k_m/"
gguf_loader = GGUFLoader(gguf_path)
# print(gguf_loader.tensor_info)

# Raw (still-quantized) bytes of the first expert's ffn_down weight.
tensor = gguf_loader.get_mmap_tensor("blk.0.ffn_down.0.weight")
print(tensor)
a = torch.tensor(tensor, dtype=torch.uint8)
print(a.nbytes)
print(a.shape)

# Dequantized reference copy, used below to compare against F.linear.
tensor_fp32 = gguf_loader.load_gguf_tensor("blk.0.ffn_down.0.weight").to(torch.bfloat16)
print(tensor_fp32.shape)

# Address of the mmap'd quantized weight data, passed to cpuinfer_ext.
gate_ptr = ctypes.addressof(
    ctypes.cast(tensor.ctypes.data, ctypes.POINTER(ctypes.c_uint64)).contents
)

output_size = 4096    # Mixtral hidden size
input_size = 14336    # Mixtral FFN intermediate size
stride = 16
proj_type = 12        # ggml type id 12 = Q4_K
hidden_type = 30      # ggml type id 30 = BF16

CPUInfer = cpuinfer_ext.CPUInfer(32)
config = cpuinfer_ext.linear.LinearConfig(input_size, output_size, stride, gate_ptr, proj_type, hidden_type)
linear = cpuinfer_ext.linear.Linear(config)

input = torch.randn((2, input_size), dtype=torch.bfloat16).contiguous()
output = torch.zeros((2, output_size), dtype=torch.bfloat16).contiguous()
for i in range(2):
    CPUInfer.submit(linear.forward, input[i, :].data_ptr(), output[i, :].data_ptr())
CPUInfer.sync()
print(output)

# Reference result computed with the dequantized weight.
out = F.linear(input, tensor_fp32)
print(out)
chenht2022 commented 3 months ago

I ran your script and got the following results, which seem to be fine.

(base) chenht@sapphire2:~/ktransformers-dev$ python test.py 
[151   3 227 ...   8 169 135]
33030144
torch.Size([33030144])
torch.Size([4096, 14336])
tensor([[ 0.2041, -3.1406, -0.9922,  ..., -1.1406,  0.3828,  0.4023],
        [-1.7188,  0.3164,  0.5352,  ...,  1.5000,  1.5234,  1.0547]],
       dtype=torch.bfloat16)
tensor([[ 0.2070, -3.1562, -0.9883,  ..., -1.1484,  0.3828,  0.3984],
        [-1.7109,  0.3262,  0.5469,  ...,  1.5000,  1.5156,  1.0547]],
       dtype=torch.bfloat16)

Maybe you should check whether your GGUF file is intact and whether the ffn_down weights are indeed Q4_K quantized. If you confirm that both of these are fine, please provide more information (e.g., the output of lscpu showing your CPU instruction set extensions) to help us fix it.
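
As a quick sanity check, you can print the metadata the loader parsed for each expert tensor and verify that the quantization type matches what you pass as proj_type. This is a minimal sketch; it assumes GGUFLoader.tensor_info is a dict keyed by tensor name (the commented-out print in your script suggests it is), and the exact fields inside each entry may differ between ktransformers versions.

from ktransformers.util.custom_gguf import GGUFLoader

# Common ggml type ids for reference: 2 = Q4_0, 8 = Q8_0, 12 = Q4_K, 14 = Q6_K, 30 = BF16.
gguf_loader = GGUFLoader("/VM/share/models/Mixtral-8x7b-q4k_m/")

for name in ("blk.0.ffn_up.0.weight",
             "blk.0.ffn_gate.0.weight",
             "blk.0.ffn_down.0.weight"):
    # Print the raw metadata entry and look for its ggml/quantization type field.
    print(name, gguf_loader.tensor_info[name])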

Eutenacity commented 3 months ago

Many thanks for your quick response. I found the reason: the GGUF file I downloaded uses a different ggml_type for ffn_down than for ffn_up.
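
For anyone who hits the same NaN: in mixed-quant GGUF files (e.g. Q4_K_M) ffn_down is often stored in a higher-precision type such as Q6_K while ffn_up/ffn_gate stay Q4_K, so hard-coding proj_type=12 (Q4_K) makes the kernel misread the raw bytes. Below is a minimal sketch of deriving proj_type from the tensor's actual type instead. The "ggml_type" lookup via tensor_info is an assumption about GGUFLoader's metadata layout, and proj_type is assumed to take a ggml type id (as 12 = Q4_K does in the script above).

# Assumed: gguf_loader.tensor_info[name] exposes the tensor's ggml type id
# (the field name may differ; inspect a printed entry to find it).
name = "blk.0.ffn_down.0.weight"
ggml_type = gguf_loader.tensor_info[name]["ggml_type"]  # e.g. 12 = Q4_K, 14 = Q6_K

config = cpuinfer_ext.linear.LinearConfig(
    input_size, output_size, stride, gate_ptr,
    ggml_type,    # pass the weight's real quantization type as proj_type
    hidden_type,  # 30 = BF16 activations, as in the original script
)
linear = cpuinfer_ext.linear.Linear(config)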