Hello, I am very impressed with your great work. I am not very familiar with CUDA programming. Could you kindly give me instructions on how to call the `pack_2bit_u8` function from your optimized CUDA (C++) version? I just need to pack and unpack the weights, without quantizing them. Thanks!
Hi @Lucky-Lance! The CUDA version only supports bit-unpacking. So you can pack with the pure-PyTorch packing functions from https://github.com/mobiusml/hqq/blob/master/hqq/core/bitpack.py and unpack with the CUDA extension: https://github.com/mobiusml/hqq/blob/master/hqq/kernels/hqq_aten_cuda.cpp#L51
```python
from hqq.core.bitpack import BitPack
import hqq_aten

W_packed = BitPack.pack_2bit_u8(W)              # pure-PyTorch packing
W_unpacked = hqq_aten.unpack_2bit_u8(W_packed)  # CUDA bit-unpacking
```
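For reference, here is a minimal self-contained round-trip sketch. It assumes the `hqq_aten` extension has been built from `hqq/kernels`, and that `W` is a `uint8` CUDA tensor holding 2-bit values (i.e. in `[0, 3]`) whose first dimension is divisible by 4, since 2-bit packing stores four values per byte; the `1024x1024` shape is just an illustrative choice:

```python
import torch
from hqq.core.bitpack import BitPack
import hqq_aten  # CUDA extension built from hqq/kernels

# Fake 2-bit "quantized" weights: uint8 values in [0, 3].
# First dim divisible by 4 (assumption of this sketch).
W = torch.randint(0, 4, (1024, 1024), dtype=torch.uint8, device="cuda")

W_packed = BitPack.pack_2bit_u8(W)              # pure-PyTorch packing
W_unpacked = hqq_aten.unpack_2bit_u8(W_packed)  # CUDA bit-unpacking

# Round trip should recover the original tensor under these assumptions.
print(torch.equal(W_unpacked, W))
```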
OK, thanks a lot!