Hi, I found this repository while planning to implement BitNet.
BitLinear uses 1-bit weights, but PyTorch has no native 1-bit dtype,
so I assumed I would need a custom CUDA kernel for bit packing and unpacking.
How did you implement the 1-bit tensor?
(I haven't been able to find any CUDA files in the repo yet.)
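For reference, this is roughly the packing/unpacking I had in mind. It is a minimal plain-Python sketch of the storage side only, not the repository's method. I'm assuming the weights are already binarized to +1/-1 and encoded as bit 1 for +1 and bit 0 for -1. The helper names `pack_signs`, `unpack_signs` and `packed_dot` are my own.

```python
def pack_signs(weights):
    """Pack a flat list of +1/-1 weights into bytes, 8 weights per byte.

    Assumed encoding: bit = 1 for +1, bit = 0 for -1; unused padding bits stay 0.
    """
    packed = bytearray((len(weights) + 7) // 8)
    for i, w in enumerate(weights):
        if w > 0:
            packed[i // 8] |= 1 << (i % 8)  # weight i lives in bit (i % 8) of byte (i // 8)
    return bytes(packed)


def unpack_signs(packed, n):
    """Recover the first n weights as +1/-1 from the packed bytes."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(n)]


def packed_dot(packed, x):
    """Dot product of packed +1/-1 weights with a full-precision vector x.

    Each x[i] is added if its weight bit is 1 (+1) and subtracted otherwise (-1),
    so no multiplications are needed.
    """
    total = 0
    for i, xi in enumerate(x):
        total += xi if (packed[i // 8] >> (i % 8)) & 1 else -xi
    return total


w = [1, -1, -1, 1, 1, 1, -1, 1, -1, 1]
p = pack_signs(w)                  # 10 weights fit in 2 bytes
restored = unpack_signs(p, len(w))
y = packed_dot(p, list(range(10)))
```

This only shows the memory layout (8 weights per byte instead of one per float). A real implementation would do the same shifts and masks on the GPU, either with PyTorch bitwise ops on `uint8` tensors or inside a custom kernel, which is the part I'm unsure about.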
Hi, I found this repository when I'm plan to impelment BitNet.
BitLinear use 1-bit, but, since pytorch native dtype does not supports 1bit tensor, So i thought I need to implement via custom cuda kernel.(bit packing and unpacking)
How did u implement 1bit tensor implementation? (I can't find cuda files yet)