Open CoffeeVampir3 opened 2 months ago
Very cool! Thank you for writing up such a clear plan. We can start merging the bit-packing logic and the layer quantization, so feel free to send a PR whenever you're ready. This closely follows the playbook @gau-nernst used for fp6.
Related work
:+1: https://github.com/pytorch/ao/pull/285 I've kept this separate from Andreas' commit for now; it encapsulates only the working bits.
Bitnet 1.58 Groundwork
After some talks with Saroufim and the cuda mode team working on bitnet, we've outlined a strategy for implementing the bitnet 1.58 method in torch. This issue lays the groundwork for 2-bit trinary tensor quantization and the bitnet linear layer for Bitnet 1.58.
I've set up a staging repo with a number of items:
This covers the initial groundwork for getting working trinary networks into torch.
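For anyone following along, here is a rough, library-free sketch of what "2-bit trinary quantization" and bit-packing could look like. It is not the code from the PR or the staging repo. The function names (`absmean_ternarize`, `pack_ternary`, `unpack_ternary`) are hypothetical, and both the absmean scaling and the packing layout (four values per byte, low bits first, with -1/0/1 stored as 0/1/2) are assumptions chosen for illustration.

```python
def absmean_ternarize(weights, eps=1e-8):
    """Quantize floats to {-1, 0, 1} using an absmean scale (assumed scheme)."""
    # Scale by the mean absolute value, then round and clip to [-1, 1].
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def pack_ternary(values):
    """Pack four trinary values into each byte, 2 bits per value."""
    if len(values) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    packed = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for j, v in enumerate(values[i:i + 4]):
            if v not in (-1, 0, 1):
                raise ValueError("values must be -1, 0, or 1")
            # Shift -1/0/1 to 0/1/2 so each value fits in an unsigned 2-bit slot.
            byte |= (v + 1) << (2 * j)
        packed.append(byte)
    return bytes(packed)


def unpack_ternary(packed):
    """Inverse of pack_ternary: recover the list of -1/0/1 values."""
    values = []
    for byte in packed:
        for j in range(4):
            values.append(((byte >> (2 * j)) & 0b11) - 1)
    return values


if __name__ == "__main__":
    q, scale = absmean_ternarize([0.9, -0.8, 0.05, 0.4])
    packed = pack_ternary(q)
    # Round trip: unpacking must give back the quantized values.
    assert unpack_ternary(packed) == q
```

A real implementation would do the same thing on `uint8` tensors with vectorized shifts and masks. The point here is only that three states fit in 2 bits, so packing cuts storage to a quarter of one byte per weight.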