That seems like a general technique and not particular to an architecture.
Yes, I just thought it might be relevant to Mamba's implementation, and given all the excitement, maybe something worth playing with in the mamba reference code. :shrug: Still, if it's not relevant or interesting, I'll close this (or feel free to do so!)
Closing to keep the issue queue down. Congrats on the amazing work!
/r/locallama, Hacker News, etc. are all buzzing today about this BitNet b1.58 paper, which claims extraordinary gains in model size efficiency, energy usage, and inference speed with results similar to full-precision 16-bit weights... by using ternary values {-1, 0, 1} during training.
Didn't see anything mentioning BitNet b1.58 here yet, so... might Mamba-based LLMs also benefit from anything proposed in the paper, assuming it works as advertised?
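For context, the core trick in the paper (as I understand it) is absmean quantization of the linear-layer weights to {-1, 0, 1}, with a straight-through estimator so training still updates full-precision weights. Here's a minimal PyTorch sketch of that idea; the names (`absmean_ternary_quantize`, `BitLinear158`) are just mine, and it skips the paper's 8-bit activation quantization:

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Scale by the mean absolute value, then round and clip to {-1, 0, 1},
    roughly following the paper's absmean weight quantization."""
    scale = w.abs().mean().clamp(min=eps)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale

class BitLinear158(torch.nn.Linear):
    """Hypothetical drop-in for nn.Linear: the forward pass uses ternary
    weights, while gradients flow to the full-precision weights via a
    straight-through estimator."""
    def forward(self, x):
        w_q, scale = absmean_ternary_quantize(self.weight)
        # straight-through estimator: forward sees quantized weights,
        # backward sees the full-precision weights
        w = self.weight + (w_q * scale - self.weight).detach()
        return torch.nn.functional.linear(x, w, self.bias)
```

If it works as advertised, one could imagine swapping something like this in for the projection layers in the Mamba block and seeing whether quality holds up, but that's pure speculation on my part.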