lucidrains / vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch
MIT License

Implement the rotation trick. #164

Closed: cfifty closed this pull request 1 month ago

cfifty commented 1 month ago

The rotation trick is a new way to propagate gradients through vector quantization layers, different from the straight-through estimator (STE).

See https://arxiv.org/abs/2410.06424
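For readers skimming the PR, here is a minimal, hedged sketch of the idea (following Sec. 4.2 of the paper), not the exact code in this PR: the encoder output is rotated and rescaled onto its nearest codebook vector by a transform that is treated as a constant, so the gradient is rotated back onto the encoder output rather than copied through unchanged as in the STE. The function name `rotate_to` and the `eps` handling below are illustrative choices, not necessarily what this PR uses.

```python
import torch
import torch.nn.functional as F

def rotate_to(e: torch.Tensor, q: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    # e: encoder outputs, q: their nearest codebook vectors, both shaped (..., dim)

    # unit directions, detached so the rotation itself carries no gradient
    e_hat = F.normalize(e, dim = -1, eps = eps).detach()
    q_hat = F.normalize(q, dim = -1, eps = eps).detach()
    w = F.normalize(e_hat + q_hat, dim = -1, eps = eps)

    # apply R = I - 2 w w^T + 2 q_hat e_hat^T to e without materializing R;
    # R is a composition of two Householder reflections that maps e_hat onto q_hat
    rotated = (
        e
        - 2 * (e * w).sum(dim = -1, keepdim = True) * w
        + 2 * (e * e_hat).sum(dim = -1, keepdim = True) * q_hat
    )

    # rescale so the forward pass returns (numerically) the codebook vector q,
    # while the backward pass sees a rotated and scaled gradient instead of the STE copy
    scale = (q.norm(dim = -1, keepdim = True) / e.norm(dim = -1, keepdim = True).clamp(min = eps)).detach()
    return rotated * scale
```

In a VQ layer, this would take the place of the usual straight-through line `quantized = e + (q - e).detach()`; the commitment and codebook losses are unchanged.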

As an aside, this repository was quite helpful for the experiments in that paper -- thank you.

lucidrains commented 1 month ago

@cfifty hi Chris, thank you for this pull request and for the kind words

I just ran your rotation-trick variant of the STE through the fashionmnist example... and saw codebook utilization go from ~25% (without rotation) to 100% ... congratulations on this finding, and thank you for sharing this paper!
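(For anyone who wants to reproduce this, a minimal sketch of what toggling it looks like, assuming the flag added by this PR ends up being called `rotation_trick` on `VectorQuantize`; the exact keyword may differ in the merged version.)

```python
import torch
from vector_quantize_pytorch import VectorQuantize

vq = VectorQuantize(
    dim = 256,
    codebook_size = 512,
    rotation_trick = True   # assumed flag name from this PR; False would fall back to the STE
)

x = torch.randn(1, 1024, 256)             # (batch, seq, dim) encoder outputs
quantized, indices, commit_loss = vq(x)   # same interface as before
```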

lucidrains commented 1 month ago

@cfifty just when i was about to turn to scalar quantization and not look back :rofl:

lucidrains commented 1 month ago

@cfifty also, go big red :smile:

cfifty commented 1 month ago

Yooo! @lucidrains I had no idea! Always great to meet / interact w/ another Cornellian in Silicon Valley :) Surprisingly high concentration of us out here (plus Chris Ré, the last author on that paper and a Stanford prof, was also a math undergrad at Cornell)

lucidrains commented 1 month ago

haha, i've known of Chris Ré ever since the flash attention paper of course, but didn't know he is a fellow alum!

must be confusing to have an advisor with the same first name lol (just noticed that)

congrats again on this paper! could be significant!

cfifty commented 1 month ago

Thanks Phil :)