@kabachuha hey that's cool! thank you and will look at it later today 🧐
@kabachuha oh i see, so just add an l2norm before the binary quantization it seems? i'll run an experiment later
edit: the first term of the entropy gets simplified too
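Roughly, the idea as described here (a minimal sketch of the concept, not the exact code in this repo): l2-normalize onto the unit hypersphere, then sign-quantize, so each coordinate becomes ±1/√d and the quantized code lands back on the sphere. The entropy simplification presumably falls out of the per-bit factorization described in the BSQ paper.

```python
import torch
import torch.nn.functional as F

def binary_spherical_quantize(z: torch.Tensor) -> torch.Tensor:
    # project the latent onto the unit hypersphere
    z = F.normalize(z, dim = -1)

    # sign-quantize each coordinate to +-1/sqrt(d), so the code
    # is also unit norm and lives on the same sphere
    d = z.shape[-1]
    ones = torch.ones_like(z)
    codes = torch.where(z > 0, ones, -ones) / (d ** 0.5)

    # straight-through estimator so gradients still reach the encoder
    return z + (codes - z).detach()
```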
Thanks! I'll test it too now :)
1 sec, there's a bug lol, be back soon (out with doggo)
Yes, at this moment it does look a bit washed out
Still, it seems to work
(The images are the original Stable Diffusion's VAE quantization and restoration)
@kabachuha try again on the latest!
@kabachuha was the training a lot smoother?
It's not training, it's just quantizing with no learnable params :) (in fact, if you launch training with LFQ as is, setting the training flag too, there will be an exception about there being no params for the optimizer)
So not really observable to the naked eye
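To illustrate the exception mentioned above (a hypothetical snippet, not from the repo; assuming LFQ registers only buffers when no input/output projections are needed):

```python
import torch
from vector_quantize_pytorch import LFQ

lfq = LFQ(codebook_size = 16, dim = 4)  # dim == log2(codebook_size), so no projections

# with no trainable parameters, building an optimizer over it raises
# "ValueError: optimizer got an empty parameter list"
try:
    torch.optim.Adam(lfq.parameters())
except ValueError as err:
    print(err)
```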
The params here are:
```python
self.lfq = LFQ(
    codebook_size = 16,
    dim = 4,
    entropy_loss_weight = 0.1,
    diversity_gamma = 1.,
    num_codebooks = 1,
    spherical = True
)
```
Maybe they need some adjustments for the spherical version
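For context, here is roughly how it gets called (a standalone sketch; assuming this repo's LFQ returns (quantized, indices, entropy aux loss) and accepts channel-first image latents):

```python
import torch
from vector_quantize_pytorch import LFQ

# same config as above, instantiated outside a module for a quick check
lfq = LFQ(codebook_size = 16, dim = 4, entropy_loss_weight = 0.1,
          diversity_gamma = 1., num_codebooks = 1, spherical = True)

latents = torch.randn(1, 4, 64, 64)  # stand-in for SD VAE latents (B, 4, H, W)
quantized, indices, entropy_aux_loss = lfq(latents)
assert quantized.shape == latents.shape
```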
I'd like to test the residual version too; it gave me much better results with LFQ
@lucidrains Testing with the residual version now, and the convergence is a bit worse so far (the kind of setup I mean is sketched below)
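A sketch of that residual setup, assuming the repo's ResidualLFQ, where each quantizer codes the residual left by the previous one:

```python
import torch
from vector_quantize_pytorch import ResidualLFQ

residual_lfq = ResidualLFQ(
    dim = 4,
    codebook_size = 16,
    num_quantizers = 4   # each quantizer codes the previous one's residual
)

seq = torch.randn(1, 32, 4)  # toy (batch, seq, dim) input
quantized, indices, aux_loss = residual_lfq(seq)
```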
@kabachuha think it is working quite well on the toy example!
It gave me higher reconstruction loss on the toy example 🤨
LFQ: rec loss: 0.115 | entropy aux loss: -0.035 | active %: 75.000
Spherical: rec loss: 0.128 | entropy aux loss: -0.092 | active %: 100.000
Anyway, it seems you updated the repo and now it has codebook scale 👀
Will update my tests now. Still a bit grayish, though. It's time for me to sleep, so I need to go.
Thank you for the great work, as usual! 🧡
@kabachuha oh dang, i was only looking at code utilization
Binary Spherical Quantization is an extension of the binarized Lookup-Free Quantization from MagViT2, but instead of mapping values to the corners of a binarized hypercube, it maps them onto a hypersphere.
https://arxiv.org/abs/2406.07548
The authors claim to have beaten MagViT2, and that their method offers better compression quality than the other commonly used methods. They tested image and video reconstruction, as well as generation with a masked language model, and their results are quite good.
The code and some checkpoints are available at https://github.com/zhaoyue-zephyrus/bsq-vit, so I think it should be quick for you to add it to this repo, and maybe add the residual variants too.