lucidrains / vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch
MIT License

[New quantization method] Binary Spherical Quantization #143

Closed kabachuha closed 4 months ago

kabachuha commented 4 months ago

Binary Spherical Quantization is an extension of binarized Lookup-Free Quantization (from MagViT2), but instead of mapping the values onto a binarized hypercube, it maps them onto a sphere.

https://arxiv.org/abs/2406.07548

The authors claim to have beaten MagViT2 and that their method offers better compression quality than other commonly used methods. They tested image restoration and generation with a masked language model, and their results are quite good.

The code and some checkpoints are available at https://github.com/zhaoyue-zephyrus/bsq-vit, so I think it would be quick to add it to this repo, and maybe the residual variants as well.

lucidrains commented 4 months ago

@kabachuha hey that's cool! thank you and will look at it later today 🧐

lucidrains commented 4 months ago

@kabachuha oh i see, so just add an l2norm before the binary quantization it seems? i'll run an experiment later

edit: the first term of the entropy gets simplified too
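
roughly what i mean, as a minimal sketch (made-up function name, not the exact code in this repo, and the entropy aux loss is left out):

    import torch
    import torch.nn.functional as F

    def spherical_binary_quantize(z):
        # BSQ: project the latent onto the unit hypersphere first
        z = F.normalize(z, dim = -1)

        # then binarize each coordinate to ±1/sqrt(d), so the quantized code also lies on the unit sphere
        d = z.shape[-1]
        codes = torch.where(z > 0, 1., -1.) / (d ** 0.5)

        # straight-through estimator so the encoder still receives gradients
        return z + (codes - z).detach()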

kabachuha commented 4 months ago

Thanks! I'll test it too now :)

lucidrains commented 4 months ago

1 sec, there's a bug lol, be back soon (out with doggo)

kabachuha commented 4 months ago

Yes, at this moment it does look a bit washed out.

[image: Training scheme (2)]

Still, it seems to work

(The images are from quantizing and restoring with the original Stable Diffusion VAE)

lucidrains commented 4 months ago

@kabachuha try again on the latest!

lucidrains commented 4 months ago

@kabachuha was the training a lot smoother?

kabachuha commented 4 months ago

It's not training, it's just quantizing with no learnable params :) so any difference in training smoothness isn't really observable to the naked eye.

(In fact, if you launch training with LFQ as is, setting the training flag too, there will be an exception about the optimizer having no params.)
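
(The exception is just PyTorch complaining about an empty parameter list; a hypothetical guard in my training script, not anything from this repo:)

    import torch

    # hypothetical guard: only build an optimizer if the quantizer actually has trainable params
    quantizer_params = [p for p in self.lfq.parameters() if p.requires_grad]
    if len(quantizer_params) > 0:
        optimizer = torch.optim.Adam(quantizer_params, lr = 1e-4)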

[image: Training scheme (3)]

The params here are:

    self.lfq = LFQ(
        codebook_size = 16,
        dim = 4,
        entropy_loss_weight = 0.1,
        diversity_gamma = 1.,
        num_codebooks = 1,
        spherical = True
    )
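
For reference, the forward call in my wrapper is roughly this (a sketch, assuming the usual LFQ return of quantized latents, indices and the entropy aux loss):

    import torch

    # latents from the Stable Diffusion VAE encoder in my setup: (batch, 4, h, w)
    z = torch.randn(1, 4, 32, 32)

    quantized, indices, entropy_aux_loss = self.lfq(z)
    assert quantized.shape == z.shape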

Maybe they need some adjustments for the spherical version

I'd like to test the residual version too; it gave me much better results with the plain LFQ.
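
(By the residual version I mean the usual residual quantization idea, roughly this generic sketch rather than whatever the exact implementation here is:)

    import torch

    def residual_quantize(z, quantizers):
        # generic residual quantization: each stage encodes what the previous stages missed
        residual = z
        out = torch.zeros_like(z)
        for quantize in quantizers:
            q, *_ = quantize(residual)  # assumes each quantizer returns (quantized, ...) like LFQ
            residual = residual - q
            out = out + q
        return out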

kabachuha commented 4 months ago

@lucidrains Testing with the residual version now, and the convergence is a bit worse so far.

[image: Training scheme (5)]

lucidrains commented 4 months ago

@kabachuha think it is working quite well on the toy example!

kabachuha commented 4 months ago

It gave me higher reconstruction loss on the toy example 🤨

LFQ
rec loss: 0.115 | entropy aux loss: -0.035 | active %: 75.000

Spherical
rec loss: 0.128 | entropy aux loss: -0.092 | active %: 100.000

Anyway, it seems you updated the repo and now it has a codebook scale 👀 Will update my tests now. Still a bit grayish. It's time to sleep for me, so I need to go.

Thank you for the great work, as usual! 🧡

lucidrains commented 4 months ago

@kabachuha oh dang, i was only looking at code utilization