@kabachuha hey that's cool! thank you and will look at it later today 🧐
@kabachuha oh i see, so just add an l2norm before the binary quantization it seems? i'll run an experiment later
edit: the first term of the entropy gets simplified too
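Roughly, the idea as described here (a minimal sketch of the concept, not the exact code in this repo): l2-normalize onto the unit hypersphere, then sign-quantize, so each coordinate becomes ±1/√d and the quantized code lands back on the sphere. The entropy simplification presumably falls out of the per-bit factorization described in the BSQ paper.

```python
import torch
import torch.nn.functional as F

def binary_spherical_quantize(z: torch.Tensor) -> torch.Tensor:
    # project the latent onto the unit hypersphere
    z = F.normalize(z, dim = -1)

    # sign-quantize each coordinate to +-1/sqrt(d), so the code
    # is also unit norm and lives on the same sphere
    d = z.shape[-1]
    ones = torch.ones_like(z)
    codes = torch.where(z > 0, ones, -ones) / (d ** 0.5)

    # straight-through estimator so gradients still reach the encoder
    return z + (codes - z).detach()
```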
Thanks! I'll test it too now :)
1 sec, there's a bug lol, be back soon (out with doggo)
Yes, at this moment it does look a bit washed out
Still, it seems to work
(The images are the original Stable Diffusion's VAE quantization and restoration)
@kabachuha try again on the latest!
@kabachuha was the training a lot smoother?
It's not training, it's just quantizing with no learnable params :) (in fact, if you launch training with LFQ as is, setting the training flag too, there will be an exception about there being no params for the optimizer)
So not really observable to the naked eye
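To illustrate the exception mentioned above (a hypothetical snippet, not from the repo; assuming LFQ registers only buffers when no input/output projections are needed):

```python
import torch
from vector_quantize_pytorch import LFQ

lfq = LFQ(codebook_size = 16, dim = 4)  # dim == log2(codebook_size), so no projections

# with no trainable parameters, building an optimizer over it raises
# "ValueError: optimizer got an empty parameter list"
try:
    torch.optim.Adam(lfq.parameters())
except ValueError as err:
    print(err)
```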
The params here are:
```python
self.lfq = LFQ(
    codebook_size = 16,
    dim = 4,
    entropy_loss_weight = 0.1,
    diversity_gamma = 1.,
    num_codebooks = 1,
    spherical = True
)
```
Maybe they need some adjustments for the spherical version
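For context, here is roughly how it gets called (a standalone sketch; assuming this repo's LFQ returns (quantized, indices, entropy aux loss) and accepts channel-first image latents):

```python
import torch
from vector_quantize_pytorch import LFQ

# same config as above, instantiated outside a module for a quick check
lfq = LFQ(codebook_size = 16, dim = 4, entropy_loss_weight = 0.1,
          diversity_gamma = 1., num_codebooks = 1, spherical = True)

latents = torch.randn(1, 4, 64, 64)  # stand-in for SD VAE latents (B, 4, H, W)
quantized, indices, entropy_aux_loss = lfq(latents)
assert quantized.shape == latents.shape
```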
I'd like to test the residual version too; it gave me much better results with LFQ
@lucidrains Testing with the residual version now, and the convergence is a bit worse so far (the kind of setup I mean is sketched below)
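A sketch of that residual setup, assuming the repo's ResidualLFQ, where each quantizer codes the residual left by the previous one:

```python
import torch
from vector_quantize_pytorch import ResidualLFQ

residual_lfq = ResidualLFQ(
    dim = 4,
    codebook_size = 16,
    num_quantizers = 4   # each quantizer codes the previous one's residual
)

seq = torch.randn(1, 32, 4)  # toy (batch, seq, dim) input
quantized, indices, aux_loss = residual_lfq(seq)
```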
@kabachuha think it is working quite well on the toy example!
It gave me higher reconstruction loss on the toy example 🤨
LFQ: rec loss: 0.115 | entropy aux loss: -0.035 | active %: 75.000
Spherical: rec loss: 0.128 | entropy aux loss: -0.092 | active %: 100.000
Anyway, it seems you updated the repo and now it has codebook scale 👀
Will update my tests now. Still a bit grayish, though. It's time for me to sleep, so I need to go.
Thank you for the great work, as usual! 🧡
@kabachuha oh dang, i was only looking at code utilization
Binary Spherical Quantization is an extension of the binarized Lookup-Free Quantization from MagViT2, but instead of mapping values to the corners of a binarized hypercube, it maps them onto a hypersphere.
https://arxiv.org/abs/2406.07548
The authors claim to have beaten MagViT2, and that their method offers better compression quality than the other commonly used methods. They tested image and video reconstruction, as well as generation with a masked language model, and their results are quite good.
The code and some checkpoints are available at https://github.com/zhaoyue-zephyrus/bsq-vit, so I think it should be quick for you to add it to this repo, and maybe add the residual variants too.