lucidrains / vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

`bfloat16` cannot utilize some codes #114

Closed · AmitMY closed this 3 months ago

AmitMY commented 3 months ago

When using FSQ with [8, 5, 5, 5] levels and specifying bfloat16 training in pytorch-lightning, codebook utilization tops out just below 50%, whereas with float32 training it approaches 100%.

I don't know whether this is an issue with the implementation or just a limitation of FSQ; either way, I would guess that this library should force float32 for the quantization step.

Example:

```python
torch.tensor([1000, 1001, 1002, 1003], dtype=torch.bfloat16).to(torch.int32)
# tensor([1000, 1000, 1000, 1004], dtype=torch.int32)
```
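For illustration, a minimal sketch of the workaround suggested above: cast to float32 around the rounding step, then cast back so downstream bfloat16 layers still match. The `round_to_levels_fp32` helper below is hypothetical and only approximates FSQ-style rounding; it is not the library's implementation.

```python
import torch

def round_to_levels_fp32(z: torch.Tensor, levels: torch.Tensor) -> torch.Tensor:
    # hypothetical helper: FSQ-style bounding + rounding, forced to float32
    orig_dtype = z.dtype
    z = z.float()                                  # do the precision-sensitive math in fp32
    half = (levels - 1) / 2                        # e.g. [3.5, 2, 2, 2] for levels [8, 5, 5, 5]
    quantized = torch.round(z.tanh() * half) / half
    return quantized.to(orig_dtype)                # cast back for the bf16 network

levels = torch.tensor([8, 5, 5, 5])
z = torch.randn(2, 4, dtype=torch.bfloat16)
print(round_to_levels_fp32(z, levels).dtype)       # torch.bfloat16
```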

lucidrains commented 3 months ago

@AmitMY hey Amit! i put in a quick fix in 1.14.4

curious how well FSQ is performing for you otherwise. are you training an autoencoder?

AmitMY commented 3 months ago

Hi! I was waiting for some compute to try this, but it actually fails (the network is now BFloat16, while the input is cast to float):

`RuntimeError: mat1 and mat2 must have the same dtype, but got Float and BFloat16`
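For reference, a standalone repro of this kind of mismatch (illustrative, not the library's code): a layer whose weights were cast to bfloat16 receiving a float32 tensor, e.g. a quantizer output that was forced to float32 and never cast back.

```python
import torch
import torch.nn as nn

proj = nn.Linear(4, 4).to(torch.bfloat16)   # weights are bfloat16, as under bf16 training
x = torch.randn(2, 4)                       # float32, e.g. a quantizer output left in fp32

# proj(x)  # RuntimeError: mat1 and mat2 must have the same dtype, but got Float and BFloat16

out = proj(x.to(proj.weight.dtype))         # casting back to the layer's dtype avoids the error
```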

FSQ is performing amazingly well for me. Basically 100% codebook utilization, and the autoencoder reconstructs the input very well. I did have to normalize my data, but once that was done it was smooth sailing.
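As a side note, codebook utilization for FSQ with levels [8, 5, 5, 5] can be estimated as the fraction of the implicit prod(levels) = 1000 codes that actually appear in the returned indices. A sketch assuming the library's documented FSQ interface (the input shapes and random data are illustrative):

```python
import math
import torch
from vector_quantize_pytorch import FSQ

levels = [8, 5, 5, 5]
quantizer = FSQ(levels=levels)              # implicit codebook of 8 * 5 * 5 * 5 = 1000 codes

x = torch.randn(64, 256, len(levels))       # (batch, seq, dim); dim must equal len(levels)
quantized, indices = quantizer(x)           # indices are flat ids into the implicit codebook

utilization = indices.unique().numel() / math.prod(levels)
print(f"codebook utilization: {utilization:.1%}")
```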

lucidrains commented 3 months ago

@AmitMY besides code utilization, have you tried running it against regular VQ as an ablation / to compare?

AmitMY commented 3 months ago

I only tried regular VQ at the beginning, saw that FSQ was better and more stable for my problem, and then scaled up the data and model size. So no, for my current problem I did not fully compare FSQ and VQ.
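For context, such an ablation could look roughly like the sketch below; FSQ and VQ are near drop-in replacements in this library, except that VQ additionally returns a commitment loss. The hyperparameters here are purely illustrative.

```python
import torch
from vector_quantize_pytorch import FSQ, VectorQuantize

x = torch.randn(8, 1024, 4)   # (batch, seq, dim)

# FSQ: implicit codebook of prod(levels) codes, no auxiliary loss
fsq = FSQ(levels=[8, 5, 5, 5])
fsq_out, fsq_indices = fsq(x)

# VQ: learned codebook, returns a commitment loss to add to the training objective
vq = VectorQuantize(dim=4, codebook_size=1000, commitment_weight=1.0)
vq_out, vq_indices, commit_loss = vq(x)
```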

lucidrains commented 3 months ago

@AmitMY ah got it, no biggie. just curious

lucidrains commented 2 months ago

@AmitMY finally had the chance to train FSQ myself yesterday evening and wow, it works great! so much more stable than VQ