speechmatics / hqa

Code to accompany the paper "Hierarchical Quantized Autoencoders"
MIT License

CelebA reconstruction code #2

Open SURABHI-GUPTA opened 4 years ago

SURABHI-GUPTA commented 4 years ago

Hey. I am a newbie in computer vision research. I want to try this code on face image reconstruction, as you mentioned in the paper, but I found the paper difficult to understand on the first attempt. Any help or suggestions?

jplhughes commented 4 years ago

Hi Surabhi,

Thanks for getting in touch and I'd be happy to help. Could you be more specific about the areas you are having trouble with? Is it with VQ-VAEs in general, how HQA builds on previous work, or how to run the code?

My suggestion would be to follow the README and get hqa.ipynb running as a first step before you move on to face reconstruction. This Python notebook is completely self-contained and applies HQA to the MNIST dataset. Then I would suggest going through the code in the HQA and VQCodebook classes in more depth.

Best regards, John

SURABHI-GUPTA commented 4 years ago

Thanks for the quick reply. I would like to understand how HQA builds on previous work, and also what changes are needed in the code for the RGB CelebA dataset.

jplhughes commented 4 years ago

We recommend understanding VQVAE2 and HAMS.

To use RGB images you just need to change the feature dimension from 1 to 3 and swap in a CelebA dataloader equivalent to the MNIST one. Other than that, the code should remain unchanged.
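Roughly, the swap might look something like the sketch below (a minimal example only; the HQA constructor call and its input-channel parameter name are assumptions here, so check hqa.ipynb and train_full_stack for the real names):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# CelebA images resized/cropped to a fixed size; 3 RGB channels instead of 1.
transform = transforms.Compose([
    transforms.Resize(64),
    transforms.CenterCrop(64),
    transforms.ToTensor(),  # yields float tensors of shape (3, 64, 64)
])

# Drop-in replacement for the MNIST dataloader used in hqa.ipynb.
train_ds = datasets.CelebA(root="data", split="train", download=True, transform=transform)
train_loader = DataLoader(train_ds, batch_size=128, shuffle=True, num_workers=4)

# Hypothetical: wherever the notebook builds the bottom HQA layer with a single
# input channel for MNIST, pass 3 channels instead, e.g. something like
#   hqa = HQA(input_feat_dim=3, ...)
# The name `input_feat_dim` is an assumption; use whatever argument the
# train_full_stack / HQA code in this repo actually expects.
```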

SURABHI-GUPTA commented 4 years ago

I changed the input feature dimension from 1 to 3 in the train_full_stack function, but it gives an error in the show_recon function during training. Please help.

SURABHI-GUPTA commented 4 years ago

@McHughes288 I tried with the LFW dataset, but there are many dimension-related errors, even though I changed the input feature dimension from 1 to 3 in the train_full_stack function. For more than one channel, do we need to change anything else?

jplhughes commented 4 years ago

OK, one thing you will have to do is change dim=(0, 1, 2) to dim=(0, 1, 2, 3). Just do a find-and-replace in the notebook.
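To illustrate what that find-and-replace is doing (a toy example, not the notebook's exact code): an RGB batch has four axes, so a reduction that should produce a single statistic needs to cover all of them:

```python
import torch

rgb_batch = torch.rand(16, 3, 64, 64)  # (batch, channels, height, width)

# Reducing over only three axes leaves a vector, not a scalar:
partial = rgb_batch.mean(dim=(0, 1, 2))     # shape (64,) -- likely not what downstream code expects
scalar  = rgb_batch.mean(dim=(0, 1, 2, 3))  # shape () -- one statistic over the whole batch
```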

skokalj commented 1 year ago

Hi, I am interested in how you calculated the rates in bits that you plot in the paper. Is it based on the KL, as in the code (rate_bpd = KL/dims)? I see that the KL is actually the entropy of the quantized encoding, so it gives bits per z_q symbol; hence, you multiply rate_bpd by the number of embedded symbols (the product of the z_q dimensions). Am I right?
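To make my reading concrete, here is a tiny worked example with made-up numbers (this only illustrates my interpretation of the question, not your actual plotting code):

```python
import numpy as np

# Hypothetical values, purely to illustrate the interpretation asked about.
KL = 8.2            # entropy of the quantized encoding, in bits per z_q symbol
dims = 28 * 28      # number of dimensions the code normalizes by
z_q_shape = (7, 7)  # spatial shape of the quantized encoding

rate_bpd = KL / dims                 # as in the repo's code
num_symbols = np.prod(z_q_shape)     # number of embedded symbols
rate_bits = rate_bpd * num_symbols   # the quantity I think is plotted -- is this right?
```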