hific / hific.github.io

http://hific.github.io

Question about the paper #1

Closed CreatorZZY closed 4 years ago

CreatorZZY commented 4 years ago

I am sorry, but I am confused about what z in p(y|z) means. I see the explanation that z is "side information"... still confused.

fab-jul commented 4 years ago

Hi, this is based on the hyper-prior architecture from Balle et al. [1]. The idea is that during encoding, we extract two latents, z and y. Then we use z to predict a distribution p(y|z), which we use to encode y with entropy coding. Because we need the same distribution during decoding, we have to transmit z to the decoder, which we do by losslessly compressing it with its own predicted distribution, for which we use the factorized entropy model introduced in [1]. If this is too abstract, let me know if you have further questions!
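
For intuition, here is a minimal toy sketch of that flow in Python. This is not the actual code: the linear maps stand in for the real CNNs, and all names and shapes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy linear stand-ins for the learned networks (the real ones are CNNs).
W_E  = rng.standard_normal((8, 4))   # encoder E:      image -> y
W_HE = rng.standard_normal((4, 2))   # hyper encoder:  y -> z
W_HD = rng.standard_normal((2, 8))   # hyper decoder:  z -> params of p(y|z)

x = rng.standard_normal(8)           # stand-in for the image
y = np.round(x @ W_E)                # main latent (quantized)
z = np.round(y @ W_HE)               # side information (quantized)

h = z @ W_HD
mean, scale = h[:4], np.exp(h[4:])   # p(y|z): one Gaussian per element of y
# z is coded losslessly with its own factorized prior; y is coded with p(y|z).
```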

CreatorZZY commented 4 years ago

I just read the hyper-prior paper. Did you mean that z helps the encoder and decoder compress and decompress y under a proper distribution (unlike what a vanilla VAE does)? But how does p(y|z) interact with the model? (In the hyper-prior architecture, p(y|z) is the input of the decoder, while the sub-model P is just an independent branch.)

fab-jul commented 4 years ago

Here, y is also the input to the decoder. p(y|z) is only used to encode y losslessly. y represents the image.
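
To put a number on "only used to encode": with an ideal entropy coder, y costs about -sum_i log2 p(y_i|z) bits. Here is a minimal sketch of that cost, assuming integer-quantized y and a unit-bin discretized Gaussian (scipy is just used for the Gaussian CDF):

```python
import numpy as np
from scipy.stats import norm

def ideal_bits(y, mean, scale):
    # Probability mass of the unit-width bin around each quantized value,
    # summed as -log2: the bit cost a perfect entropy coder would approach.
    p = norm.cdf(y + 0.5, mean, scale) - norm.cdf(y - 0.5, mean, scale)
    return float(-np.log2(np.maximum(p, 1e-9)).sum())
```

The reconstruction itself never touches these probabilities; it is computed from the decoded y alone.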

CreatorZZY commented 4 years ago

Yeah, that is what I think and what I know the paper does. But actually you just use y to train the network, not p(y|z), which is what can be compressed losslessly. p(y|z) is different from y, and I guess that in a production environment only the sub-model G will be used, and only the vector y is used to represent the original image. That means only y, the uncompressed vector, can be decoded into the original image, while p(y|z) can't. In the hyper-prior paper, they train the network with p(y|z), so they can decode p(y|z) into the original image. So I think it should use sub-model P's output to train G and D, not the output of E.

And there is another question: in the hyper-prior paper, they didn't use the original data x but the latent vector y as the side information fed into the autoencoder. Can I understand it like this: they forcibly split the entropy of p(y) into entropy(ŷ) and entropy(z), and then only decode the information in ŷ back into the original image? The model just adds the pressure of reducing the entropy manually?

fab-jul commented 4 years ago

Hi, I'm not sure I follow. Let me unpack what we do "in production".

Encoding the image:

  1. Feed the image through the encoder E to get y, then feed y through the "hyper encoder" (the encoder network of P) to get z.
  2. Now encode z losslessly to disk with the factorized prior (you can think of that as a histogram, or just think of doing np.save(z)).
  3. Now feed z through the "hyper decoder" (the decoder network of P) to get p(y|z), which is a distribution for each point in y (so if y is CxHxW, we get C * H * W different distributions from the "hyper decoder". Each distribution is a Gaussian, parametrized by a mean and a variance).
  4. Encode all the points of y losslessly, with their individual Gaussians, into a bitstream, using entropy coding (we use an adaptive range coder; adaptive means the distribution can change at each point).

Now, encoding is done! Our bitstream contains z and y.
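
Step 2's "think of it as a histogram" can be made concrete with a toy sketch. (In the real model the probability table is learned, not computed from the current z; this is just for illustration.)

```python
import numpy as np

z = np.array([0, 1, 0, -1, 0, 2, 0, 0])   # toy quantized side information
symbols, counts = np.unique(z, return_counts=True)
table = dict(zip(symbols.tolist(), counts / counts.sum()))  # "histogram" prior
# A factorized prior assigns each symbol a probability independent of its
# position; an entropy coder then spends about -log2(p) bits per symbol.
bits_z = -sum(np.log2(table[s]) for s in z.tolist())
print(f"z costs ~{bits_z:.1f} bits under this factorized prior")
```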

When we decode, we:

  1. First get z back from the bitstream, with entropy decoding, using the factorized prior.
  2. Feed z through the hyper decoder to get p(y|z) again.
  3. Use that p(y|z) to decode y from the bitstream.
  4. Feed y to G to get a reconstruction x'.

Done!
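
The crucial point is step 2: because z was transmitted losslessly, the decoder recomputes exactly the same p(y|z) that the encoder used, which is what lets the range decoder recover y. A toy illustration (made-up names again):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hyper decoder weights ship with the model, so they are identical on both sides.
W_HD = rng.standard_normal((2, 8))

def hyper_decoder(z):
    h = z @ W_HD
    return h[:4], np.exp(h[4:])         # mean, scale of p(y|z)

z = np.array([1.0, -2.0])               # recovered losslessly from the bitstream
mean_enc, scale_enc = hyper_decoder(z)  # what the encoder coded y with
mean_dec, scale_dec = hyper_decoder(z)  # what the decoder computes
assert np.allclose(mean_enc, mean_dec) and np.allclose(scale_enc, scale_dec)
# With identical distributions, entropy decoding recovers y exactly,
# and the reconstruction is x' = G(y).
```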

So we use all the networks (except the discriminator).

CreatorZZY commented 4 years ago

Thanks for your patient answer. hific is an amazing idea. Amazing hific, amazing Google and ETHz!