Hi @331801070049,
it seems your question is fairly general (and not specific to pixel-cnn). Every conditional probability is estimated by the network. The network is then trained to adjust its parameters so that the observed pixels become likely under the probability distributions it emits.
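For intuition, here is a minimal sketch of what one training step could look like in PyTorch. The `model` and its per-pixel 256-way softmax output are assumptions for illustration, not the actual pixel-cnn code:

```python
import torch
import torch.nn.functional as F

# Hypothetical autoregressive model: for every pixel it emits a
# 256-way distribution over intensity values, conditioned (via
# masking) only on the pixels that come before it in raster order.
# images: (batch, 1, H, W) with integer values in [0, 255]
def training_step(model, optimizer, images):
    logits = model(images.float() / 255.)  # (batch, 256, H, W)
    # Cross-entropy pushes each observed pixel to be likely under
    # the conditional distribution emitted for that position.
    loss = F.cross_entropy(logits, images.squeeze(1).long())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```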
If you have more specific questions I will be happy to help.
-Lucas
Thank you for your reply. I may not have phrased my question well, but my main question now is:
In addition, if I want to train on my own dataset, how should I choose the split between training and testing? The images I want to train on are 256*256. Will this be a problem? Thank you very much for your help! Best wishes!
As for your last comment, 256x256 images are much bigger than 32x32, so you should expect training to be very slow at this resolution. What is your use case?
Thank you very much for your reply, which is very helpful to me.
Did you mean that pixels are only predicted during the testing process? And when predicting pixel (i, j), you say the model can access the pixels of the real image. Shouldn't those pixels have been predicted before they can be accessed? I haven't been able to figure out how an image is generated. All I know now is that it is generated pixel by pixel. Can you give me more details about the image generation process?
To answer the first part of your question, the model is "teacher-forced" during training. During sampling, however, pixels are generated sequentially. The code for sampling can be found here.
Say our image is a 3 x 3 grid, and thus contains 9 pixels. Your goal is to predict the joint distribution, i.e. p(x1, x2, ..., x9). Autoregressive models, such as pixel-cnn, break up the joint distribution into a product of conditionals (as is done in NLP models like LSTMs or Transformers): p(x1, x2, ..., x9) = p(x1) p(x2 | x1) p(x3 | x2, x1) ... p(x9 | x8, x7, ..., x1).
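As a tiny sanity check of this factorization (pure numpy, just the chain rule on two binary "pixels", not pixel-cnn code):

```python
import numpy as np

# Toy check of the chain rule on two binary "pixels" x1, x2.
# joint[i, j] = p(x1 = i, x2 = j)
joint = np.array([[0.1, 0.2],
                  [0.3, 0.4]])

p_x1 = joint.sum(axis=1)               # marginal p(x1)
p_x2_given_x1 = joint / p_x1[:, None]  # conditional p(x2 | x1)

# p(x1, x2) == p(x1) * p(x2 | x1) for every configuration
assert np.allclose(joint, p_x1[:, None] * p_x2_given_x1)
```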
It's important to ensure that every conditional distribution, e.g. p(x3 | x2, x1), cannot look at its own target value (here, x3). To achieve this, the model uses masked convolutions (you can look here for more info).
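A common way to implement such a mask in PyTorch is sketched below (the standard PixelCNN "A"/"B" masks; names are illustrative, not this repo's exact code):

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel is zeroed at/after the centre, so the
    output at (i, j) never sees pixel (i, j) itself (type 'A'), or
    only sees it through earlier layers (type 'B')."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        _, _, kH, kW = self.weight.shape
        mask = torch.ones(kH, kW)
        # zero the centre (type 'A') or strictly right of it (type 'B')
        mask[kH // 2, kW // 2 + (mask_type == 'B'):] = 0
        mask[kH // 2 + 1:, :] = 0  # zero all rows below the centre
        self.register_buffer('mask', mask[None, None])

    def forward(self, x):
        self.weight.data *= self.mask  # enforce the causal mask
        return super().forward(x)

# Example: first layer uses mask 'A', deeper layers use mask 'B'.
first = MaskedConv2d('A', 1, 64, kernel_size=7, padding=3)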
Now, when you are training, you already have x1, ..., x9, so you can compute all the conditionals at the same time. During sampling, however, since x2 depends on x1 and x3 depends on x2, you need to sample one pixel at a time (this cannot be parallelized). This iteration over pixels corresponds to the for loop in the sample method I linked above.
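Roughly, that sampling loop could look like this (a sketch with illustrative names, assuming a single-channel model with a 256-way softmax per pixel, not the repo's actual `sample` method):

```python
import torch

@torch.no_grad()
def sample(model, H=32, W=32):
    # Start from an empty canvas; each pass re-predicts distributions
    # for all positions, but we only keep the sample at the current
    # (i, j), because x_ij depends on the pixels sampled before it.
    x = torch.zeros(1, 1, H, W)
    for i in range(H):
        for j in range(W):
            logits = model(x)                      # (1, 256, H, W)
            probs = logits[0, :, i, j].softmax(0)  # p(x_ij | pixels so far)
            x[0, 0, i, j] = torch.multinomial(probs, 1).item() / 255.
    return x
```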
Hope this helps, Lucas
Hi Lucas, could you please point to where in the code you enforce teacher forcing during training?
When I train the pixelcnn model, it can output an image similar to the input, but when I try to generate a new image, most of the pixel values are the same. What could be wrong with my training or generation process?
Hello, I want to ask you a question about the training and testing of pixelcnn. During training, a batch of images is fed in, and the network estimates the probability density of the pixels. Then what? How does it generate new images from these probability densities? I haven't been able to understand this part of the training process, and I couldn't find how it was trained in the paper. If you can, please let me know. Thanks very much.