This is a question about the method/paper, not so much the implementation.
During training, do you provide the corresponding camera pose (denoted as ξ in the paper) to the discriminator? It appears the answer is no. If so, what stops the generator from ignoring the camera pose altogether and simply generating an image from an arbitrary angle each time? As far as I can tell, the discriminator would have no way to detect this. Perhaps you train on multiple samples with the same z per batch, enforcing that different ξ give reasonable results for the same z?
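
To make that last hypothesis concrete, here is a minimal PyTorch-style sketch of the batching scheme I have in mind: one latent code z rendered under several camera poses ξ, with the discriminator only ever seeing the images. All names here (`generator`, `discriminator`, the pose parameterization, `z_dim`) are placeholders I made up for illustration, not your actual implementation.

```python
import torch


def hypothesized_train_step(generator, discriminator, z_dim=256, n_poses=4, device="cpu"):
    # One shared latent code z for the whole mini-batch.
    z = torch.randn(1, z_dim, device=device).expand(n_poses, -1)

    # Several random camera poses xi for that same z
    # (illustrative parameterization: azimuth, elevation, radius).
    xi = torch.rand(n_poses, 3, device=device) * torch.tensor(
        [2 * torch.pi, torch.pi, 1.0], device=device
    )

    # Render the same latent from each pose. A generator that ignores xi
    # would produce n_poses unrelated images here.
    fake_images = generator(z, xi)

    # In this hypothesized setup the discriminator sees only the images, not xi.
    fake_scores = discriminator(fake_images)
    return fake_scores.mean()


if __name__ == "__main__":
    # Smoke test with trivial stand-ins for the real networks.
    gen = lambda z, xi: torch.randn(z.shape[0], 3, 64, 64)
    disc = lambda imgs: torch.randn(imgs.shape[0], 1)
    print(hypothesized_train_step(gen, disc))
```

Is something along these lines what happens, or is pose consistency enforced some other way?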