Hello! You are right, the renderer memorizes various combinations. But since the input is completely random, I am not sure this can be called overfitting. I do not spend too much time on training the renderer; as long as it is reliable for random inputs, it can be used to help our agent.
There is a recent paper named Stylized Neural Painting that studies some details of the neural renderer. I recommend reading it.
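For concreteness, the random-input training scheme under discussion looks roughly like the following. This is only a minimal sketch with a toy disc rasterizer standing in for the real stroke renderer; the repository's actual renderer architecture and stroke parameterization are different.

```python
import torch
import torch.nn as nn

class TinyRenderer(nn.Module):
    # Toy neural renderer: maps a stroke parameter vector to a 128x128 image.
    def __init__(self, param_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(param_dim, 512), nn.ReLU(),
            nn.Linear(512, 128 * 128), nn.Sigmoid())

    def forward(self, params):
        return self.net(params).view(-1, 128, 128)

def rasterize(params, size=128):
    # Stand-in for the true (non-differentiable) rasterizer: draws a soft disc
    # at (x, y) with a radius derived from the third parameter.
    x, y, r = params[:, 0], params[:, 1], params[:, 2]
    ys, xs = torch.meshgrid(torch.linspace(0, 1, size),
                            torch.linspace(0, 1, size), indexing="ij")
    dist = ((xs[None] - x[:, None, None]) ** 2 +
            (ys[None] - y[:, None, None]) ** 2).sqrt()
    return (dist < 0.05 + 0.2 * r[:, None, None]).float()

renderer = TinyRenderer()
optimizer = torch.optim.Adam(renderer.parameters(), lr=3e-4)

for iteration in range(500000):
    # Stroke parameters are drawn uniformly at random every iteration, so the
    # "dataset" is effectively the whole parameter space, not a fixed set of images.
    params = torch.rand(64, 10)
    loss = ((renderer(params) - rasterize(params)) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```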
Hello, could you tell me the final MSE loss value of your neural renderer after 500,000 iterations? (I am getting around 48.4 for the final batch iteration, which seems too high considering there are only 64 images.)
I am new to RL, but I know the theoretical basics. I have studied your codebase thoroughly and have a few doubts; it would be great if you could help me with them as well.
Q1. What is env_batch here? [In RL, in general, you have an episode with multiple steps, and we compute episodic rewards, etc. How does the concept of env_batch fit in here? Is it related to the action parameter (k=5)?] Is it fine if I set env_batch = 1 and increase the per-episode steps instead? (Is it related to the GAN part in any way?)
Q2. What is the purpose of the fastenv class? (I understand env.py, class Paint, and the class DDPG part very clearly, but not this.)
Q3. Why is T being sent in the merged_state? (I understand that it is a counter, but sending it entirely to the critic network is confusing to me.) [That channel will eventually reach 1 (the normalized max value); will it not affect the results of the network?]
Thank you!
Hi, I don't remember the MSE results; you can download my trained parameters and test them.
A1: env_batch controls the number of canvases processed in parallel in the environment; it is independent of model training. The action parameter controls the number of strokes the actor draws in each step, and the image after each step is sent to the GAN and the critic. If GPU memory allows, you can increase the action parameter.
A2: fastenv is a bad name. It is actually used to turn a single canvas into a stack of canvases executed in parallel.
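In other words, it behaves like a small vectorized-environment wrapper: one bundle of actions per canvas, all canvases stepped in lockstep. Below is a minimal sketch of the idea; the class and method names are illustrative only (not the actual repository code), and the negative-L2 reward is a placeholder for the real reward signal.

```python
import torch

class ParallelCanvases:
    """Illustrative stand-in for fastenv: env_batch canvases stepped in lockstep."""

    def __init__(self, env_batch, max_step, dataset, draw_fn):
        self.env_batch = env_batch   # number of canvases run in parallel
        self.max_step = max_step     # drawing steps per target image
        self.dataset = dataset       # tensor of target images, (N, 3, H, W)
        self.draw_fn = draw_fn       # paints a batch of stroke actions onto canvases

    def reset(self):
        # Each slot gets its own random target image and a blank canvas.
        idx = torch.randint(len(self.dataset), (self.env_batch,))
        self.target = self.dataset[idx]
        self.canvas = torch.zeros_like(self.target)
        self.step_count = 0
        return torch.cat([self.canvas, self.target], dim=1)

    def step(self, actions):
        # One bundle of stroke parameters per canvas, applied simultaneously.
        self.canvas = self.draw_fn(actions, self.canvas)
        self.step_count += 1
        reward = -((self.canvas - self.target) ** 2).mean(dim=(1, 2, 3))
        done = self.step_count >= self.max_step
        return torch.cat([self.canvas, self.target], dim=1), reward, done
```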
A3: In most DDPG setups, the critic needs to know the number of steps left. I'm not sure how important T is; it may not matter if it is removed completely.
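Concretely, the step counter usually enters as one extra constant-valued plane concatenated onto the state tensor, normalized by the maximum number of steps so it stays in [0, 1]. A small sketch of that concatenation (shapes and channel layout are illustrative; the actual merged_state may carry additional channels):

```python
import torch

def merge_state(canvas, target, step, max_step):
    # canvas, target: (B, 3, H, W). The counter becomes one extra plane filled
    # with step / max_step, so it reaches 1 only on the final step of an episode.
    B, _, H, W = canvas.shape
    t = torch.full((B, 1, H, W), step / max_step)
    return torch.cat([canvas, target, t], dim=1)   # (B, 7, H, W)

state = merge_state(torch.zeros(4, 3, 128, 128),
                    torch.rand(4, 3, 128, 128), step=3, max_step=40)
print(state.shape)  # torch.Size([4, 7, 128, 128])
```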
Okay, that clears up my doubts about the use of env_batch. But I would like to understand how this parallel processing works. Could you explain the pipeline: how are the images fed into each of these multiple envs?
E.g., let's say we have:
- No. of env_batches = 2
- Total number of images = 20 (training data)
- Batch_size = 5
- No_of_episodes = 4
- Max_steps_per_episode = 10
Q1. How is the image data split across these 2 envs in parallel during training?
Q2. Also, could you explain the case where env_batch = 1 (the normal case)? {From my point of view, per episode there will be 10 steps for every image in the batch_size; please correct me if I am wrong here.}
Thank you!
Training is offline, and it is almost separate from inference.
A1: In your case, two images are chosen from the 20 training images and paired with two blank canvases; inference runs on them in parallel and feeds a large replay buffer. I do not set No_of_episodes anywhere. The two canvases become blank again after ten steps.
A2: The training samples are drawn from the buffer. env_batch means several canvases are painting different pictures at the same time, and train_batch_size samples are drawn from the transitions collected most recently.
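To make the buffer interaction concrete, here is a rough sketch of the collect-then-sample loop. The buffer capacity, batch size, and function names are illustrative placeholders, not the repository's actual implementation; `env` is assumed to follow the ParallelCanvases sketch above.

```python
import random

replay_buffer = []   # shared across all parallel canvases
BUFFER_CAP = 800     # illustrative capacity: only recent transitions are kept
TRAIN_BATCH = 96     # illustrative train_batch_size

def collect(env, actor):
    # Every transition from every canvas lands in the same buffer,
    # regardless of which canvas produced it.
    state = env.reset()
    for _ in range(env.max_step):
        action = actor(state)
        next_state, reward, done = env.step(action)
        for b in range(len(reward)):
            replay_buffer.append((state[b], action[b], reward[b], next_state[b], done))
        state = next_state
    del replay_buffer[:-BUFFER_CAP]   # drop everything except recent samples

def train_step(update_fn):
    # DDPG updates sample train_batch_size transitions from the recent buffer;
    # how many canvases collected them is irrelevant to the update itself.
    update_fn(random.sample(replay_buffer, TRAIN_BATCH))
```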
Thanks for the reply. I didn't quite understand the parts above, probably due to some lack of fundamentals on my side.
I am trying to map a Gym-based environment onto your Paint environment, but it's not very clear to me.
[Training part only]
Let's say we have a regular Gym env with a set of N episodes, in each of which we play m steps until we reach 'done' (either by hitting max_steps or because the agent loses), and let us assume that after each episode we call update_policy(). This part is clear to me: the relation between episodes, per-episode steps, and policy updates.
[Let's assume here that there is no parallel/multiple-canvas setup, i.e. the fastenv concept is not used.] In this Paint env, I understand we have episodes made up of multiple steps, but how do you feed in images? What is the relation between the number of images, the number of episodes, and the number of per-episode steps here? (I am trying to understand what happens after we choose one image from the training batch.)
Thank you!
Hi, I'm not entirely clear about your problem, but let me try to explain. In a common RL task, for example, we control a robot to run for 100 steps. In our task, we first select a target image from the CelebA dataset (I think the "number of images" you mentioned is just the size of the dataset?), and at each step the actor decides to draw 5 strokes on the canvas, and the env feeds back the new state. The number of steps controls how many steps in total are spent drawing this target image. We do not count episodes; we only care about the number of training iterations.
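In loop form, the procedure described above looks roughly like this. It is only a sketch: `env` and `actor` are stand-ins in the style of the earlier snippets, and 5 strokes per step / 40 steps per image are just example values.

```python
def train(actor, env, num_iterations=10000, update_fn=None):
    # env: picks a target image on reset() and applies one bundle of strokes per step().
    for iteration in range(num_iterations):
        state = env.reset()               # new target image, blank canvas
        for _ in range(env.max_step):     # e.g. 40 drawing steps for this image
            action = actor(state)         # parameters for e.g. 5 strokes
            state, reward, done = env.step(action)
        # Progress is measured by `iteration`; episodes are never counted,
        # and the actor/critic are updated from the replay buffer as above.
        if update_fn is not None:
            update_fn()
```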
Yep! This was helpful, and I cleared up my ResNet doubt too! Thanks!
Hi, I have a question about the way the neural renderer is trained. Consider a generic ML/DL training procedure: we fix train/validation sets of a fixed size, run backpropagation on the same training set, and then evaluate on the held-out validation set. But here, we randomly generate a batchSize of 64 for both the train and validation parts (validation after every 1000th iteration, as far as I recall) and train for 500,000 iterations. I find this confusing: the randomly generated samples could vary drastically across iterations, so how are you ensuring model improvement? Are you simply trying to overfit the model to all possible combinations of coordinates on the canvas? I want to understand why you have taken this approach.
Thanks Niharika