prajwaltr93 / teaching_robots_to_draw

an attempt at implementing the deep learning model proposed in the paper "Teaching Robots to Draw"

Questions about global model training #1

Open rohanb2018 opened 2 years ago

rohanb2018 commented 2 years ago

Hello, I had two general questions about training the global model (using global_model.ipynb).

First, I noticed that the fine-tuning section of the notebook calls the inp_data_generator method, which doesn't seem to be defined in global_model.ipynb. In my code, I ended up switching to the DataGenerator that is actually defined in the notebook. Was there a particular reason for calling inp_data_generator in global_model.ipynb?

Second, I was curious what training hyperparameters were most useful for getting the global model to train successfully. I noticed in global_model.ipynb that the initial training phase runs for 2 epochs, followed by a fine-tuning phase that runs for 5 epochs. However, with these settings my final global model accuracy was only slightly above 0. Specifically, I was curious about the number of epochs as well as the training/validation set sizes that were most useful for successful training.

Happy to provide more details about my model performance if that helps. Thanks!

prajwaltr93 commented 2 years ago

Hey,

I just noticed that I haven't pushed the latest changes to this repo; the notebooks are missing a few significant changes.

  1. inp_data_generator was replaced with DataGenerator, and if I remember correctly I did not use the fine-tuning step in the end; I was only using it while I was still experimenting with training the global model.
  2. Interesting. I cannot recall the exact number of training epochs, but as noted in the README I was not able to fully train the global model (mine is underfit), because I was using Google Colab and it has runtime limits. For the train/test split I would suggest going with the paper, which recommends using 90% of the files for training and 10% for testing.
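The 90/10 file split suggested above can be sketched as plain Python list slicing (the file names below are hypothetical stand-ins for the stroke-data files):

```python
import random

def split_files(files, train_frac=0.9, seed=42):
    """Shuffle and split a list of data files into train and test sets."""
    files = list(files)
    random.Random(seed).shuffle(files)  # fixed seed for a reproducible split
    cut = int(len(files) * train_frac)
    return files[:cut], files[cut:]

# Hypothetical stroke-data files.
files = [f"stroke_{i:04d}.npy" for i in range(100)]
train, test = split_files(files)
print(len(train), len(test))  # 90 10
```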
prajwaltr93 commented 2 years ago

Just to add a few points:

  1. Since I was not able to fully train my global model, I reached out to the authors of the paper about this; they mentioned that training took on the order of several days.
  2. I have not implemented all of the augmentations: the paper suggests 3 types and I have implemented only 1, so roughly 2/3 of the training data is missing.
  3. There is a mistake in the global model architecture implemented in that notebook. For the final layer, instead of flattening the feature map and connecting it to a 10000-unit fully connected layer (which is a lot of weights), use Dense(1, input_shape=(100,100,64)). This is a neat trick for converting CNNs to fully connected outputs: https://cs231n.github.io/convolutional-networks/#convert
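To see why point 3 matters, compare the parameter counts of the two final-layer options (a back-of-the-envelope sketch, bias terms included):

```python
H, W, C = 100, 100, 64  # final feature-map shape in the notebook

# Option A: Flatten -> Dense(10000)
flat_units = H * W * C                             # 640,000 inputs
dense_10000_params = flat_units * 10_000 + 10_000  # weights + biases

# Option B: Dense(1) applied to the (100, 100, 64) map.
# Keras applies a Dense layer along the last axis only, so the same
# 64 -> 1 projection is shared across all 100x100 positions, yielding
# a (100, 100, 1) output, i.e. 10,000 logits after flattening.
dense_1_params = C * 1 + 1                         # 65 parameters

print(dense_10000_params)  # 6400010000
print(dense_1_params)      # 65
```

The weight sharing is what makes Option B trainable on modest hardware: roughly 6.4 billion parameters collapse to 65.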
rohanb2018 commented 2 years ago

Thanks so much for the pointers! I will remove the fine-tuning, change the global model final layer, and double-check my train/val split and try to re-train the global model. Will let you know if I have any success with fully training the global model, or if I have any further questions.

rohanb2018 commented 2 years ago

Actually, I didn't fully understand the global model final layer change. After the Dense(1,input_shape=(100,100,64)) layer you suggested, wouldn't you still need a Dense(10000) at the end, because the final output of the network has to have 10000 units? (because the output is a distribution over all of the possible pen locations in the 100x100 grid)?

rohanb2018 commented 2 years ago

Just to show the current result from training the model, the training early stops after only 3 epochs (I guess because the validation loss is increasing). Here I reproduced the plot from the notebook (red = train loss, yellow = validation loss, green = train accuracy, blue = val accuracy). Anyway, I'm going to keep playing around with it (maybe increase the patience for the early-stopping).

[image: training/validation loss and accuracy curves]
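The early-stopping behaviour described above comes down to tracking the best validation loss and halting once it fails to improve for `patience` consecutive epochs; Keras' EarlyStopping callback implements this same idea. A minimal sketch of the logic:

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the 1-indexed epoch at which training would stop,
    or None if the full schedule runs to completion."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Validation loss improves twice, then rises for 3 straight epochs.
print(early_stop_epoch([2.0, 1.5, 1.6, 1.7, 1.8], patience=3))  # 5
```

Raising `patience` simply tolerates more non-improving epochs before stopping, which is why increasing it can let a noisy validation curve train longer.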

prajwaltr93 commented 2 years ago

> Actually, I didn't fully understand the global model final layer change. After the Dense(1,input_shape=(100,100,64)) layer you suggested, wouldn't you still need a Dense(10000) at the end, because the final output of the network has to have 10000 units? (because the output is a distribution over all of the possible pen locations in the 100x100 grid)?

The Dense(1, input_shape=(100,100,64)) layer itself produces a (100, 100, 1) output, i.e. 10000 units, so no extra Dense(10000) is needed at the end.
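This works because a Dense layer contracts only the last axis of its input: the 64-dim feature vector at each spatial position is projected down to 1 value. The pre-activation computation can be verified with NumPy:

```python
import numpy as np

x = np.random.rand(100, 100, 64)  # stand-in for the final feature map
w = np.random.rand(64, 1)         # Dense(1) kernel: one 64 -> 1 projection
b = np.zeros(1)                   # Dense(1) bias

# Matmul over the last axis, shared across all 100x100 positions --
# the same contraction Keras performs for Dense on a rank-3 input.
y = x @ w + b
print(y.shape)  # (100, 100, 1)
print(y.size)   # 10000 -> one logit per pen location in the 100x100 grid
```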

prajwaltr93 commented 2 years ago

> Just to show the current result from training the model, the training early stops after only 3 epochs (I guess because the validation loss is increasing). Here I reproduced the plot from the notebook (red = train loss, yellow = validation loss, green = train accuracy, blue = val accuracy). Anyway, I'm going to keep playing around with it (maybe increase the patience for the early-stopping).

[image: training/validation loss and accuracy curves]

Can you specify the values of the accuracies and losses?

rohanb2018 commented 2 years ago

Unfortunately I don't have the exact accuracies and losses for that particular run anymore, but I think the val accuracy was around 0.0015.

I was actually able to get some considerably improved stats (train loss = 0.9388, train accuracy = 0.9348, val loss = 3.5167, val accuracy = 0.6756) after making a couple of changes, including increasing the number of train/validation samples (to 500000/35000), slightly increasing the early stopping patience to 5, and adding L2 regularization in the final two dense layers (using the original architecture, didn't have a chance to incorporate the Dense fix you suggested).
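The L2 regularization mentioned above adds a penalty of lam * ||W||^2 per regularized kernel to the training loss, discouraging large weights in those final dense layers (this is what Keras' kernel_regularizer=l2(lam) does). A minimal sketch with hypothetical layer shapes and a hypothetical lam value:

```python
import numpy as np

def l2_penalty(kernels, lam=1e-4):
    """Sum of lam * ||W||^2 over the regularized kernels,
    i.e. the extra term added to the training loss."""
    return sum(lam * np.sum(w ** 2) for w in kernels)

# Hypothetical kernels for the two final dense layers.
w_dense1 = np.ones((64, 128))
w_dense2 = np.ones((128, 10_000))

extra_loss = l2_penalty([w_dense1, w_dense2], lam=1e-4)
print(round(extra_loss, 4))  # 128.8192, i.e. 1e-4 * (64*128 + 128*10000)
```

Larger weights inflate this term, so gradient descent trades some training fit for smaller weights, which is consistent with the reported drop in the train/validation gap.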

Updated plot:

[image: exp_20220421_moredata — updated training/validation curves]

prajwaltr93 commented 2 years ago

I am surprised that you were able to train without the final dense layer modified, because that is a lot of parameters to train; you sure do have access to some serious hardware 😄 Also, to add: you can still get a lot more training samples out of 90% of the files.