prajwaltr93 / teaching_robots_to_draw

an attempt at implementing the deep learning model proposed in the paper "Teaching Robots to Draw"

Questions about global model training #1

Open rohanb2018 opened 2 years ago

rohanb2018 commented 2 years ago

Hello, I had two general questions about training the global model (using global_model.ipynb).

First, I noticed that the fine-tuning section of the notebook calls the inp_data_generator method, which doesn't seem to be defined in global_model.ipynb. In my code, I ended up switching to the DataGenerator that is actually defined in the notebook. Was there a particular reason for calling inp_data_generator in global_model.ipynb?

Second, I was curious what training hyperparameters were most useful for getting the global model to train successfully. I noticed in global_model.ipynb that the initial training phase runs for 2 epochs, followed by a fine-tuning phase that runs for 5 epochs. However, with these settings my final global model accuracy was only slightly above 0. Specifically, I was curious about the number of epochs as well as the training/validation set sizes that were most useful for successful training.

Happy to provide more details about my model performance if that helps. Thanks!

prajwaltr93 commented 2 years ago

Hey,

I just noticed that I haven't pushed the latest changes to this repo; the notebooks are missing a few significant changes.

  1. inp_data_generator was replaced with DataGenerator, and if I remember correctly I did not use the fine-tuning step in the end; I was only using it while I was still experimenting with training the global model.
  2. Interesting. I cannot recall the exact number of training epochs, but as noted in the README I was not able to fully train the global model (mine is underfit), because I was using Google Colab and it has runtime limits. For the train/test split I would suggest going with the paper, which recommends using 90% of the files for training and 10% for testing.
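The 90/10 file split suggested above can be sketched as plain Python list slicing (the file names below are hypothetical stand-ins for the stroke-data files):

```python
import random

def split_files(files, train_frac=0.9, seed=42):
    """Shuffle and split a list of data files into train and test sets."""
    files = list(files)
    random.Random(seed).shuffle(files)  # fixed seed for a reproducible split
    cut = int(len(files) * train_frac)
    return files[:cut], files[cut:]

# Hypothetical stroke-data files.
files = [f"stroke_{i:04d}.npy" for i in range(100)]
train, test = split_files(files)
print(len(train), len(test))  # 90 10
```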
prajwaltr93 commented 2 years ago

Just to add a few points:

  1. Since I was not able to fully train my global model, I reached out to the authors of the paper about this; they mentioned that training took on the order of several days.
  2. I have not implemented all of the augmentations: the paper suggests 3 types and I have implemented only 1, so roughly 2/3 of the training data is missing.
  3. There is a mistake in the global model architecture implemented in that notebook. For the final layer, instead of flattening the feature map and connecting it to a 10000-unit fully connected layer (which is a lot of weights), use Dense(1, input_shape=(100,100,64)). This is a neat trick for converting CNNs to fully connected outputs: https://cs231n.github.io/convolutional-networks/#convert
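To see why point 3 matters, compare the parameter counts of the two final-layer options (a back-of-the-envelope sketch, bias terms included):

```python
H, W, C = 100, 100, 64  # final feature-map shape in the notebook

# Option A: Flatten -> Dense(10000)
flat_units = H * W * C                             # 640,000 inputs
dense_10000_params = flat_units * 10_000 + 10_000  # weights + biases

# Option B: Dense(1) applied to the (100, 100, 64) map.
# Keras applies a Dense layer along the last axis only, so the same
# 64 -> 1 projection is shared across all 100x100 positions, yielding
# a (100, 100, 1) output, i.e. 10,000 logits after flattening.
dense_1_params = C * 1 + 1                         # 65 parameters

print(dense_10000_params)  # 6400010000
print(dense_1_params)      # 65
```

The weight sharing is what makes Option B trainable on modest hardware: roughly 6.4 billion parameters collapse to 65.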
rohanb2018 commented 2 years ago

Thanks so much for the pointers! I will remove the fine-tuning, change the global model final layer, and double-check my train/val split and try to re-train the global model. Will let you know if I have any success with fully training the global model, or if I have any further questions.

rohanb2018 commented 2 years ago

Actually, I didn't fully understand the global model final layer change. After the Dense(1,input_shape=(100,100,64)) layer you suggested, wouldn't you still need a Dense(10000) at the end, because the final output of the network has to have 10000 units? (because the output is a distribution over all of the possible pen locations in the 100x100 grid)?

rohanb2018 commented 2 years ago

Just to show the current result from training the model, the training early stops after only 3 epochs (I guess because the validation loss is increasing). Here I reproduced the plot from the notebook (red = train loss, yellow = validation loss, green = train accuracy, blue = val accuracy). Anyway, I'm going to keep playing around with it (maybe increase the patience for the early-stopping).

[image: training/validation loss and accuracy curves]
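The early-stopping behaviour described above comes down to tracking the best validation loss and halting once it fails to improve for `patience` consecutive epochs; Keras' EarlyStopping callback implements this same idea. A minimal sketch of the logic:

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the 1-indexed epoch at which training would stop,
    or None if the full schedule runs to completion."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Validation loss improves twice, then rises for 3 straight epochs.
print(early_stop_epoch([2.0, 1.5, 1.6, 1.7, 1.8], patience=3))  # 5
```

Raising `patience` simply tolerates more non-improving epochs before stopping, which is why increasing it can let a noisy validation curve train longer.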

prajwaltr93 commented 2 years ago

> Actually, I didn't fully understand the global model final layer change. After the Dense(1,input_shape=(100,100,64)) layer you suggested, wouldn't you still need a Dense(10000) at the end, because the final output of the network has to have 10000 units? (because the output is a distribution over all of the possible pen locations in the 100x100 grid)?

The Dense(1, input_shape=(100,100,64)) layer itself produces a (100, 100, 1) output, i.e. 10000 units, so no extra Dense(10000) is needed at the end.
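This works because a Dense layer contracts only the last axis of its input: the 64-dim feature vector at each spatial position is projected down to 1 value. The pre-activation computation can be verified with NumPy:

```python
import numpy as np

x = np.random.rand(100, 100, 64)  # stand-in for the final feature map
w = np.random.rand(64, 1)         # Dense(1) kernel: one 64 -> 1 projection
b = np.zeros(1)                   # Dense(1) bias

# Matmul over the last axis, shared across all 100x100 positions --
# the same contraction Keras performs for Dense on a rank-3 input.
y = x @ w + b
print(y.shape)  # (100, 100, 1)
print(y.size)   # 10000 -> one logit per pen location in the 100x100 grid
```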

prajwaltr93 commented 2 years ago

> Just to show the current result from training the model, the training early stops after only 3 epochs (I guess because the validation loss is increasing). Here I reproduced the plot from the notebook (red = train loss, yellow = validation loss, green = train accuracy, blue = val accuracy). Anyway, I'm going to keep playing around with it (maybe increase the patience for the early-stopping).

[image: training/validation loss and accuracy curves]

Can you specify the values of the accuracies and losses?

rohanb2018 commented 2 years ago

Unfortunately I don't have the exact accuracies and losses for that particular run anymore, but I think the val accuracy was around 0.0015.

I was actually able to get some considerably improved stats (train loss = 0.9388, train accuracy = 0.9348, val loss = 3.5167, val accuracy = 0.6756) after making a couple of changes, including increasing the number of train/validation samples (to 500000/35000), slightly increasing the early stopping patience to 5, and adding L2 regularization in the final two dense layers (using the original architecture, didn't have a chance to incorporate the Dense fix you suggested).
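The L2 regularization mentioned above adds a penalty of lam * ||W||^2 per regularized kernel to the training loss, discouraging large weights in those final dense layers (this is what Keras' kernel_regularizer=l2(lam) does). A minimal sketch with hypothetical layer shapes and a hypothetical lam value:

```python
import numpy as np

def l2_penalty(kernels, lam=1e-4):
    """Sum of lam * ||W||^2 over the regularized kernels,
    i.e. the extra term added to the training loss."""
    return sum(lam * np.sum(w ** 2) for w in kernels)

# Hypothetical kernels for the two final dense layers.
w_dense1 = np.ones((64, 128))
w_dense2 = np.ones((128, 10_000))

extra_loss = l2_penalty([w_dense1, w_dense2], lam=1e-4)
print(round(extra_loss, 4))  # 128.8192, i.e. 1e-4 * (64*128 + 128*10000)
```

Larger weights inflate this term, so gradient descent trades some training fit for smaller weights, which is consistent with the reported drop in the train/validation gap.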

Updated plot:

[image: exp_20220421_moredata — updated training/validation curves]

prajwaltr93 commented 2 years ago

I am surprised that you were able to train without the final dense layer modified, because that is a lot of parameters to train; you sure do have access to some serious hardware 😄 Also, to add: you can still get a lot more training samples out of 90% of the files.