teamtma / Image_Captioning


Prevent the image captioning code from shuffling images. #2

Closed. tohiddar closed this issue 2 years ago.

tohiddar commented 2 years ago

Because the code shuffles the image database on every training run, it is hard to do a proper parametric study. Find out how to prevent the code from shuffling the images when it picks them from the database.

tohiddar commented 2 years ago

There appear to be a few places in the code that use a random function to shuffle. One of them is in the split_data_training_testing function in the data_prep.py module, where the image keys are shuffled before the split:

# Create training and validation sets using a random 80-20 split
img_keys = list(img_to_cap_vector.keys())
random.shuffle(img_keys)
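For reference, here is a minimal sketch of two ways to make that split repeatable. Only img_keys and img_to_cap_vector come from the snippet above; the 0.8 slice and the *_keys names are assumptions about how the rest of split_data_training_testing might look.

```python
import random

img_keys = list(img_to_cap_vector.keys())

# Option 1: keep the shuffle but seed it, so every run sees the same order.
random.seed(42)  # any fixed seed works
random.shuffle(img_keys)

# Option 2: skip the shuffle entirely and use a fixed, sorted key order.
# img_keys = sorted(img_to_cap_vector.keys())

# Assumed 80-20 split (the names below are hypothetical, not from the issue).
slice_index = int(len(img_keys) * 0.8)
img_name_train_keys = img_keys[:slice_index]
img_name_val_keys = img_keys[slice_index:]
```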

The code also picks a random image to compare its "Real Caption" to the predicted caption. This happens in the validation_set_captions function in the eval.py module:

# captions on the validation set
rid = np.random.randint(0, len(img_name_val))
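A sketch of two ways to pin that choice down; img_name_val is taken from the line above, everything else is an assumption:

```python
import numpy as np

# Option 1: seed NumPy so the "random" index is identical on every run.
np.random.seed(0)
rid = np.random.randint(0, len(img_name_val))

# Option 2: hard-code the index so the same validation image is always used.
# rid = 0
```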

The code also uses a random function to compute the predicted_ids for the caption words. I am not sure yet what this random function does here, but the "categorical" appears to refer to a categorical probability distribution rather than grammatical word categories: the call samples the id of the next caption word from the distribution defined by the model's predictions.

predicted_id = tf.random.categorical(predictions, 1)[0][0].numpy()

If one prints the predicted_ids, they are integers that appear to index into the model's vocabulary.
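If deterministic decoding is wanted, that sampling step could be swapped for a greedy argmax. This is a sketch, not the repository's code, and it changes the decoding strategy from sampling to always taking the most likely word:

```python
import tensorflow as tf

# tf.random.categorical draws one word id at random from the distribution
# defined by the logits in `predictions`; tf.argmax instead always picks
# the single most likely word id, which makes decoding deterministic.
predicted_id = tf.argmax(predictions, axis=-1)[0].numpy()
```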

Note: the checkpoint files located in checkpoints/train/ should be removed before each run, so that a previously checkpointed solution does not affect the next trial.
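A sketch of automating that cleanup at the top of a trial; the path is the one mentioned above, the helper itself is not in the repository:

```python
import os
import shutil

ckpt_dir = "checkpoints/train"  # checkpoint path mentioned above

# Delete any previously checkpointed solution, then recreate the directory
# so the next trial starts from scratch.
if os.path.isdir(ckpt_dir):
    shutil.rmtree(ckpt_dir)
os.makedirs(ckpt_dir, exist_ok=True)
```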

Summary: the first two sources of randomness above relate to, and influence, the training of the model, so my expectation was that eliminating them would make training deterministic. However, despite deactivating both, training the model still produced a slightly different loss each time. This can likely be attributed to the Adam optimizer being a stochastic optimizer, which means it can yield a different result on each run. Therefore, further effort to eliminate the randomness of the model is not very useful. Instead, the focus should be on creating metrics that help us evaluate the quality of the captions produced.
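For completeness, a sketch of what pinning every seed would look like. Even with all of these set, GPU op-level nondeterminism can still make the loss drift slightly between runs, which matches the behavior described above:

```python
import os
import random

import numpy as np
import tensorflow as tf

SEED = 42
os.environ["PYTHONHASHSEED"] = str(SEED)  # hashing used by some Python ops
random.seed(SEED)         # the shuffle in data_prep.py
np.random.seed(SEED)      # the validation-image pick in eval.py
tf.random.set_seed(SEED)  # tf.random.categorical and weight initialization

# Newer TensorFlow versions (2.9+) can additionally request deterministic ops:
# tf.config.experimental.enable_op_determinism()
```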

tohiddar commented 2 years ago

Closing this issue. Similar work will be tracked in other ITS tickets where we will try to define and test the evaluation metrics.