Open jeff62802217 opened 5 years ago
I don't have an answer, but it could be that the network found a better solution at epoch 100. I have worked a bit with Mask R-CNN before, and I found that training does not always reach the best solution because, to my understanding, it is somewhat random. Or maybe the authors changed a parameter value at that epoch, which would explain such a dramatic change.
EDIT:
To further support my second guess (a change of parameter value), we can take a look at this paper: http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w44/Hara_Learning_Spatio-Temporal_Features_ICCV_2017_paper.pdf, in particular Figure 3 (a). There the plotted value improves abruptly at round epoch numbers such as 100 and 150, which suggests that a parameter value is changed at fixed intervals.
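To illustrate the kind of scheduled change I mean, here is a minimal sketch that assumes the parameter is the learning rate and that it is cut by a fixed factor at epochs 100 and 150. That is only my guess. The function name `step_lr`, the base rate, the milestones and the factor are all made up for illustration and are not taken from the repository or the paper.

```python
def step_lr(epoch, base_lr=0.1, milestones=(100, 150), gamma=0.1):
    """Hypothetical step schedule: multiply the learning rate by `gamma`
    once for every milestone epoch that has already been reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


# Print the rate just before and at each assumed milestone.
for epoch in (0, 99, 100, 149, 150):
    print(epoch, step_lr(epoch))
```

With a schedule like this the learning rate is constant between milestones and shrinks suddenly at epochs 100 and 150, which is the sort of change that could produce a sudden jump in a training curve at round epoch numbers.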
I ran the experiment again, but the training accuracy converged to 27% and stayed there.