Hello everyone,
I've recently run into the following problem. While training neuraltalk2 on MS COCO, a checkpoint is saved every 2500 iterations, with language evaluation enabled. Everything was going smoothly and the scores were computed with coco-caption. At iteration 232500, the CIDEr score reached 0.864 over 3200 validation samples; since that was the best score so far, the model was saved to checkpoint.t7, along with the JSON file recording all progress. So I paused training and ran eval.lua on the very same validation set (3200 samples), using that checkpoint, with two different configurations. The first (opt.sample_max=1, opt.beam_size=1, opt.temperature=1.0), which is the default used by train.lua, achieves 0.817 CIDEr, and the second (opt.sample_max=1, opt.beam_size=2, opt.temperature=1.0) achieves 0.842 CIDEr.
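For reference, the two eval runs were launched with commands along these lines (dataset/split flags omitted; the sampling options are the ones listed above, the remaining flag names are quoted from memory, so treat the exact command as approximate):

```bash
# Config 1: greedy decoding, the same settings train.lua uses for its own eval
th eval.lua -model checkpoint.t7 -num_images 3200 -language_eval 1 \
  -sample_max 1 -beam_size 1 -temperature 1.0

# Config 2: identical, but with beam search of width 2
th eval.lua -model checkpoint.t7 -num_images 3200 -language_eval 1 \
  -sample_max 1 -beam_size 2 -temperature 1.0
```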
This seemed strange to me because the last saved checkpoint.json at iteration 232500 reports a CIDEr of 0.864 (with the default beam_size=1). Loading checkpoint.t7 and running eval.lua, all the metrics (not just CIDEr) come out lower than they should be. I also tried to resume training, but when loading from checkpoint.t7 and re-evaluating at startup, CIDEr is computed as 0.817 (not what was expected) and the generated captions are not the same as those in checkpoint.json. Why doesn't it resume where it left off?
I suspected it might be a coco-caption bug, so I ran coco-caption on the val_predictions stored in the last saved checkpoint.json; CIDEr again came out at 0.864, so coco-caption is not at fault.
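In case it helps to reproduce, this is roughly the script I used to re-score the saved predictions. It is only a sketch assuming the standard coco-caption (pycocoevalcap) API, the usual captions_val2014.json annotations, and placeholder paths:

```python
import json

from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

ANN_FILE = 'annotations/captions_val2014.json'  # placeholder path
CHECKPOINT_JSON = 'checkpoint.json'             # placeholder path

# The checkpoint JSON written by train.lua stores the predictions under
# 'val_predictions' as a list of {"image_id": ..., "caption": ...} entries.
with open(CHECKPOINT_JSON) as f:
    val_predictions = json.load(f)['val_predictions']

# Dump them in the results format coco-caption expects.
with open('val_predictions.json', 'w') as f:
    json.dump(val_predictions, f)

coco = COCO(ANN_FILE)
coco_res = coco.loadRes('val_predictions.json')

coco_eval = COCOEvalCap(coco, coco_res)
# Only score the 3200 images that actually have predictions.
coco_eval.params['image_id'] = coco_res.getImgIds()
coco_eval.evaluate()

for metric, score in coco_eval.eval.items():
    print('%s: %.3f' % (metric, score))
```

Running this on the val_predictions from the 232500-iteration checkpoint.json reproduces the 0.864 CIDEr logged during training.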
Could this mean that the language model weights are not being stored properly in checkpoint.t7? I've retrained with different configurations a number of times, but the problem persists.
Any help would be much appreciated. Thank you!