Kyubyong opened this issue 4 years ago
Oh, never mind. Now I see there is an option for InceptionNet. Then were the pretrained models (checkpoint 20, 24) trained using the `--features grid` option or the `--features obj` option?
They were trained as described in https://github.com/krasserm/fairseq-image-captioning/blob/master/README.md#training
Thanks for your confirmation. Have you checked the performance of the pretrained model provided in https://github.com/krasserm/fairseq-image-captioning/tree/wip-train-inception? I'm curious how good the grid-based model is compared to the object-based model.
The object-based model is significantly better, but when I trained the grid-based model a long time ago I didn't really tune hyper-parameters. So it may be worth re-training it with hyper-parameters similar to those used for object-based training (lr, warmup, ...), at least as a starting point. On the other hand, most image-captioning papers report object-based approaches to be superior to grid-based approaches.
Hi, I wonder if we can use features extracted from a ResNet-152 model rather than from the Faster R-CNN, since the former is easier to implement.
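
For illustration, here is a minimal sketch of what ResNet-152 grid-feature extraction could look like with torchvision. This is an assumption about the approach, not this repo's actual preprocessing pipeline; it only shows how to obtain one 2048-d vector per spatial grid cell:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Sketch: use a ResNet-152 backbone as a grid-feature extractor.
# Dropping the final avgpool and fc layers leaves a spatial feature map,
# which can be flattened into one 2048-d vector per grid cell.
resnet = models.resnet152(pretrained=True)
backbone = nn.Sequential(*list(resnet.children())[:-2])  # remove avgpool + fc
backbone.eval()

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)              # placeholder for a preprocessed image
    fmap = backbone(image)                           # shape: (1, 2048, 7, 7)
    grid_features = fmap.flatten(2).transpose(1, 2)  # shape: (1, 49, 2048)
```

Whether such features are a drop-in replacement for the Faster R-CNN object features would still depend on matching the feature dimensionality and file format expected by the captioning task's data loader.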