microsoft / Oscar

Oscar and VinVL

Trained features? #41

Open ObeidaElJundi opened 4 years ago

ObeidaElJundi commented 4 years ago

For image captioning on COCO, I am trying to obtain image features from a trained model instead of generating the caption. In DOWNLOAD.md, under Datasets, are the image region features (e.g., train.feature.tsv) extracted before or after training the model on downstream tasks (e.g., image captioning on COCO)? If before, how can I obtain image features from a trained model?

One more question: in MODEL_ZOO.md, under Image Captioning on COCO, is the Model checkpoint: checkpoint.zip already trained and finetuned, or do we still need to train with cross-entropy loss and finetune with CIDEr optimization?
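
For reference, this is roughly how I am reading the released features at the moment. It is only a minimal sketch: the path prefix, the `num_boxes`/`features` JSON field names, and the `decode_feature_row` helper are my assumptions about the TSV layout, not something taken from the repo docs.

```python
import base64
import json
import numpy as np

def decode_feature_row(row):
    """Decode one TSV row into (image_key, features) of shape (num_boxes, feat_dim).

    Assumed row layout: image_key, then a JSON string containing 'num_boxes' and a
    base64-encoded float32 'features' blob (produced by the object detector,
    i.e. before any Oscar finetuning).
    """
    image_key, feat_info = row[0], json.loads(row[1])
    num_boxes = feat_info["num_boxes"]
    features = np.frombuffer(
        base64.b64decode(feat_info["features"]), dtype=np.float32
    ).reshape(num_boxes, -1)
    return image_key, features

# Hypothetical path; read only the first row as a sanity check.
with open("coco_caption/train.feature.tsv") as f:
    row = f.readline().rstrip("\n").split("\t")
key, feats = decode_feature_row(row)
print(key, feats.shape)  # e.g. (num_boxes, 2054): 2048-d region feature plus box geometry
```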

EByrdS commented 3 years ago

I think the Model checkpoint: checkpoint.zip that you mention is only the output of the cross-entropy training step. I assume this from reading the log.txt file in the same section: the command line recorded there looks like exactly the one shown under

  1. First train with cross-entropy loss:

so I would think we still need to run

  2. Finetune with CIDEr optimization:

But I would like to have some confirmation as well, as doing that finetuning might take too long for me.
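
For anyone else who lands here: my understanding is that the "Finetune with CIDEr optimization" step is self-critical sequence training (SCST), i.e. REINFORCE with a greedy-decoding baseline where CIDEr is the reward. Below is a minimal sketch of that loss, just to show the idea; it is not Oscar's actual implementation, and the tensor names are hypothetical.

```python
import torch

def scst_loss(sample_logprobs, sample_cider, greedy_cider):
    """Self-critical sequence training loss (REINFORCE with a greedy baseline).

    sample_logprobs: (batch,) summed log-probabilities of captions sampled from the model
    sample_cider:    (batch,) CIDEr score of each sampled caption
    greedy_cider:    (batch,) CIDEr score of the greedy/beam caption, used as the baseline
    """
    # Advantage: how much the sampled caption beats the greedy baseline.
    reward = (sample_cider - greedy_cider).detach()
    # Policy gradient: increase log-prob of captions with positive advantage.
    return -(reward * sample_logprobs).mean()

# Toy usage with made-up numbers.
logp = torch.tensor([-12.3, -9.8], requires_grad=True)
loss = scst_loss(logp, torch.tensor([1.10, 0.85]), torch.tensor([0.95, 0.90]))
loss.backward()
```

Since CIDEr only enters as a reward here, this stage has to be a separate finetuning run on top of the cross-entropy checkpoint, which is why I would like to know whether checkpoint.zip already includes it.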