layumi / Image-Text-Embedding

TOMM2020 Dual-Path Convolutional Image-Text Embedding :feet: https://arxiv.org/abs/1711.05535
MIT License

Problem training on MSCOCO #9

Open lascavana opened 5 years ago

lascavana commented 5 years ago

I am trying to train the model on MSCOCO and run into the following issues:

1- When running 'train_coco_word2_1_pool.m' as you suggest, I get the error that the function 'coco_word2_pool_no_w2v' does not exist.

2- I therefore changed it to 'coco_word2_pool', since this function is indeed in the directory (is this what you meant?). Then I get the following error:

```
Error using reshape
To RESHAPE the number of elements must not change.

Error in coco_word2_pool (line 273)
net.params(first).value = reshape(single(subset.features'),1,1,29972,300);

Error in train_coco_word2_1_pool (line 17)
net = coco_word2pool();
```

3- I experience the same issue when running 'train_coco_word2_1_pool_vgg19.m'

4- The reason I am training is that I want to reproduce your results, but the 20 epochs in your pretrained model don't seem to be enough. Are these the parameters you used to report the test results?

I am running the code on a MacBook Pro, on Matlab R2018b. Thank you in advance.
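For reference, the reshape at `coco_word2_pool` line 273 only succeeds if `subset.features` holds exactly 29972 × 300 elements (dictionary size × word2vec dimension). A minimal pre-check, assuming `subset.features` is stored as 300 × 29972, as the transpose in the original line implies:

```matlab
% Sanity check before the reshape in coco_word2_pool (line 273).
% Assumption: subset.features is 300 x 29972 (word2vec dim x dictionary
% size), so its transpose is 29972 x 300 and can be reshaped to
% 1 x 1 x 29972 x 300 without changing the element count.
sz = size(subset.features);
assert(isequal(sz, [300 29972]), ...
    'subset.features is %d x %d; expected 300 x 29972.', sz(1), sz(2));
net.params(first).value = reshape(single(subset.features'), 1, 1, 29972, 300);
```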

layumi commented 5 years ago
1. Yes, you are right. Sorry, I forgot to change the name. It should be 'coco_word2_pool'.

2 & 3. Could you check your data-preparation code? The shape of your subset.features may be different from mine.

4. COCO is a very, very large dataset. In fact, 20 epochs already contain many iterations.

lascavana commented 5 years ago

Thank you for your fast response.

Indeed, for some reason my subset.features has a different size. I don't see why though. You set rng(1) so the randomization should not be a problem. For a sanity check, can you tell me what you obtain if you do:

```matlab
load('caption_train.mat')
size(caption_dic, 2)
```

In any case, I changed the dictionary size (d), and with it the size of the first convolutional layer in the text CNN, so that it matches the new dictionary. Still there seems to be a mismatch. I get the following error:

```
The FILTERS depth does not divide the DATA depth.

Error in dagnn.Conv/forward (line 12)
outputs{1} = vl_nnconv(...

Error in dagnn.Layer/forwardAdvanced (line 85)
outputs = obj.forward(inputs, {net.params(par).value}) ;

Error in dagnn.DagNN/eval (line 91)
obj.layers(l).block.forwardAdvanced(obj.layers(l)) ;

Error in cnn_train_dag>processEpoch (line 222)
net.eval(inputs, params.derOutputs, 'holdOn', s < params.numSubBatches) ;

Error in cnn_train_dag (line 90)
[net, state] = processEpoch(net, state, params, 'train',opts) ;

Error in train_coco_word2_1_pool (line 42)
[net,info] = cnn_train_dag(net, imdb, @getBatch,opts) ;

Error in run (line 91)
evalin('caller', strcat(script, ';'));
```

I am still working on finding this particular filter with mismatched size, but if you have any clue where it could be I'd highly appreciate the hint.
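One way to hunt for the mismatched filter is simply to print the shape of every parameter in the network and compare each filter depth (third dimension) against the new dictionary size. The loop below is only an illustrative sketch, not code from the repository; it assumes `net` is the `dagnn.DagNN` object and `caption_dic` is the dictionary loaded from caption_train.mat:

```matlab
% Illustrative sketch: list all parameter shapes in the DagNN so the
% filter whose depth no longer matches the new dictionary size (d)
% stands out.
d = size(caption_dic, 2);   % new dictionary size
fprintf('dictionary size d = %d\n', d);
for i = 1:numel(net.params)
    fprintf('%-30s %s\n', net.params(i).name, ...
        mat2str(size(net.params(i).value)));
end
```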

Thank you very much for your time!

lascavana commented 5 years ago

Solved. But I still wonder why the dictionaries are different. You should maybe share your own dictionary together with the pre-trained model, because otherwise people will not be able to use them.

layumi commented 5 years ago

Thank you @lascavana. Sure.

Here is the link to the dictionary files: https://drive.google.com/open?id=1Yp6B5GKhgQTD9bsmvmVkvxt-SnmHHjVA