This package contains the accompanying code for the following paper:
[1] Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville. Describing Videos by Exploiting Temporal Structure. International Conference on Computer Vision (ICCV) 2015 (oral).

with follow-up works that include
[2] Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio. Oracle Performance for Visual Captioning. British Machine Vision Conference (BMVC) 2016 (oral).
[3] Nicolas Ballas, Li Yao, Chris Pal, Aaron Courville. Delving Deeper into Convolutional Networks for Learning Video Representations. International Conference on Learning Representations (ICLR) 2016 (conference track).
With the default setup in `config.py`, you will be able to train a model on YouTube2Text that reproduces (and in fact improves upon) the results in the third row of Table 1, where a global temporal attention model is applied to features extracted by GoogLeNet.
Note: video captioning research has gradually converged on coco-caption as the standard evaluation toolbox, so we integrate it into this package. The paper, however, used a different tokenization method, so the results produced by this package are not strictly comparable with those reported in the paper.
To run the experiments:

1. `git clone git://github.com/Theano/Theano.git` to get the most recent version of Theano, and add it to your `$PYTHONPATH`.
2. Download the preprocessed YouTube2Text dataset (download link), which contains the folder `youtube2text_iccv15` with 8 pkl files.
3. Go to `common.py` and change the following two lines according to your specific setup: `RAB_DATASET_BASE_PATH = '/data/lisatmp3/yaoli/datasets/'` and `RAB_EXP_PATH = '/data/lisatmp3/yaoli/exp/'`. The first path is the parent directory containing the `youtube2text_iccv15` dataset folder; the second specifies where all experimental results are saved. (A sanity-check sketch of steps 1--3 follows this list.)
4. Test `data_engine.py` by running `python data_engine.py` without any error.
5. Test `metrics.py` by running `python metrics.py` without any error.
6. Launch training on CPU with `THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python train_model.py`, or on GPU with `THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python train_model.py`.
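Before launching a long run, it may help to sanity-check the setup from steps 1--3. Below is a minimal sketch (not part of the package): it assumes `common.py` has been edited as described and that `RAB_DATASET_BASE_PATH` and `RAB_EXP_PATH` are its module-level path variables.

```python
# a minimal sanity-check sketch, assuming common.py has been edited as in step 3
import os

import theano  # imports cleanly only if $PYTHONPATH points at your Theano clone
import common  # the repo's common.py, which defines the two paths checked below

dataset_dir = os.path.join(common.RAB_DATASET_BASE_PATH, 'youtube2text_iccv15')
assert os.path.isdir(dataset_dir), 'dataset folder not found: %s' % dataset_dir
assert os.path.isdir(common.RAB_EXP_PATH), 'exp folder not found: %s' % common.RAB_EXP_PATH
print('setup looks good (Theano %s)' % theano.__version__)
```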
Running `train_model.py` for the first time takes much longer, since Theano needs to compile many functions and cache them on disk for future runs. You will probably see some warning messages on stdout; it is safe to ignore all of them. Both model parameters and configurations are saved (the saving path is printed to stdout and easy to find). The most important thing to monitor is `train_valid_test.txt` in the experiment output folder: it is a large table recording all the metrics at each validation. Please refer to `model_attention.py`, lines 1207--1215, for the actual meaning of the columns.
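Since `train_valid_test.txt` is a plain-text table with one row per validation, it is easy to plot a metric over time. The sketch below assumes the file parses as a whitespace-separated numeric table; the column index is hypothetical, so look up the real ordering in `model_attention.py` first.

```python
# plot one metric column from train_valid_test.txt (a sketch; assumes a
# whitespace-separated numeric table with one row per validation)
import numpy
import matplotlib.pyplot as plt

table = numpy.loadtxt('train_valid_test.txt')
METRIC_COL = 0  # hypothetical index; see model_attention.py lines 1207--1215
plt.plot(table[:, METRIC_COL], marker='o')
plt.xlabel('validation')
plt.ylabel('metric value')
plt.show()
```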
In the paper we never mentioned the use of uni-directional or bi-directional LSTMs to encode video representations, but this is an obvious extension, and several recent papers following ours have explored it. We therefore provide code for more sophisticated encoders as well.
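For intuition only, here is a tiny numpy sketch of what a bi-directional encoder computes; this is an illustration, not the package's Theano implementation.

```python
# conceptual sketch of bi-directional encoding: run one RNN over the frames
# forward, one backward, and concatenate the per-frame hidden states
import numpy

def rnn_states(frames, W, U, h0):
    """Simple tanh RNN returning one hidden state per frame."""
    states, h = [], h0
    for x in frames:
        h = numpy.tanh(W.dot(x) + U.dot(h))
        states.append(h)
    return numpy.array(states)

rng = numpy.random.RandomState(0)
frames = rng.randn(10, 16)                      # 10 frames, 16-dim features
W, U = 0.1 * rng.randn(8, 16), 0.1 * rng.randn(8, 8)
h0 = numpy.zeros(8)

fwd = rnn_states(frames, W, U, h0)              # forward pass
bwd = rnn_states(frames[::-1], W, U, h0)[::-1]  # backward pass, re-aligned
bidir = numpy.concatenate([fwd, bwd], axis=1)   # (10, 16) per-frame encoding
```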
This is a known problem in the COCO evaluation script (their code): METEOR is computed by spawning another subprocess, which does not get killed automatically. As METEOR is called repeatedly, these leftover subprocesses gradually eat up memory.
To fix the problem, add the line `self.meteor_p.kill()` after https://github.com/tylin/coco-caption/blob/master/pycocoevalcap/meteor/meteor.py#L44.
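For context, here is a self-contained illustration of the leak and the fix; the class name mirrors coco-caption's METEOR scorer, but the spawned command is a harmless stand-in, and the exact insertion point is the linked line above.

```python
# illustration only: why the METEOR subprocess leaks, and the one-line fix
import subprocess

class Meteor(object):
    def __init__(self):
        # coco-caption keeps a long-lived java subprocess around for scoring;
        # 'cat' is a stand-in for the METEOR jar here
        self.meteor_p = subprocess.Popen(['cat'], stdin=subprocess.PIPE,
                                         stdout=subprocess.PIPE)

    def __del__(self):
        # the fix: kill the child process when the scorer goes away, so
        # repeated evaluations do not accumulate live subprocesses
        self.meteor_p.kill()
```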
Please refer to this repo.
If you have any questions, drop us an email at li.yao@umontreal.ca.