google / seq2seq

A general-purpose encoder-decoder framework for Tensorflow
https://google.github.io/seq2seq/
Apache License 2.0

Anybody succeeded to run the image captioning task? #215

Open Jiakui opened 7 years ago

Jiakui commented 7 years ago

I am just wondering whether it is possible to use the current code to run the image captioning task. If so, can anybody give me a hint?

Thanks!

kyleyeung commented 7 years ago

@Jiakui The author has already released an experimental model here: https://github.com/google/seq2seq/blob/master/seq2seq/models/image2seq.py. It works, but needs a few modifications.

I'm afraid there could still be some problems with it, though: training stopped for me after several thousand steps. I'm going to look into it later.

syed-ahmed commented 7 years ago

Hi @Jiakui , I got it working with the following config. You just have to create this YAML config file and use it in the train script:

model: Image2Seq
model_params:
  attention.class: AttentionLayerBahdanau
  attention.params:
    num_units: 512
  bridge.class: InitialStateBridge
  embedding.dim: 128
  encoder.class: InceptionV3Encoder
  decoder.class: AttentionDecoder
  decoder.params:
    rnn_cell:
      cell_class: LSTMCell
      cell_params:
        num_units: 512
      dropout_input_keep_prob: 0.8
      dropout_output_keep_prob: 1.0
      num_layers: 1
  optimizer.name: Adam
  optimizer.params:
    epsilon: 0.0000008
  optimizer.learning_rate: 0.0001
  target.max_seq_len: 50
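Concretely, you could save that config to a file and point the repo's training entrypoint at it. A sketch, assuming the `--config_paths`/`--output_dir` flags of `bin/train.py` and placeholder paths (verify against your checkout):

```shell
# Save the config above to a file (hypothetical filename):
cat > image2seq.yml <<'EOF'
model: Image2Seq
model_params:
  attention.class: AttentionLayerBahdanau
  attention.params:
    num_units: 512
  bridge.class: InitialStateBridge
  embedding.dim: 128
  encoder.class: InceptionV3Encoder
  decoder.class: AttentionDecoder
  decoder.params:
    rnn_cell:
      cell_class: LSTMCell
      cell_params:
        num_units: 512
      dropout_input_keep_prob: 0.8
      dropout_output_keep_prob: 1.0
      num_layers: 1
  optimizer.name: Adam
  optimizer.params:
    epsilon: 0.0000008
  optimizer.learning_rate: 0.0001
  target.max_seq_len: 50
EOF

# Then, from the seq2seq repo root, run the trainer against it
# (commented out here; data paths and flag spellings are assumptions,
# check `python -m bin.train --help`):
# python -m bin.train \
#   --config_paths="./image2seq.yml" \
#   --output_dir=/tmp/image2seq_model
```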

You would also have to consider the data input pipeline: look at how the TFRecords are read and set the feature keys accordingly (https://github.com/google/seq2seq/blob/master/seq2seq/data/input_pipeline.py#L281).
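For reference, the class at that link (ImageCaptioningInputPipeline) takes the TFRecord feature keys as parameters. A hedged sketch of the corresponding input-pipeline config, with the field names and defaults taken from my reading of that file (verify them against your version of the code):

```yaml
input_pipeline_train:
  class: ImageCaptioningInputPipeline
  params:
    files:
      - /data/mscoco/train-*.tfrecord   # placeholder path
    image_field: "image/data"           # key holding the encoded image
    image_format: "jpg"
    caption_ids_field: "image/caption_ids"
    caption_tokens_field: "image/caption"
```

If your TFRecords were written with different keys (e.g. by the im2txt preprocessing scripts), change these fields to match rather than rewriting your data.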