Support variable number of image feature vectors

krasserm / fairseq-image-captioning

Transformer-based image captioning extension for pytorch/fairseq

Apache License 2.0


Support variable number of image feature vectors #1

Closed: krasserm closed this 5 years ago

krasserm commented 5 years ago

SimplisticEncoder and CaptioningEncoder use the src_lengths parameter of the forward method to compute an encoder_padding_mask.
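As a rough illustration of the idea (not the code in this PR), a padding mask can be derived from src_lengths by comparing position indices against each sample's length. The helper name `padding_mask_from_lengths` and the `max_len` argument are hypothetical, and the sketch assumes the common fairseq convention that the mask is True at padded positions.

```python
import torch


def padding_mask_from_lengths(src_lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    """Return a (batch, max_len) bool mask that is True at padded positions.

    Hypothetical helper for illustration; the encoders in this repository
    may compute their mask differently.
    """
    positions = torch.arange(max_len, device=src_lengths.device)
    # Broadcasting compares every position index with each sample's length:
    # (1, max_len) >= (batch, 1) -> (batch, max_len)
    return positions.unsqueeze(0) >= src_lengths.unsqueeze(1)


# Three images with 3, 5 and 2 valid feature vectors, padded to length 5.
src_lengths = torch.tensor([3, 5, 2])
mask = padding_mask_from_lengths(src_lengths, max_len=5)
```

Attention layers can then ignore the positions where the mask is True, so padded feature slots do not contribute to the encoder output.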
FeatureDataset still returns a fixed number of features per image (64), but I also tested with a temporary modification that selects a random number of features per image (not part of this PR).
A Dataset implementation that returns a variable number of features per image will be part of another PR.
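For readers who want to reproduce a similar test, the following is a minimal sketch of selecting a random subset of feature vectors per image and collating the results into a padded batch together with its src_lengths. It is not the temporary modification mentioned above. The function names `sample_features` and `collate_features` and the feature dimension of 8 are assumptions made for illustration only.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def sample_features(features, min_count=1, generator=None):
    """Keep a random subset (of random size) of the rows of `features`.

    Hypothetical helper: `features` has shape (num_features, feature_dim).
    """
    n = features.size(0)
    # Upper bound of randint is exclusive, so n + 1 allows keeping all rows.
    count = int(torch.randint(min_count, n + 1, (1,), generator=generator))
    keep = torch.randperm(n, generator=generator)[:count]
    return features[keep]


def collate_features(samples):
    """Zero-pad variable-length feature tensors and record their lengths."""
    src_lengths = torch.tensor([s.size(0) for s in samples])
    padded = pad_sequence(samples, batch_first=True)  # (batch, max_len, dim)
    return padded, src_lengths


gen = torch.Generator().manual_seed(0)
# 3 images, 64 feature vectors each; the dimension 8 is an arbitrary stand-in.
full = [torch.randn(64, 8, generator=gen) for _ in range(3)]
samples = [sample_features(f, generator=gen) for f in full]
batch, src_lengths = collate_features(samples)
```

The resulting src_lengths tensor is what an encoder's forward method would need in order to distinguish real feature vectors from padding.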

krasserm commented 5 years ago

I just committed an update that fixes improper handling of the maximum number of source and target positions. These limits can be set for training to ensure that the number of image feature vectors and the number of caption tokens do not exceed their respective maximums. The defaults are 64 image feature vectors and 1024 caption tokens.
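To make the effect of these limits concrete, here is a small sketch of a size check using the defaults stated above. The function name `within_max_positions` and its parameter names are hypothetical, and whether oversized samples are skipped or rejected with an error in this repository is not stated here, so the filtering shown is only one possible policy.

```python
def within_max_positions(num_features, num_tokens,
                         max_source_positions=64,
                         max_target_positions=1024):
    """Check one sample against the source and target limits.

    Hypothetical helper; the defaults mirror the values stated in the comment.
    """
    return (num_features <= max_source_positions
            and num_tokens <= max_target_positions)


# (number of image feature vectors, number of caption tokens) per sample
sizes = [(64, 12), (80, 15), (36, 1500), (10, 1024)]

# Keep only the indices of samples that respect both limits.
kept = [i for i, (f, t) in enumerate(sizes) if within_max_positions(f, t)]
```

In this example the second sample has too many feature vectors and the third has too many caption tokens, so only the first and last samples remain.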