Closed by krasserm 5 years ago
I just committed an update that fixes improper handling of the maximum number of source and target positions. These limits can be set for training to ensure that the number of image feature vectors and/or caption tokens per sample does not exceed a given maximum. The defaults are 64 image feature vectors and 1024 caption tokens.
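To make the limits concrete, here is a minimal sketch of how samples could be checked against both maximums. The function name `filter_by_max_positions` and the `(num_features, num_tokens)` tuples are hypothetical and not taken from this PR, and whether oversized samples are actually skipped or truncated is not stated here; the sketch assumes they are skipped.

```python
from typing import List, Tuple

# Defaults quoted in the description above.
MAX_SOURCE_POSITIONS = 64    # image feature vectors per sample
MAX_TARGET_POSITIONS = 1024  # caption tokens per sample


def filter_by_max_positions(
    sizes: List[Tuple[int, int]],
    max_source: int = MAX_SOURCE_POSITIONS,
    max_target: int = MAX_TARGET_POSITIONS,
) -> List[int]:
    """Return the indices of samples that fit both limits.

    Each entry of `sizes` is a hypothetical (num_features, num_tokens) pair.
    """
    return [
        i
        for i, (num_features, num_tokens) in enumerate(sizes)
        if num_features <= max_source and num_tokens <= max_target
    ]


# Samples 1 and 2 exceed the source and target limit, respectively.
sizes = [(64, 20), (80, 15), (36, 1500), (10, 1024)]
kept = filter_by_max_positions(sizes)
print(kept)
```

A sample that sits exactly on a limit (64 features or 1024 tokens) is kept, since the limits are treated as inclusive in this sketch.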
`SimplisticEncoder` and `CaptioningEncoder` use the `src_lengths` parameter of the `forward` method to compute an `encoder_padding_mask`. `FeatureDataset` still returns a fixed number of features per image (64), but I also tested with a temporary modification where a random number of features per image is selected (not part of this PR). A `Dataset` implementation that returns a variable number of features per image will be part of another PR.
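For illustration, the padding mask computation mentioned above could look roughly like the following framework-free sketch. It is not the code from this PR: the helper name `padding_mask` is hypothetical, and it assumes the convention that the mask is `True` at padded positions and `False` at valid ones. With tensors, the same comparison would typically be written as a single broadcast operation.

```python
from typing import List


def padding_mask(src_lengths: List[int], max_len: int) -> List[List[bool]]:
    """Build a (batch, max_len) mask from per-sample lengths.

    Assumed convention: True marks a padded position, False a valid one.
    """
    if any(n < 0 or n > max_len for n in src_lengths):
        raise ValueError("each length must be between 0 and max_len")
    # Position `pos` of a sample is padding once it reaches that sample's length.
    return [[pos >= n for pos in range(max_len)] for n in src_lengths]


# Three images with 2, 4 and 3 valid feature vectors, padded to length 4.
mask = padding_mask([2, 4, 3], max_len=4)
for row in mask:
    print(row)
```

With a fixed 64 features per image, every row of such a mask is all `False`, so the mask only matters once a dataset returns a variable number of features per image.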