mynlp / cst_captioning

PyTorch Implementation of Consensus-based Sequence Training for Video Captioning

Mean pooling for ResNet features? #11

Closed mehrdad-h closed 6 years ago

mehrdad-h commented 6 years ago

Hi, I was wondering whether you used mean pooling to blend the per-frame ResNet features into a single 2048-D vector (representing the ResNet features for that video chunk). If not, can you describe how you merged features across the frames of each clip?

plsang commented 6 years ago

Yes, I used mean pooling.
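
For readers landing on this issue, here is a minimal sketch of what mean pooling over per-frame features looks like. It is not taken from the repository: the function name `mean_pool` and the toy 4-D inputs are hypothetical and used only for illustration, with the real features being 2048-D as stated in the question. It is written in plain Python to stay dependency-free.

```python
from typing import List, Sequence


def mean_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one clip-level vector.

    Hypothetical helper for illustration; not the repository's code.
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frames must share the same feature dimension")
    n_frames = len(frame_features)
    # zip(*...) walks the features column by column, i.e. one dimension at a time
    return [sum(column) / n_frames for column in zip(*frame_features)]


# Toy input: 3 frames with 4-D features (ResNet features would be 2048-D).
frames = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
clip_vector = mean_pool(frames)  # one vector with the same dimension as a frame
```

If the features are already in a NumPy array or a PyTorch tensor of shape `(num_frames, 2048)`, the same reduction would presumably be `features.mean(axis=0)` or `features.mean(dim=0)` respectively.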