Closed pplntech closed 5 years ago
Hi @pplntech , actually we trained several variations and then, for the ensemble, fused their scores. It is also possible to use the same network with a different number of frames, but in that case there will be a performance drop, because each variation is trained at a different temporal resolution.
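To make "fused the scores" concrete, here is a minimal sketch of score-level fusion: each model's logits are turned into softmax probabilities and averaged across models. Note this is an assumption about the fusion rule (simple averaging, equal weights); the class count (174, as in Something-Something V1) and the random logits are purely illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the class axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_scores(score_list):
    # score_list: one (num_clips, num_classes) logit array per model;
    # fuse by averaging per-model softmax probabilities (assumed rule)
    probs = [softmax(s) for s in score_list]
    return np.mean(probs, axis=0)

# hypothetical example: 4 models (e.g. trained with 16/20/24/32 frames),
# 5 test clips, 174 classes
rng = np.random.default_rng(0)
scores = [rng.normal(size=(5, 174)) for _ in range(4)]
fused = fuse_scores(scores)
preds = fused.argmax(axis=1)  # final ensemble prediction per clip
```

Other fusion rules (weighted averaging, averaging raw logits, majority vote) are drop-in replacements for `fuse_scores`; the averaged-softmax form shown here is just one common choice.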
Hi, thank you for releasing the code for the paper.
I have a question about the implementation. The paper states that the best performance on Something-Something was obtained from an ensemble of networks with {16, 20, 24, 32} input frames.
I wonder how this ensemble was implemented. Did you train a single model (e.g. taking 16 frames as input) and test it with various numbers of frames {16, 20, 24, 32} (which should be possible, since the model performs global average pooling at the end of the 3D ConvNet, so the temporal dimension goes away)? Or did you train multiple models, each with a different number of frames (e.g. one model takes 16 frames at both train and test time, another takes 20, ..., and another takes 32)?
Thank you
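The global-average-pooling point in the question can be illustrated with a small sketch: pooling over the temporal (and spatial) axes produces a fixed-size feature vector regardless of how many frames went in, which is why a single network could in principle accept 16, 20, 24, or 32 frames at test time. The channel count (64) and 7x7 spatial size below are illustrative stand-ins for the final feature map of a 3D ConvNet.

```python
import numpy as np

def global_avg_pool(features):
    # features: (C, T, H, W) feature map from a 3D ConvNet;
    # averaging over T, H, W removes the temporal dimension entirely
    return features.mean(axis=(1, 2, 3))

# the pooled feature has the same shape for every frame count
for t in (16, 20, 24, 32):
    feats = np.random.rand(64, t, 7, 7)
    assert global_avg_pool(feats).shape == (64,)
```

This shows why the classifier head never sees the number of input frames; the performance drop the author mentions comes from the mismatch in temporal resolution between training and testing, not from any shape incompatibility.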