What does zero-pad mean?

Zero-pad means that the first 4 frames keep the original (pretrained) temporal embeddings, and the remaining frames get new temporal embeddings initialised to zero. See this function in model.py for the different methods: https://github.com/m-bain/frozen-in-time/blob/873c4967258eeabd88b6c0fc448e8882f95d0736/model/model.py#L115
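A minimal sketch of the idea, assuming the temporal embedding is a table with one row per frame. It uses plain Python lists instead of the model's actual tensors, and the function name `zero_pad_temporal_embed` is hypothetical, not the repo's API. Rows for frames the pretrained model already saw are copied as-is. Rows for the extra frames start at zero.

```python
def zero_pad_temporal_embed(pretrained, num_frames):
    """Hypothetical helper: grow a per-frame embedding table to num_frames rows.

    Rows for the pretrained frames are copied unchanged; each extra frame
    gets a new row initialised to zeros. Assumes num_frames >= len(pretrained).
    """
    if num_frames < len(pretrained):
        raise ValueError("zero-padding only adds frames; num_frames is too small")
    dim = len(pretrained[0])
    copied = [list(row) for row in pretrained]  # original temporal embeddings
    extra = [[0.0] * dim for _ in range(num_frames - len(pretrained))]  # new, zero-initialised
    return copied + extra


# Hypothetical example: embeddings learned for 4 frames, embedding dimension 3.
pretrained = [[0.1 * (f + 1)] * 3 for f in range(4)]
inflated = zero_pad_temporal_embed(pretrained, num_frames=8)
print(len(inflated))  # 8
print(inflated[4])    # [0.0, 0.0, 0.0]
```

Zero initialisation means that, at the start of finetuning, the extra frames carry no temporal position signal. The first 4 frames keep exactly what was learned during pretraining.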

Note that in the paper we found that the choice of temporal inflation method had little effect on performance. This is because, for most tasks (text-video matching in particular), the positional embeddings barely make a difference. So unless your task really depends on temporal position, I wouldn't worry too much about the positional embeddings.

m-bain / frozen-in-time, issue #43