Your code does not support multi-GPU training, specifically at clip_transformer.py#L37. Even with gather operations across GPUs, there is no way to obtain video_features_pooled.
You are correct that this operation, as currently written, cannot run on multiple GPUs. However, you can still adapt CLIP, or the other operations in the code, to a multi-GPU setting. Thanks!
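Below is a minimal sketch of the general idea of running a CLIP-style objective on multiple GPUs. It assumes PyTorch with `torch.distributed` and an equal batch size on every rank, because `all_gather` requires matching shapes. Each rank gathers text and video embeddings from all ranks and scores its local batch against the full set. The helper names `gather_with_local_grad` and `clip_contrastive_loss` and the fixed `logit_scale` are hypothetical. This is not the repository's code, and it does not by itself address the pooling at clip_transformer.py#L37.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F


def _is_distributed() -> bool:
    return dist.is_available() and dist.is_initialized()


def gather_with_local_grad(x: torch.Tensor) -> torch.Tensor:
    """Concatenate `x` from every rank along dim 0 (hypothetical helper).

    dist.all_gather does not backpropagate into the copies it fills, so the
    local slot is replaced by the original tensor to keep this rank's
    gradient path. Returns `x` unchanged when not running distributed.
    """
    if not _is_distributed():
        return x
    gathered = [torch.zeros_like(x) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, x)
    gathered[dist.get_rank()] = x
    return torch.cat(gathered, dim=0)


def clip_contrastive_loss(text_embeds: torch.Tensor,
                          video_embeds: torch.Tensor,
                          logit_scale: float = 100.0) -> torch.Tensor:
    """Symmetric InfoNCE loss: local rows vs. embeddings gathered from all ranks."""
    text_embeds = F.normalize(text_embeds, dim=-1)
    video_embeds = F.normalize(video_embeds, dim=-1)
    all_text = gather_with_local_grad(text_embeds)
    all_video = gather_with_local_grad(video_embeds)

    rank = dist.get_rank() if _is_distributed() else 0
    batch = text_embeds.shape[0]
    # The positive pair for local row i sits at global column rank * batch + i.
    labels = torch.arange(batch, device=text_embeds.device) + rank * batch

    logits_t2v = logit_scale * text_embeds @ all_video.t()
    logits_v2t = logit_scale * video_embeds @ all_text.t()
    return (F.cross_entropy(logits_t2v, labels)
            + F.cross_entropy(logits_v2t, labels)) / 2
```

Keeping the loss local to each rank, with gradients flowing only through the local embeddings, is a common pattern when training under DistributedDataParallel, which then averages gradients across ranks. Operations that need every text paired with every video inside the model's forward pass would need additional restructuring beyond this sketch.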