Closed: bubbliiiing closed this 2 years ago
In main_pretrain.py, line 165:
p = (1+_h*_w)*i_t + i_h*_w + i_w
I think it should add 1:
p = (1+_h*_w)*i_t + i_h*_w + i_w + 1
because the first position of each frame is reserved for the separator.
Already starred your code, thanks for your contribution.
Yep, thanks for pointing it out! We have a [CLS_V] token in front of each video frame, so the index should be offset by +1.
OK, thank you
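For anyone reading later, the off-by-one can be checked with a small sketch. It assumes each frame occupies 1 + _h*_w consecutive positions, with [CLS_V] first and the patches after it in row-major order. The helper names patch_position and cls_position are hypothetical and not taken from main_pretrain.py; only the formula for p comes from this thread.

```python
def cls_position(i_t, _h, _w):
    # Assumed layout: every frame starts with its [CLS_V] token.
    return (1 + _h * _w) * i_t


def patch_position(i_t, i_h, i_w, _h, _w):
    # Corrected formula from this thread: the trailing +1 skips the
    # [CLS_V] slot at the start of frame i_t.
    return (1 + _h * _w) * i_t + i_h * _w + i_w + 1


# Tiny illustrative grid (values chosen only for the example).
_t, _h, _w = 2, 2, 3

patch_positions = [
    patch_position(i_t, i_h, i_w, _h, _w)
    for i_t in range(_t)
    for i_h in range(_h)
    for i_w in range(_w)
]
cls_positions = [cls_position(i_t, _h, _w) for i_t in range(_t)]

# With the +1, patches and [CLS_V] tokens never collide and together
# they fill every slot of the flattened sequence exactly once.
assert not set(patch_positions) & set(cls_positions)
assert sorted(patch_positions + cls_positions) == list(range(_t * (1 + _h * _w)))
```

Without the +1, the first patch of each frame would land on that frame's [CLS_V] slot, and the last slot of each frame would never be addressed.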