Flaick closed this issue 1 year ago.
Hi Flaick,
We didn't explicitly add positional embeddings to the visual tokens; we used the default embeddings from the VisualBERT model. We are currently exploring the effect of positional embeddings.
For the default embeddings, you can look through the official Hugging Face VisualBERT code, lines 65-194.
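As we read the Hugging Face `VisualBertEmbeddings` code, when no `image_text_alignment` is passed, each region feature is linearly projected to the hidden size and summed with a token-type embedding and a "visual position" embedding whose position id is 0 for every region. In that default path no 2D layout (box coordinates) is encoded. The NumPy sketch below only illustrates that idea. The sizes and random weights are made up, token type 1 as the default is an assumption, and the concatenation with text embeddings, LayerNorm and dropout are omitted. It is not the library's code.

```python
import numpy as np

# Hypothetical sizes, not taken from any VisualBERT config.
VISUAL_DIM = 8   # dimension of the raw region features (e.g. detector output)
HIDDEN = 4       # transformer hidden size
MAX_POS = 16     # rows in the visual position-embedding table
NUM_TYPES = 2    # number of token-type (segment) ids

rng = np.random.default_rng(0)
# Random stand-ins for the learned parameters.
W_proj = rng.normal(size=(VISUAL_DIM, HIDDEN))
b_proj = np.zeros(HIDDEN)
visual_pos_table = rng.normal(size=(MAX_POS, HIDDEN))
visual_type_table = rng.normal(size=(NUM_TYPES, HIDDEN))


def default_visual_embeddings(visual_feats, visual_token_type_ids=None):
    """Projection + token-type embedding + position embedding with id 0."""
    n = visual_feats.shape[0]
    if visual_token_type_ids is None:
        # Assumption: visual tokens default to token type 1.
        visual_token_type_ids = np.ones(n, dtype=np.int64)
    # Every region gets position id 0, so all regions share one position
    # vector and nothing about where a box sits in the image is added.
    visual_position_ids = np.zeros(n, dtype=np.int64)
    projected = visual_feats @ W_proj + b_proj
    return (projected
            + visual_pos_table[visual_position_ids]
            + visual_type_table[visual_token_type_ids])


feats = rng.normal(size=(3, VISUAL_DIM))  # three hypothetical regions
emb = default_visual_embeddings(feats)

# The position contribution is the same row for every region.
pos_part = emb - (feats @ W_proj + b_proj) - visual_type_table[1]
print(np.allclose(pos_part, visual_pos_table[0]))  # True

# Reordering the regions just reorders the outputs: no spatial signal.
perm = np.array([2, 0, 1])
print(np.allclose(default_visual_embeddings(feats[perm]), emb[perm]))  # True
```

Any 2D positional encoding (for example, one computed from box coordinates) would have to be added on top of this, since the default path gives every region the same position vector.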
Have a good day.
Hello, I am wondering whether you add any 2D positional encoding to the visual feature tokens. If not, is there a reason why? Thanks!