Closed: Flowerfan closed this issue 2 years ago
@Flowerfan oh yes, I do believe you are correct: https://github.com/lucidrains/CoCa-pytorch/commit/4a6dbccb9b08d49229b378c4496c514f9a6ab427 I must have been thinking about Flamingo at the time
thank you for pointing this out!
Hi, thanks for sharing this repo. In the CoCa paper, both the visual encoder and the text encoder are trained end-to-end. But in this repo, the ViT is first pretrained and then frozen while training CoCa.
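
For anyone reading along, here is a rough sketch of the difference between the two setups in plain PyTorch. It is not this repo's code: `image_encoder` and `text_decoder` are tiny hypothetical stand-ins for the ViT and the CoCa text side, and `set_trainable`, `trainable_params` and `train_step` are helper names made up for illustration. The only thing it shows is that a frozen encoder is left untouched by an optimizer step, while an end-to-end setup updates it.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy stand-ins: a real setup would use a ViT and the CoCa text/multimodal layers.
image_encoder = nn.Linear(8, 4)
text_decoder = nn.Linear(4, 4)


def set_trainable(module: nn.Module, trainable: bool) -> None:
    # Freezing a module means turning off gradients for all of its parameters.
    for p in module.parameters():
        p.requires_grad = trainable


def trainable_params(*modules: nn.Module) -> list:
    # Collect only the parameters that should be handed to the optimizer.
    return [p for m in modules for p in m.parameters() if p.requires_grad]


def train_step(freeze_image_encoder: bool) -> bool:
    """Run one optimizer step; return True if the image encoder weights changed."""
    set_trainable(image_encoder, not freeze_image_encoder)
    optimizer = torch.optim.SGD(trainable_params(image_encoder, text_decoder), lr=0.1)

    before = image_encoder.weight.detach().clone()
    images = torch.randn(2, 8)
    loss = text_decoder(image_encoder(images)).pow(2).mean()  # dummy loss for the sketch

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return not torch.equal(before, image_encoder.weight)
```

Calling `train_step(freeze_image_encoder=True)` corresponds to the "pretrain, then freeze the ViT" setup described above, and `train_step(freeze_image_encoder=False)` to the end-to-end training described in the paper.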