zengyan-97 / CCLM

Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023)
BSD 3-Clause "New" or "Revised" License
87 stars 9 forks

For model architecture #4

Closed jongjyh closed 2 years ago

jongjyh commented 2 years ago

Hi,

Thank you for contributing such great work! I noticed that the architecture is somewhat similar to CoCa. Was it inspired by CoCa or by some other work? Could you provide some pointers?

zengyan-97 commented 2 years ago

Hi,

Thanks for your interest.

The multimodal parts of CCLM follow our previous work, X-VLM. X-VLM is a multi-grained vision-language pre-training method that was released in Nov 2021 (https://github.com/zengyan-97/X-VLM).