Closed jongjyh closed 2 years ago
Hi,

Thank you for contributing such a great work! I found that the architecture is somewhat similar to CoCa. Were you inspired by that work, or by any other? Can you provide some clues?

Hi,

Thanks for your interest.

The multimodal parts of CCLM follow our previous work, X-VLM, which is for multi-grained vision-language pre-training and was released in Nov 2021. (https://github.com/zengyan-97/X-VLM)