Model description
The model is a LLaMA-style architecture that uses a VQGAN for image input and generation. It is also likely to be fine-tuned to take image patches as input, similar to Fuyu, so the implementation should stay flexible enough to support different types of image input. The weights are available under a research license.
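To make the flexibility point concrete, here is a minimal sketch of how the two image-input paths could sit behind one interface. Everything in it is an assumption for illustration: the names `ImageInput`, `DiscreteTokenInput`, `PatchInput`, and `split_into_patches` are hypothetical and are not taken from the Chameleon code or from any library. The discrete path is a toy stand-in that quantizes raw pixel patches against a codebook, whereas a real VQGAN runs a learned encoder first.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence

# Hypothetical image type: a grayscale H x W image as nested lists of floats.
Image = List[List[float]]


def split_into_patches(image: Image, patch: int) -> List[List[float]]:
    """Cut the image into non-overlapping patch x patch tiles, flattened row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append(
                [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            )
    return tiles


class ImageInput(Protocol):
    """Hypothetical interface: turn an image into a sequence for the language model."""

    def encode(self, image: Image) -> Sequence: ...


@dataclass
class DiscreteTokenInput:
    """VQ-style path (toy): each patch becomes the index of its nearest codebook vector."""

    codebook: List[List[float]]
    patch: int

    def encode(self, image: Image) -> List[int]:
        ids = []
        for tile in split_into_patches(image, self.patch):
            # Squared Euclidean distance from this patch to every codebook entry.
            distances = [
                sum((a - b) ** 2 for a, b in zip(tile, code)) for code in self.codebook
            ]
            ids.append(distances.index(min(distances)))
        return ids


@dataclass
class PatchInput:
    """Fuyu-style path: return flattened patches as continuous vectors.

    The model would then project these vectors into its embedding space.
    """

    patch: int

    def encode(self, image: Image) -> List[List[float]]:
        return split_into_patches(image, self.patch)
```

With this shape, the rest of the model would only depend on `encode`, so switching between discrete image tokens and raw patches becomes a matter of which object is passed in, for example `DiscreteTokenInput(codebook, patch=2)` versus `PatchInput(patch=2)`.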
Open source status
Provide useful links for the implementation
https://github.com/facebookresearch/chameleon
https://ai.meta.com/blog/meta-fair-research-new-releases/