Hello,

I'm looking to reproduce some of the open-source model results from the VWA paper:
(1) Mixtral-8x7B as the LLM backbone for the caption-augmented model
(2) CogVLM for the multimodal model

Could someone share any flags/commands or instructions to set up these configurations for eval?
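For context, here is roughly what I'm working from, adapted from memory of the GPT-4V examples in the repo's README. Everything in angle brackets, and in particular the `--provider`, `--model`, and observation-type values for the open-source backbones, are my guesses rather than anything I've confirmed against `run.py`:

```bash
# Rough sketch only; the flags for the open-source backbones are assumptions, not confirmed.

# (1) Caption-augmented agent with Mixtral-8x7B as the LLM backbone
python run.py \
  --instruction_path agent/prompts/jsons/p_cot_id_actions.json \
  --test_config_base_dir config_files/vwa/test_reddit \
  --test_start_idx 0 --test_end_idx <last_idx> \
  --result_dir results/mixtral_caption_augmented \
  --provider <provider_for_local_models?> \
  --model <mixtral-8x7b-instruct?> \
  --observation_type <accessibility_tree_plus_captions?>

# (2) Multimodal agent with CogVLM over SoM screenshots
python run.py \
  --instruction_path agent/prompts/jsons/p_som_cot_id_actions.json \
  --test_config_base_dir config_files/vwa/test_reddit \
  --test_start_idx 0 --test_end_idx <last_idx> \
  --result_dir results/cogvlm_som \
  --provider <provider_for_cogvlm?> \
  --model <cogvlm?> \
  --action_set_tag som --observation_type image_som
```

Pointers on which of these flags/values are actually correct, and how the Mixtral and CogVLM backbones are meant to be served and plugged in, would be much appreciated.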