Open StarCycle opened 2 months ago
Hi @StarCycle, thanks for the feature request. Multimodal support is something we are still exploring. Would love to learn more about what you would like to use it for. And of course we welcome any initial prototype, if you're interested in contributing this :)
Hi @RdoubleA,
Currently I am training LLaVA with XTuner, which is similar to torchtune. They support finetuning, evaluation, and deployment of LLaVA models (we can easily add custom modifications to the models). Integration of LLaVA 1.6 and video input is on the way. You can take their implementation as a reference :)
But they rely on HuggingFace transformers... I guess torchtune has fewer dependencies, which would be quite good!
Hello,
Would you consider supporting multimodal LLMs (MLLMs) like LLaVA?