You can refer to the official implementation in LLaVA: https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/finetune_task_lora.sh. However, at the time we were training, LLaVA-1.5 did not yet officially support this kind of task-specific LoRA fine-tuning, so we implemented it ourselves. I later confirmed from the git commit history that our modifications were comprehensive and correct.
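For reference, here is a minimal sketch of the kind of invocation that script contains. The paths, model name, and hyperparameter values below are illustrative assumptions for a typical setup, not our exact configuration; please treat the linked `finetune_task_lora.sh` as the authoritative version.

```bash
# Sketch based on scripts/v1_5/finetune_task_lora.sh (see link above).
# data_path, image_folder, and output_dir are placeholders for your own setup.
deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero3.json \
    --lora_enable True --lora_r 128 --lora_alpha 256 \
    --mm_projector_lr 2e-5 \
    --model_name_or_path liuhaotian/llava-v1.5-13b \
    --version v1 \
    --data_path ./path/to/your_task_data.json \
    --image_folder ./path/to/images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --num_train_epochs 1 \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --output_dir ./checkpoints/llava-v1.5-13b-task-lora
```

Note that `--num_train_epochs 1` here reflects the latest script; see the epoch discussion below.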
3 epochs. However, there appeared to be a small issue in the LLaVA code early on; with the latest script, training for about one epoch should be enough.
Hello! After reading the paper two or three times, I still don't understand some of the specific training details. If possible, could you elaborate? 1. What data exactly was used for training? Was it only the ChartLlama dataset? 2. Which stage did the training data go through? Was it only stage-2 visual instruction tuning with LoRA, or was stage 1 trained as well? 3. How many epochs were trained, one or three? Thanks! @tingxueronghua