turingmotors / heron

Apache License 2.0

Regarding training your own model #34

Open · Aniketto16 opened this issue 6 months ago

Aniketto16 commented 6 months ago

Hello! Thank you for your great work. I have the following question:

I have my own Elyza7B checkpoint that I want to finetune on a VQA task. If I follow the LLaVA training scheme closely, I think we need to perform projection pretraining and then finetuning on a chat task. From the documentation I don't understand which dataset I should use: should I use llava_ja directly, or first m3it and then llava_ja? Also, what is the difference between the instruct and normal datasets? Could you clarify? That would be really helpful!

Thank you so much again, looking forward to your reply!!

Ino-Ichan commented 5 months ago

@Aniketto16 Hi!

Thank you very much for your interest and for the kind words about our work! Regarding your question about finetuning your Elyza7B checkpoint for the VQA task, here are some clarifications and recommendations.

Firstly, adopting a training scheme similar to LLaVA, where projection pretraining is followed by comprehensive LLM finetuning, could indeed be effective. However, we currently do not have a publicly available Japanese dataset for LLaVA pretraining. This means that directly mimicking the LLaVA training approach is not feasible at the moment (we are working on this, so please stay tuned for future updates).

From our experiments, we've found that finetuning the projection and the LLM together, without separate projection pretraining, can also yield satisfactory results. This approach trains the full parameters of both components during finetuning, and we recommend giving it a try. See here.
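For illustration, here is a minimal PyTorch sketch of that joint setup. The model class and its module names (`vision_encoder`, `visual_projection`, `language_model`) are hypothetical placeholders rather than Heron's actual classes; the point is simply that both the projection and the LLM stay fully trainable while only the vision encoder is frozen:

```python
import torch
import torch.nn as nn

# Toy stand-in for a vision-language model. The module names here
# (vision_encoder / visual_projection / language_model) are
# illustrative placeholders, not Heron's actual attribute paths.
class ToyVLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(768, 768)      # kept frozen
        self.visual_projection = nn.Linear(768, 4096)  # trained
        self.language_model = nn.Linear(4096, 4096)    # trained

model = ToyVLM()

# Joint finetuning: keep the projection and the LLM fully trainable;
# freeze only the vision encoder.
for name, param in model.named_parameters():
    param.requires_grad = not name.startswith("vision_encoder")

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)
```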

Of course, pretraining on a Japanese Vision-Language dataset before proceeding to finetune the LLM on a specific VQA dataset is another strategy that is likely to be effective.

Regarding the distinction between "normal" and "instruct" datasets, the key difference lies in how the loss is calculated. For "normal" datasets, the loss is calculated across all input text, whereas for "instruct" datasets, it is calculated only on the model's answers. Instruction tuning typically uses "instruct" datasets, since the goal is to refine the model's ability to answer questions. We provide implementations for both types as a reference. Training with "normal" datasets can also be successful, but it may produce a model that imitates the human side of the conversation as well, for example generating questions instead of answers.
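As a rough sketch of this difference, assuming the usual Hugging Face/PyTorch convention of masking labels with `-100` (this is not Heron's exact collator code, and the token IDs are made up):

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # default ignore_index of F.cross_entropy

# Toy token sequence: a user prompt followed by the model's answer.
prompt_ids = torch.tensor([101, 2054, 2003, 2009])  # prompt tokens (made up)
answer_ids = torch.tensor([1037, 4937, 102])        # answer tokens (made up)
input_ids = torch.cat([prompt_ids, answer_ids])

# "normal" dataset: the loss is computed over every token.
labels_normal = input_ids.clone()

# "instruct" dataset: prompt tokens are masked out, so the loss is
# computed over the answer tokens only.
labels_instruct = input_ids.clone()
labels_instruct[: len(prompt_ids)] = IGNORE_INDEX

# F.cross_entropy skips positions whose label is IGNORE_INDEX
# (label shifting for causal LM training is omitted for brevity).
logits = torch.randn(len(input_ids), 32000)  # dummy logits, 32k vocab
loss_normal = F.cross_entropy(logits, labels_normal)
loss_instruct = F.cross_entropy(logits, labels_instruct)  # answer-only
```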

We hope this clarifies your queries and aids in your finetuning endeavors. Please feel free to reach out if you have further questions. We're excited to see the advancements you'll make with your project!