Closed ZoneFv closed 8 months ago
All of the information about the data we used is included in the paper, including the dataset ratios.
Our final models are trained with 200k×256 (≈51M) and 10k×128 (≈1.3M) examples in pre-training and instruction tuning, respectively.
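A quick sanity check of these totals (a sketch; the step counts and per-step batch sizes are taken from the numbers quoted above):

```python
# Example counts = training steps × per-step batch size,
# using the figures stated in this thread.
pretrain_steps, pretrain_batch = 200_000, 256
it_steps, it_batch = 10_000, 128

pretrain_examples = pretrain_steps * pretrain_batch  # 51,200,000 ≈ 51M
it_examples = it_steps * it_batch                    # 1,280,000 ≈ 1.3M

print(f"pre-training: {pretrain_examples / 1e6:.1f}M examples")
print(f"instruction tuning: {it_examples / 1e6:.2f}M examples")
```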
Hi, since you are using a multi-round dialog format, does 130M represent the number of images used, or the total number of dialogs?
Each multi-round dialogue has a single image. Hence, 1.3M (not 130M) is the number of images and dialogues (and the number of rounds is > 1.3M).
If using a single node with 8× A100 80GB GPUs, how much time did the two stages take with the current amount of training data?
It depends on the model configuration. For Honeybee-C-7B-M144 with the ablation setting (50k pre-training steps and 4k instruction-tuning steps), the total training time (including intermediate evaluations) is about 30.7 hours for pre-training and 5.7 hours for instruction tuning.
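From these figures one can back out a rough throughput. This is only a sketch: it assumes the ablation run uses the same per-step batch sizes as the final models (256 for pre-training, 128 for instruction tuning), which the thread does not state explicitly.

```python
# Rough examples/second for the ablation run.
# ASSUMPTION: batch sizes 256 (pre-training) and 128 (instruction
# tuning), carried over from the final-model setup described above.
pretrain_examples = 50_000 * 256  # 12.8M examples
it_examples = 4_000 * 128         # 0.512M examples

pretrain_hours, it_hours = 30.7, 5.7  # times reported in the thread

print(f"pre-training: ~{pretrain_examples / (pretrain_hours * 3600):.0f} examples/s")
print(f"instruction tuning: ~{it_examples / (it_hours * 3600):.0f} examples/s")
```

Note the reported times include intermediate evaluations, so the true training throughput is somewhat higher than this estimate.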
What is the amount of training data used in stage 1 and stage 2? Can you open-source the information about the data used for training?