xiaoachen98 / Open-LLaVA-NeXT

An open-source implementation for training LLaVA-NeXT.

Pretraining with im_start_end token #18


Ali2500 commented 3 weeks ago

I wanted to know why the prepare_inputs_labels_for_multimodal function in llava_arch.py is designed to raise an exception during pretraining when the mm_use_im_start_end option is enabled:

# TODO: image start / end is not implemented here to support pretraining.
if getattr(self.config, 'tune_mm_mlp_adapter', False) and getattr(self.config, 'mm_use_im_start_end', False):
    raise NotImplementedError

Shouldn't the function work fine even if there are <im_start> and <im_end> tokens around the image tokens?
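
For context, here is a minimal sketch of the wrapping I am asking about. The token names are the constants from llava/constants.py, and the replace logic mirrors what LLaVA-style training scripts typically do during preprocessing; this repo's exact code may differ:

# Sketch of how the image placeholder is wrapped when mm_use_im_start_end
# is on (token names from llava/constants.py; preprocessing logic assumed,
# not quoted from this repo).
DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"

def wrap_image_token(prompt: str, mm_use_im_start_end: bool) -> str:
    # With the flag enabled, each <image> placeholder becomes
    # <im_start><image><im_end> before tokenization.
    replace_token = DEFAULT_IMAGE_TOKEN
    if mm_use_im_start_end:
        replace_token = DEFAULT_IM_START_TOKEN + replace_token + DEFAULT_IM_END_TOKEN
    return prompt.replace(DEFAULT_IMAGE_TOKEN, replace_token)

print(wrap_image_token("Describe the picture.\n<image>", True))
# Describe the picture.
# <im_start><image><im_end>

Since the start/end tokens are just extra text tokens surrounding the image placeholder, I would have expected prepare_inputs_labels_for_multimodal to handle them like any other tokens when splicing in the image features.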