zjysteven / lmms-finetune

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v etc.
Apache License 2.0

Have you tested the llava-interleave-qwen-7b model on the ego4d_video_train.json dataset? #12

Closed: Xavier-Zeng closed this issue 3 months ago

Xavier-Zeng commented 3 months ago

I tested it, but got some errors like this: [screenshot]

zjysteven commented 3 months ago

On my end it runs with no problem. Take a look at #8. If you are using local checkpoint, make sure you have chat_template.json in your checkpoint folder (or the tokenizer_config.json has "chat_template" attribute). Also try upgrading the version of your transformers library.
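For a local checkpoint, the two conditions above can be checked with a small script before launching training. This is a minimal sketch (the helper name `has_chat_template` is not part of the repo); it looks for a standalone `chat_template.json` and then for a `"chat_template"` key inside `tokenizer_config.json`:

```python
import json
import os

def has_chat_template(checkpoint_dir: str) -> bool:
    """Return True if a local checkpoint ships a chat template, either as a
    standalone chat_template.json file or as a "chat_template" key inside
    tokenizer_config.json."""
    if os.path.isfile(os.path.join(checkpoint_dir, "chat_template.json")):
        return True
    cfg_path = os.path.join(checkpoint_dir, "tokenizer_config.json")
    if os.path.isfile(cfg_path):
        with open(cfg_path) as f:
            cfg = json.load(f)
        return "chat_template" in cfg
    return False
```

If this returns `False` for your checkpoint folder, copy `chat_template.json` from the original Hugging Face repo of the model (or upgrade `transformers` and re-download the checkpoint).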

Xavier-Zeng commented 3 months ago

> On my end it runs with no problem. Take a look at #8. If you are using local checkpoint, make sure you have chat_template.json in your checkpoint folder (or the tokenizer_config.json has "chat_template" attribute). Also try upgrading the version of your transformers library.

Thanks, I tried this, and it also seems to work:

```python
chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n'}}{# Render all images first #}{% for content in message['content'] | selectattr('type', 'equalto', 'image') %}{{ '<image>' }}{% endfor %}{# Render all text next #}{% for content in message['content'] | selectattr('type', 'equalto', 'text') %}{{ '\n' + content['text'] }}{% endfor %}{{'<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"

cur_text = self.processor.apply_chat_template(cur_text, tokenize=False, add_generation_prompt=False, chat_template=chat_template)
```

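To make the template's behavior concrete, here is a plain-Python re-implementation of what that Jinja string renders, for illustration only (the `render` function is not part of the repo or of `transformers`): all images in a message are emitted first, then all text, wrapped in `<|im_start|>`/`<|im_end|>` markers.

```python
def render(messages, add_generation_prompt=False):
    """Illustrative Python equivalent of the Jinja chat template above:
    per message, render all image placeholders first, then all text."""
    out = ""
    for m in messages:
        out += "<|im_start|>" + m["role"] + "\n"
        for c in m["content"]:
            if c["type"] == "image":
                out += "<image>"
        for c in m["content"]:
            if c["type"] == "text":
                out += "\n" + c["text"]
        out += "<|im_end|>" + "\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

# A single user turn with one image and one text piece:
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "What is in the video?"}]}]
print(render(messages, add_generation_prompt=True))
# <|im_start|>user
# <image>
# What is in the video?<|im_end|>
# <|im_start|>assistant
```

This is why the message `content` must be a list of `{"type": ..., ...}` dicts rather than a bare string: the template filters on `content['type']` to place `<image>` tokens before the text.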
zjysteven commented 3 months ago

Yes, this way works too. Closing now. Feel free to reopen if there are any other questions.