mlpc-ucsd / BLIVA

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
https://arxiv.org/abs/2308.09936
BSD 3-Clause "New" or "Revised" License

"ValueError: Attempting to unscale FP16 gradients" in vicuna_v1.1 #9

Open Zhudongsheng75 opened 11 months ago

Zhudongsheng75 commented 11 months ago

I have a question I would like to share with the authors. I would be very grateful if you could reply.

As far as I understand, your work follows InstructBLIP. However, in the original InstructBLIP paper, the LLM weights they used are Vicuna v1.1, whereas v0.1 is used here. Why did you choose different LLM weights?

In fact, I tried Vicuna v1.1 for training, but I encountered the error mentioned in the title, "ValueError: Attempting to unscale FP16 gradients". After some debugging, I found that the main problem is likely caused by the following code in BLIVA/bliva/models/blip2_vicuna_instruct.py:

self.llm_tokenizer.add_special_tokens({'pad_token': '[PAD]'})
self.llm_tokenizer.add_special_tokens({'bos_token': '</s>'})
self.llm_tokenizer.add_special_tokens({'eos_token': '</s>'})
self.llm_tokenizer.add_special_tokens({'unk_token': '</s>'})

self.llm_model.resize_token_embeddings(len(self.llm_tokenizer))

Did you encounter similar problems, and is that why you replaced v1.1 with v0.1?
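
For context, the error itself comes from PyTorch's GradScaler, which refuses to unscale gradients stored in float16. Below is a minimal, self-contained sketch of that mechanism; it is not BLIVA code, it assumes a CUDA device, and it uses a toy embedding layer to stand in for embedding rows that end up in fp16 (for example, after resize_token_embeddings on a model loaded with torch_dtype=torch.float16):

import torch

# A toy fp16 parameter standing in for newly added embedding rows on an fp16 model.
emb = torch.nn.Embedding(10, 8).cuda().half()
opt = torch.optim.AdamW(emb.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

with torch.cuda.amp.autocast():
    loss = emb(torch.arange(10, device="cuda")).sum()

scaler.scale(loss).backward()
scaler.unscale_(opt)   # raises ValueError: Attempting to unscale FP16 gradients.
scaler.step(opt)
scaler.update()

A common workaround in this situation is to keep every trainable parameter in float32 (autocast still runs the forward pass in fp16), so that GradScaler only ever unscales fp32 gradients; whether that is the right fix for the v1.1 setup here is exactly what this issue is asking about.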

gordonhu608 commented 11 months ago

Thank you for your interest in our work. Could you please also try training with v0.1 under the same settings to verify that this is the problem? v1.1 and v0.1 differ only in tokenization and the separator.
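
For reference, the "tokenization and separator" difference boils down to the conversation templates. A rough sketch of the two prompt formats, based on the FastChat conversation templates (the exact system prompts may differ between releases, so treat this as illustrative only):

def vicuna_v0_prompt(question: str) -> str:
    # v0-style template: "### Human:" / "### Assistant:" roles, "###" separator.
    return (
        "A chat between a curious human and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the human's questions.\n"
        f"### Human: {question}\n### Assistant:"
    )

def vicuna_v11_prompt(question: str) -> str:
    # v1.1-style template: "USER:" / "ASSISTANT:" roles, turns terminated by "</s>".
    return (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions. "
        f"USER: {question} ASSISTANT:"
    )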

Zhudongsheng75 commented 11 months ago

Thank you for your reply. Is the difference between v0.1 and v1.1 really only in tokenization and the separator? I tried generation with v0.1 and v1.1 respectively and got completely different results; using a mismatched Vicuna version for generation produces confusing outputs.
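
One way to check the tokenization side of this directly is to compare the two tokenizers on the same input. The checkpoint paths below are placeholders rather than paths from this repo, so this is only a sketch:

from transformers import LlamaTokenizer

# Placeholder paths to locally converted Vicuna checkpoints.
tok_v0 = LlamaTokenizer.from_pretrained("path/to/vicuna-7b-v0")
tok_v11 = LlamaTokenizer.from_pretrained("path/to/vicuna-7b-v1.1")

sample = "What does the sign in the image say?"
print(tok_v0.special_tokens_map)
print(tok_v11.special_tokens_map)
print(tok_v0.tokenize(sample) == tok_v11.tokenize(sample))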