Open liyongqi67 opened 1 year ago
Ah, sorry about that! The issue comes from this line of the FSDP wrapping function; the MPT models are still missing some standard HF Transformers methods.
Would it work for your use case to comment out the aforementioned line in our codebase? The output embedding weight would then simply not be sharded. Alternatively, we can add a hack to get around this, similar to this part of the code for MPT-1B.
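For concreteness, a hack along those lines might look roughly like the sketch below. It assumes the MPT module layout (with the tied embedding living at `transformer.wte`) and is only an illustration, not the exact code we would ship:

```python
import types

def add_mpt_embedding_accessors(lang_encoder):
    """Illustrative sketch: patch HF-style output-embedding accessors onto a
    loaded MosaicGPT instance before FSDP wrapping. MPT ties its LM head to
    the input embedding, so both accessors point at transformer.wte."""
    def get_output_embeddings(self):
        return self.transformer.wte

    def set_output_embeddings(self, new_embeddings):
        self.transformer.wte = new_embeddings

    lang_encoder.get_output_embeddings = types.MethodType(get_output_embeddings, lang_encoder)
    lang_encoder.set_output_embeddings = types.MethodType(set_output_embeddings, lang_encoder)
```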
Thanks for your quick reply!
If I only comment out the corresponding line `self.lang_encoder.set_output_embeddings(wrap(wrap(self.lang_encoder.get_output_embeddings())))`, it reports another error:
File "/home/yongqi/miniconda3/envs/openflamingo/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
loss_mmc4 = model(
File "/home/yongqi/miniconda3/envs/openflamingo/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/share/yongqi/project/AutoregressiveImageRetrieval/code/open_flamingo/open_flamingo/src/flamingo.py", line 111, in forward
return forward_call(*args, **kwargs)
File "/home/share/yongqi/project/AutoregressiveImageRetrieval/code/open_flamingo/open_flamingo/src/flamingo.py", line 111, in forward
output = self.lang_encoder(
File "/home/yongqi/miniconda3/envs/openflamingo/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
output = self.lang_encoder(
File "/home/yongqi/miniconda3/envs/openflamingo/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/share/yongqi/project/AutoregressiveImageRetrieval/code/open_flamingo/open_flamingo/src/flamingo_lm.py", line 157, in forward
return forward_call(*args, **kwargs)
File "/home/share/yongqi/project/AutoregressiveImageRetrieval/code/open_flamingo/open_flamingo/src/flamingo_lm.py", line 157, in forward
return super().forward(**kwargs) # Call the other parent's forward method
return super().forward(**kwargs) # Call the other parent's forward method File "/home/yongqi/.cache/huggingface/modules/transformers_modules/anas-awadalla/mpt-1b-redpajama-200b/bfa38d4f431e091fe599d7b4cdb62972532f3c7c/mosaic_gpt.py", line 366, in forward
File "/home/yongqi/.cache/huggingface/modules/transformers_modules/anas-awadalla/mpt-1b-redpajama-200b/bfa38d4f431e091fe599d7b4cdb62972532f3c7c/mosaic_gpt.py", line 366, in forward
logits = F.linear(x, self.transformer.wte.weight, None)
RuntimeErrorRuntimeError: : size mismatch, got 15, 15x2048,51486720
It is very strange.
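Looking at the numbers, the error seems consistent with FSDP having flattened and sharded the tied embedding weight: 51486720 is exactly half of 50280 x 2048, i.e. half of a ~50k-vocab embedding matrix (assuming two ranks). F.linear accepts a 1-D weight, but then its length must match the input's last dimension, which reproduces the message shape for shape (the sizes below are assumptions inferred from the error):

```python
import torch
import torch.nn.functional as F

x = torch.randn(15, 2048)        # hidden states, matching the "15x2048" in the error
flat_w = torch.randn(51486720)   # stand-in for a 1-D FSDP flat-parameter shard

try:
    # F.linear treats a 1-D weight as (in_features,), so it needs
    # len(flat_w) == x.shape[-1]; a sharded flat param violates that.
    F.linear(x, flat_w, None)
except RuntimeError as e:
    print(e)  # e.g. "size mismatch, got 15, 15x2048,51486720"

# With the full 2-D tied weight the same call works:
full_w = torch.randn(50280, 2048)        # assumed (vocab, d_model) shape
print(F.linear(x, full_w, None).shape)   # torch.Size([15, 50280])
```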
Following the MPT-1B workaround you pointed to, I also tried adding these embedding accessors to mosaic_gpt.py:

```python
def get_input_embeddings(self):
    return self.transformer.wte

def set_input_embeddings(self, value):
    self.transformer.wte = value

def get_output_embeddings(self):
    return self.transformer.wte

def set_output_embeddings(self, new_embeddings):
    self.transformer.wte = new_embeddings
```
But it still reports the same size-mismatch error. Could you clarify how get_output_embeddings() and set_output_embeddings() should be defined?
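In the meantime, one workaround that might sidestep the mismatch (an untested sketch; the attribute path follows the code shown in this thread) is to exclude the tied embedding from sharding altogether, so the direct `.weight` read at mosaic_gpt.py line 366 still sees the full 2-D matrix:

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Untested workaround sketch: leave transformer.wte unsharded so that
# F.linear(x, self.transformer.wte.weight, None) inside MosaicGPT.forward
# sees the full (vocab, d_model) weight instead of a 1-D flat shard.
# `model` is assumed to be the Flamingo module before FSDP wrapping; this
# trades per-rank memory for compatibility with the direct .weight access.
fsdp_model = FSDP(
    model,
    ignored_modules=[model.lang_encoder.transformer.wte],
)
```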
Have you solved the issue? I have the same problem when training with FSDP.
Thanks for this wonderful project. I used the following script to train the model.
However, if I set the fsdp flag, it reports an error at flamingo.py line 294.
If I remove this flag, there is no error. Do you have any idea what causes this?