Open EliverQ opened 1 year ago
By the way, I think the problem may be the dtype I use (bf16). But the dtype in your config is fp16, and that still doesn't work?
For the 3B model, since there's no official LLaMA 3B, we defined the model size ourselves and it might not agree with the 3B model sizes in other implementations
But I just used the HF code and checkpoint you released without modifying anything.
Hmm, then that might be a bug on the HF side. We've tested it in HF transformers without the memory_efficient_attention and it works as expected.
Thank you very much! Perhaps I've been using the code incorrectly all along.
Hi, I'm confused by this bug when using memory_efficient_attention. It seems that the embedding dimension per head you chose doesn't match what xformers supports?
I'd appreciate it if you could help me.
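For context, the mismatch described above can be reproduced with a quick head-dimension sanity check. The config values below are assumptions for illustration (as noted in the thread, there is no official LLaMA 3B, so the 3B sizes were defined by the authors), and the divisibility rule is a simplification of what fused attention kernels such as xformers' memory_efficient_attention typically require:

```python
# Hypothetical 3B-style config (assumed values, not an official LLaMA size).
hidden_size = 3200
num_heads = 32

# Per-head embedding dimension.
head_dim = hidden_size // num_heads  # 3200 / 32 = 100

# Fused attention kernels often only ship implementations for certain
# head dims (commonly multiples of 8, or specific sizes like 64 / 128),
# so an unusual head_dim can fail at kernel-dispatch time even though
# the plain PyTorch attention path works fine.
kernel_compatible = head_dim % 8 == 0

print(f"head_dim={head_dim}, kernel_compatible={kernel_compatible}")
```

If the check fails for your config, that would explain why the model runs with the standard attention implementation but crashes once memory_efficient_attention is enabled.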