philschmid / deep-learning-pytorch-huggingface

MIT License

question about the llama instruction code #28

Closed · yeontaek closed this issue 11 months ago

yeontaek commented 11 months ago

Hello. The QLoRA and Flash Attention code has been very helpful.

I noticed that the code works fine with Llama 2 7B, but there seems to be a shape issue with 70B. Is that correct?

philschmid commented 11 months ago

Can you share the exact error you get? I didn't test with 70B; it could be due to GQA (grouped-query attention).

yeontaek commented 11 months ago

I've experimented with two models, NousResearch's and Meta's Llama 2 70B. However, I encountered a consistent error across both, as detailed below. Interestingly, I didn't experience any problems with the 7B model. I'm also considering testing other model sizes to see whether this issue persists.

  File "/data0/yeontaek-oh/task/qlora/utils/llama_patch.py", line 47, in forward
    key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[6, 796, 64, 128]' is invalid for input of size 4890624
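The size in the error is consistent with grouped-query attention: under GQA, `k_proj` produces `num_key_value_heads * head_dim` features per token rather than the `num_heads * head_dim` the patch's `view()` assumes. A quick sanity check against the numbers in the traceback (the head counts of 64 query heads and 8 KV heads are assumed from the Llama 2 70B config):

```python
# Dimensions taken from the traceback: shape '[6, 796, 64, 128]'
bsz, q_len, num_heads, head_dim = 6, 796, 64, 128
num_key_value_heads = 8  # Llama 2 70B uses GQA with 8 KV heads (assumed from its config)

# What the patch's view() expects vs. what k_proj actually emits under GQA
expected = bsz * q_len * num_heads * head_dim
actual = bsz * q_len * num_key_value_heads * head_dim

print(expected)  # 39124992 -- the size the view() asks for
print(actual)    # 4890624  -- matches "input of size 4890624" in the error
```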
philschmid commented 11 months ago

Does it work without Flash Attention? It must be due to the GQA used in the bigger models. We would need to make changes to llama_patch.

yeontaek commented 11 months ago

I confirmed that it works without any problems in the QLoRA environment without Flash Attention. I also verified that there are no issues up to 13B.

philschmid commented 11 months ago

Yeah, it's due to GQA. LAION already fixed that in this PR: https://github.com/LAION-AI/Open-Assistant/pull/3595/files It would be nice if you could take a look at what they did and open a PR.
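For anyone hitting the same error: the direction of that fix is to reshape K/V with `num_key_value_heads` instead of `num_heads`, then repeat the KV heads up to the full query-head count before the attention call. A minimal sketch, not the exact LAION patch; the `repeat_kv` helper mirrors the one `transformers` ships for Llama, and `self.num_key_value_heads` is assumed to be set from the model config:

```python
import torch


def repeat_kv(hidden: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand (bsz, num_kv_heads, q_len, head_dim) to
    (bsz, num_kv_heads * n_rep, q_len, head_dim) by repeating each
    KV head n_rep times, matching grouped-query attention."""
    bsz, num_kv_heads, q_len, head_dim = hidden.shape
    if n_rep == 1:
        return hidden
    hidden = hidden[:, :, None, :, :].expand(bsz, num_kv_heads, n_rep, q_len, head_dim)
    return hidden.reshape(bsz, num_kv_heads * n_rep, q_len, head_dim)


# Inside the patched forward, K/V would then be viewed with the KV head
# count rather than the query head count, e.g. (attribute names assumed):
#   key_states = self.k_proj(hidden_states) \
#       .view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
#   key_states = repeat_kv(key_states, self.num_heads // self.num_key_value_heads)
```

With 64 query heads and 8 KV heads, `n_rep` is 8, which turns the 8 KV heads back into the 64 the rest of the patch expects.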

yeontaek commented 11 months ago

Thank you. I'm currently testing the 13B model; once that's done, I'll take a look and make the adjustments.

jaslatendresse commented 11 months ago

Can you run this on a MacBook Pro M1 with 16 GB of RAM? I plan to use llama-2-7b-chat-hf. This is my last resort, as nothing seems to work on my machine because nothing is compatible with M1.

philschmid commented 11 months ago

Not sure about M1, but I found this guide: https://gist.github.com/cedrickchee/e8d4cb0c4b1df6cc47ce8b18457ebde0