nicknochnack / Llama2RAG

A working example of RAG using LLama 2 70b and Llama Index

A bug when using 16 × A10 (16 × 23 GB) to run inference on llama2-70b #4

Open babytdream opened 1 year ago

babytdream commented 1 year ago

I have 16 GPUs in one machine. [image]

Hello! When I use 16 A10s (16 × 23 GB) to run inference on llama2-70b, this error appears: [image]

I have asked many people about this problem, but without success. I know it works on 8 GPUs! But I need to increase the prompt length for llama2, and 8 GPUs are not enough. Do you have any ideas? Thanks!
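For reference, one common way to shard a large model such as llama2-70b across many GPUs is Hugging Face Accelerate's `device_map="auto"` together with an explicit `max_memory` cap per card. The sketch below is an assumption about the setup, not code from this repo: the checkpoint name `meta-llama/Llama-2-70b-chat-hf` and the `20GiB` cap are hypothetical choices, and the cap is set below the A10's 23 GB so activations and the KV cache (which grows with prompt length) have headroom.

```python
# Hypothetical sketch: spreading a 70B model across 16 GPUs with an
# explicit per-GPU memory budget. Not taken from this repo.

def build_max_memory(num_gpus: int, per_gpu: str = "20GiB") -> dict:
    # Map each GPU index to its memory cap; leaving ~3 GB of headroom
    # per A10 for activations and the KV cache is an assumption here.
    return {i: per_gpu for i in range(num_gpus)}

max_memory = build_max_memory(16)

# The actual load (commented out because it needs the weights and GPUs):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "meta-llama/Llama-2-70b-chat-hf",  # assumed checkpoint name
#     device_map="auto",                 # let Accelerate shard layers
#     max_memory=max_memory,             # cap each of the 16 cards
#     torch_dtype="float16",
# )
```

Uneven caps (e.g. a smaller budget on GPU 0, which also holds inputs) are sometimes needed; `max_memory` accepts a different string per index for that.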