magpie-align / magpie

Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
https://magpie-align.github.io/
MIT License

Gemma OOM #9

Closed choco9966 closed 1 month ago

choco9966 commented 2 months ago

Hello,

I'm encountering an Out Of Memory (OOM) error while trying to proceed with Gemma using the parameters below. I'm working in an environment with 8 A100 GPUs (80GB each).

```shell
model_path=${1:-"google/gemma-2-9b-it"}   # model to generate with
total_prompts=${2:-1000}                  # number of prompts to synthesize
ins_topp=${3:-1}                          # top-p for instruction sampling
ins_temp=${4:-1}                          # temperature for instruction sampling
res_topp=${5:-1}                          # top-p for response sampling
res_temp=${6:-0}                          # temperature for response sampling
res_rep=1
device="0,1,2,3,4,5,6,7"                  # all 8 A100s visible
tensor_parallel=8                         # shard the model across 8 GPUs
gpu_memory_utilization=0.95               # fraction of GPU memory vLLM may claim
n=200
batch_size=200
```
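For context, a quick back-of-envelope check of this configuration (a hypothetical helper, not part of the Magpie scripts; it assumes bf16 weights at 2 bytes per parameter):

```python
# Rough per-GPU memory arithmetic for the config above.
# Assumption: bf16 weights (2 bytes/param), evenly sharded across TP ranks.

def per_gpu_weight_gib(n_params: float, bytes_per_param: int = 2,
                       tensor_parallel: int = 1) -> float:
    """Approximate weight memory per GPU in GiB under tensor parallelism."""
    return n_params * bytes_per_param / tensor_parallel / 2**30

def reserved_gib(total_gib: float, gpu_memory_utilization: float) -> float:
    """Memory vLLM tries to claim per GPU; the remainder is left for
    the CUDA context and any other processes on the device."""
    return total_gib * gpu_memory_utilization

weights = per_gpu_weight_gib(9e9, tensor_parallel=8)  # gemma-2-9b, TP=8
reserved = reserved_gib(80, 0.95)                     # ~76 GiB claimed
headroom = 80 - reserved                              # ~4 GiB left over
```

The weights themselves are tiny per GPU at TP=8; at `gpu_memory_utilization=0.95` only about 4 GiB per A100 remains for everything outside vLLM, so any other allocation on the device can push the claim over the edge.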

Thank you.

fly-dust commented 2 months ago

Hi, thanks for the message. To run Gemma-2, you first need to upgrade to the latest vllm. Since vllm does not yet have an official release supporting Gemma-2, you may need to build vllm from source.

If, unfortunately, it still does not work after installing the latest vllm, you may need to decrease gpu_memory_utilization. For example, you can try gpu_memory_utilization=0.8.
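One way to automate that suggestion is to retry engine construction with progressively lower utilization values. This is a hypothetical sketch, not code from this repo; `start_engine` stands in for constructing `vllm.LLM(...)`, and the fake engine below only simulates an OOM:

```python
# Hypothetical retry helper: walk down gpu_memory_utilization until the
# engine starts, instead of hand-editing the script after each OOM.

def find_working_utilization(start_engine, candidates=(0.95, 0.9, 0.8, 0.7)):
    for util in candidates:
        try:
            start_engine(gpu_memory_utilization=util)
            return util               # engine came up at this setting
        except MemoryError:
            continue                  # OOM: try the next, lower value
    raise RuntimeError("no gpu_memory_utilization setting worked")

# Stand-in engine that "OOMs" whenever more than 80% is requested:
def fake_engine(gpu_memory_utilization):
    if gpu_memory_utilization > 0.8:
        raise MemoryError("simulated CUDA OOM")
```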

Also, can you please check that your GPUs are not running any other scripts?

Please let me know whether this helps! 🥲

fly-dust commented 1 month ago

I also tried Gemma 2 with vllm==0.5.1. The results differ from those generated with HF transformers. We may need to wait for a stable vllm release.

choco9966 commented 1 month ago

Thank you for your response. I was using vllm 0.5.1 and hit the bug with gpu_memory_utilization=0.95 and a tensor parallel size of 8. I also thought it was a vllm bug and have been waiting, but some people already seem to be running inference with gemma-2-it, so I'm really not sure.

fly-dust commented 1 month ago

Yes... I think so. I will also try from my side. If they fix that, I will ping this issue.

fly-dust commented 1 month ago

Hi, since the vllm engine is still problematic, I added an HF engine for the Gemma-2 series. I also added a set of rules to sanitize the instructions generated by Gemma-2. However, since it uses HF transformers, generation will be much slower than with vllm...
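The routing described above might look roughly like this. This is a hypothetical sketch of the backend choice, not the repo's actual code; the function name and the substring check are my own assumptions:

```python
# Hypothetical engine picker mirroring the workaround described above:
# route Gemma-2 models to the HF transformers engine (correct but slower),
# and everything else to vLLM.

def pick_engine(model_path: str) -> str:
    """Return the generation backend to use for a given model path."""
    if "gemma-2" in model_path.lower():
        return "hf"     # vLLM output for Gemma-2 is unreliable at this point
    return "vllm"       # default: fast vLLM engine
```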

If you encounter any issues using new scripts, please let me know~

choco9966 commented 1 month ago

Oh, thank you! I will try it right away.

fly-dust commented 1 month ago

Hi! We extracted a bit from Gemma 2 27B here. The sanitized version is here. We are now scaling it up!