OOM issue during evaluation

mengniwang95 commented 1 week ago

Thank you for your amazing work. I hit an OOM error while trying to reproduce the results for llama3.1-8B.

Below are my configuration changes:

run.sh

GPUS="1" # 2 also doesn't work
ROOT_DIR="benchmark_root" # the path that stores generated task samples and model predictions.
MODEL_DIR="meta-llama" # the path that contains individual model folders from Hugging Face.
ENGINE_DIR="." # the path that contains individual engine folders from TensorRT-LLM.
BATCH_SIZE=1 # increase to improve GPU utilization

template.py

'llama3': "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{task_template}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",

config_model.sh

llama3.1-8b)
    MODEL_PATH="${MODEL_DIR}/Meta-Llama-3.1-8B"
    MODEL_TEMPLATE_TYPE="llama3"
    MODEL_FRAMEWORK="hf"
    ;;

And below are the commands I ran:

docker run --gpus '"device=1,2"' --net=host -v /data2:/workspace -it cphsieh/ruler:0.1.0 bash
bash run.sh llama3.1-8b synthetic

The error is:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 62.54 GiB. GPU 1 has a total capacty of 79.26 GiB of which 23.11 GiB is free. Process 3987923 has 56.13 GiB memory in use. Of the allocated memory 49.56 GiB is allocated by PyTorch, and 6.08 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
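For reference, the figures in this message can be checked with a little arithmetic. The sketch below is only an illustration: it copies the numbers from the error text and optimistically assumes that all of the reserved-but-unallocated memory could be reclaimed, which is a best-case assumption rather than something PyTorch guarantees. The helper name `fits` is hypothetical.

```python
# Figures copied from the error message (all values in GiB).
requested = 62.54            # size of the failed allocation
free = 23.11                 # memory reported as free on the GPU
reserved_unallocated = 6.08  # reserved by PyTorch but not allocated


def fits(requested_gib, free_gib, reclaimable_gib=0.0):
    """Return True if the request fits in free plus reclaimable memory."""
    return requested_gib <= free_gib + reclaimable_gib


# Best case (an assumption): every reserved-but-unallocated byte is reusable.
best_case = free + reserved_unallocated
shortfall = requested - best_case

print(f"best-case available: {best_case:.2f} GiB")
print(f"shortfall:           {shortfall:.2f} GiB")
print("fits:", fits(requested, free, reserved_unallocated))
```

Under these assumptions the single 62.54 GiB request is larger than everything that could be freed on the device, so tuning max_split_size_mb alone seems unlikely to be enough here.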

Hope you can take a look.

hsiehjackson commented 1 week ago

Have you tried using vLLM?

mengniwang95 commented 1 week ago

Hi @hsiehjackson, I tried vLLM. With vllm==0.4.0.post1 I ran into this issue: https://github.com/vllm-project/vllm/issues/6689. After upgrading vLLM, I hit other errors, such as: AttributeError: module 'cv2.dnn' has no attribute 'DictValue'

So I would like to know which environment llama3 works in.
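When comparing environments, it can help to post the installed versions of the packages involved. The snippet below is a minimal sketch using only the standard library. The package list is an assumption chosen for illustration (it is not taken from the RULER repository), and `report_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Hypothetical list of distributions relevant to this thread; adjust as needed.
PACKAGES = [
    "torch",
    "transformers",
    "vllm",
    "opencv-python",
    "opencv-python-headless",
]


def report_versions(names):
    """Map each distribution name to its installed version, if any."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in report_versions(PACKAGES).items():
        print(f"{name}: {version}")
```

Running this inside the container and pasting the output makes it easier to compare a failing setup against a working one.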

hsiehjackson / RULER

OOM issue during evaluation #66