hsiehjackson / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0

OOM issue during evaluation #66

Open mengniwang95 opened 1 week ago

mengniwang95 commented 1 week ago

Thank you for your amazing work. I hit an OOM error while trying to reproduce the results for llama3.1-8B.

Below are my configuration changes:

run.sh

GPUS="1" # 2 also can't work
ROOT_DIR="benchmark_root" # the path that stores generated task samples and model predictions.
MODEL_DIR="meta-llama" # the path that contains individual model folders from HUggingface.
ENGINE_DIR="." # the path that contains individual engine folders from TensorRT-LLM.
BATCH_SIZE=1 # increase to improve GPU utilization

template.py

'llama3': "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{task_template}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",

config_model.sh

llama3.1-8b)
    MODEL_PATH="${MODEL_DIR}/Meta-Llama-3.1-8B"
    MODEL_TEMPLATE_TYPE="llama3"
    MODEL_FRAMEWORK="hf"
    ;;

And below are the commands I ran:

docker run --gpus '"device=1,2"' --net=host -v /data2:/workspace -it cphsieh/ruler:0.1.0 bash
bash run.sh llama3.1-8b synthetic

The error is:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 62.54 GiB. GPU 1 has a total capacty of 79.26 GiB of which 23.11 GiB is free. Process 3987923 has 56.13 GiB memory in use. Of the allocated memory 49.56 GiB is allocated by PyTorch, and 6.08 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
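For anyone reading the numbers in this error: one possible explanation for the single 62.54 GiB allocation, which nothing in this thread confirms, is that the "hf" framework path materializes float32 logits for every prompt token. The sketch below is only back-of-the-envelope arithmetic under that assumption. The vocabulary size (128,256) is that of Llama 3.1; the helper name `logits_gib` and the sequence lengths are illustrative and are not taken from the RULER code.

```python
GIB = 2**30  # bytes per GiB, the unit PyTorch uses in its OOM message

def logits_gib(seq_len: int, vocab_size: int, bytes_per_elem: int) -> float:
    """Size in GiB of a [seq_len, vocab_size] logits tensor (batch size 1)."""
    return seq_len * vocab_size * bytes_per_elem / GIB

VOCAB_SIZE = 128_256  # Llama 3.1 vocabulary size

# Assumption: logits are computed for every position, not just the last one.
for seq_len in (4_096, 32_768, 131_072):
    fp32 = logits_gib(seq_len, VOCAB_SIZE, 4)  # float32 logits
    bf16 = logits_gib(seq_len, VOCAB_SIZE, 2)  # bfloat16 logits
    print(f"{seq_len:>7} tokens: fp32 {fp32:6.2f} GiB, bf16 {bf16:6.2f} GiB")
```

At 131,072 tokens the float32 figure comes out at 62.625 GiB, slightly above the 62.54 GiB in the traceback, which would fit a prompt a little shorter than the full 128K window. If that reading is right, an engine that only computes logits for the final position (or a shorter maximum sequence length) would avoid this particular allocation.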

Hope you can take a look.

hsiehjackson commented 1 week ago

Have you tried using vLLM?

mengniwang95 commented 1 week ago

Hi @hsiehjackson, I tried vLLM. With vllm==0.4.0.post1 I hit this issue: https://github.com/vllm-project/vllm/issues/6689. After upgrading vLLM, I ran into other errors, such as: AttributeError: module 'cv2.dnn' has no attribute 'DictValue'

So I would like to know which environment works for Llama 3.1.