nvtransfer / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Time taken on 8 A100s? #7

Closed 5 months ago by jzhang38

jzhang38 commented 5 months ago

Hi,

Great work! I find this eval suite very handy. Just curious, how long would it take to run the full evaluation for a 7B model on 8 A100s?

hsiehjackson commented 5 months ago

Hi @jzhang38, for mistralai/Mistral-7B-Instruct-v0.2 with vLLM, here is a summary of the time cost per task:

4k: 4 mins
8k: 5 mins
16k: 10 mins
32k: 17 mins
64k: 40 mins
128k: 2 hrs
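
(Rough back-of-the-envelope total from these numbers: 4 + 5 + 10 + 17 + 40 + 120 = 196 minutes, so about 3.3 hours per task to cover all six context lengths; a full run then scales roughly linearly with the number of tasks you enable.)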
jzhang38 commented 5 months ago

Thanks. That is very helpful!

Dongchenghang commented 1 month ago

Hi, can you share your config for an 8B model (e.g. Llama-3-8B-Instruct) with vLLM on 8 A100s? I used the following config, and it took almost 24 hrs for all tasks. run.sh:

GPUS="8" # number of GPUs used for tensor_parallel

config_models.sh:

        Llama-3-8B-Instruct)
            MODEL_PATH=$MY_PATH
            MODEL_TEMPLATE_TYPE="meta-chat"
            MODEL_FRAMEWORK="vllm"
            ;;

Also, if I set GPUS=1, it only used cuda:0 and the other GPUs sat idle. Is it possible to run a separate task on each GPU at the same time? For a 70B or 72B model (e.g. Qwen2-72B-Instruct), a single 128K task took almost 2 days. It's quite time-consuming! Can you give some advice on this? Thanks.
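
In case it helps others, here is a minimal sketch of the one-task-per-GPU idea. It assumes a hypothetical per-task entry point (run_task.sh, which is not part of the repo; the stock run.sh loops over all tasks internally, so you would need to factor the per-task call out of it first), and the task names below are just examples:

    #!/bin/bash
    # Hypothetical sketch: dispatch one RULER task per GPU in parallel.
    # run_task.sh stands in for whatever per-task command you extract
    # from run.sh; it is not a script shipped with the repo.
    TASKS=(niah_single_1 niah_single_2 vt cwe fwe qa_1 qa_2)

    for i in "${!TASKS[@]}"; do
        gpu=$(( i % 8 ))                       # round-robin over the 8 GPUs
        CUDA_VISIBLE_DEVICES=$gpu \
            bash run_task.sh "${TASKS[$i]}" &  # launch in the background
    done
    wait  # block until all background jobs finish

Note that this only helps when the model fits on a single GPU (i.e. tensor_parallel=1). A 70B+ model still needs several A100s per replica, so on one 8-GPU node the most you could do is run a couple of multi-GPU replicas side by side.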