Closed jzhang38 closed 5 months ago
Hi @jzhang38, for mistralai/Mistral-7B-Instruct-v0.2
with vLLM, here is the summary of costing time per task.
4k: 4 mins
8k: 5 mins
16k: 10 mins
32k: 17 mins
64k: 40 mins
128k: 2 hrs
Thanks. That is very helpful!
Hi, can you share your config for an 8B model (e.g. Llama-3-8B-Instruct) with vLLM on 8 A100s? I used the following config and it took almost 24 hrs for all tasks. run.sh:
GPUS="8" # number of GPUs for tensor parallelism.
config_models.sh:
Llama-3-8B-Instruct)
MODEL_PATH=$MY_PATH
MODEL_TEMPLATE_TYPE="meta-chat"
MODEL_FRAMEWORK="vllm"
;;
Also, if I set GPUS=1, it only used cuda:0 and the other GPUs sat idle. Is there a way to run a separate task on each GPU at the same time?
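One possible workaround for the single-GPU case is to launch several background jobs, each pinned to its own device with CUDA_VISIBLE_DEVICES. This is only a minimal sketch: the task names and the `run_task` function below are placeholders, not the repo's actual entry point, so you would substitute the real per-task command.

```shell
#!/usr/bin/env bash
# Sketch (assumption): run one task per GPU in parallel instead of
# tensor-parallel across all 8. `run_task` and the task list are
# stand-ins for the harness's real per-task invocation.
run_task() {  # placeholder: replace with the real command
  echo "GPU $CUDA_VISIBLE_DEVICES -> task $1"
}

TASKS=(task_0 task_1 task_2 task_3 task_4 task_5 task_6 task_7)

for i in "${!TASKS[@]}"; do
  # Each background job sees only one device via CUDA_VISIBLE_DEVICES,
  # so the framework inside treats it as cuda:0.
  CUDA_VISIBLE_DEVICES="$i" run_task "${TASKS[$i]}" &
done
wait  # block until all background jobs finish
```

Note that this trades tensor parallelism for data parallelism: each job must fit the whole model on a single GPU, which works for a 7B/8B model on an 80 GB A100 but not for 70B-class models.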
For a 70B or 72B model (e.g. Qwen2-72B-Instruct), a single 128K task took almost 2 days, which is quite time-consuming!
Can you give some advice on this?
Thanks.
Hi,
Great work! I find this eval suite very handy. Just curious, how long would the full evaluation take for a 7B model on 8 A100s?