Dada-Cloudzxy opened this issue 2 weeks ago
Hi @Dada-Cloudzxy, thank you for reporting this issue.
> run scripts `create_service_qwen2.5_math_hf.sh` to start service for eval (NUM_LM_WORKER=2, NUM_RM_WORKER=2)
The hf model runner was adapted from Qwen-Math and is still an inefficient version, so a quick workaround is to use the vLLM API instead. Could you try vLLM by running `create_service_qwen2.5_math_vllm.sh`?
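For reference, the invocation should mirror the hf script; the exact path below is an assumption based on the `reason/llm_service/` commands later in this thread, so adjust it to your checkout:

```bash
# Assumed location, mirroring the create_service_*_hf.sh scripts
# referenced elsewhere in this thread; adjust to your checkout.
bash reason/llm_service/create_service_qwen2.5_math_vllm.sh
```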
In the meantime, we will try to fix this bug.
Thank you very much, I will give it a try.
Hi, I failed to reproduce your error following the instructions. Could you provide more information about the error, such as the error message in each worker session? Many thanks.
@YanSong97 Hi, I ran into the same bug; the only difference is the model. Concretely, I fine-tuned Meta-Llama-3-8B with my `prm/code/finetune_llama.py` to obtain a PRM (`llama3_prm_checkpoint-6358`), and used Meta-Llama-3-8B-Instruct for reasoning eval on MATH. The modified code to reproduce the bug is in my forked repo: Repo Link. Meta-Llama-3-8B and Meta-Llama-3-8B-Instruct were downloaded from Hugging Face; the PRM checkpoint (`llama3_prm_checkpoint-6358`) and the steps to reproduce are in the release of my forked repo: ckpt link.
Here is some extra detail that may be useful:
```
2024-11-12 02:09:42 | ERROR | stderr |   File "/xxx/openr/reason/llm_service/workers/inference.py", line 202, in generate_stream
2024-11-12 02:09:42 | ERROR | stderr |     torch.log_softmax(logits[0, -1, :], dim=-1)[token].tolist()
2024-11-12 02:09:42 | ERROR | stderr | IndexError: index 4130146043630828155 is out of bounds for dimension 0 with size 128256
```
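The gigantic index is consistent with sampling from non-finite logits: once `last_token_logits` is all NaN (see the logs further down in this thread), softmax produces NaN probabilities, and sampling from those on CUDA can silently yield garbage token ids that only blow up later, at the `log_softmax(...)[token]` indexing. A minimal fail-fast guard, sketched against a generic sampling path rather than the repo's actual `generate_stream` code:

```python
import torch

def sample_next_token(last_token_logits: torch.Tensor) -> int:
    # Fail fast on the failure mode seen in the logs: all-NaN fp16 logits
    # that would otherwise turn into garbage token ids downstream.
    if not torch.isfinite(last_token_logits).all():
        raise RuntimeError("non-finite logits; refusing to sample")
    probs = torch.softmax(last_token_logits.float(), dim=-1)
    token = int(torch.multinomial(probs, num_samples=1).item())
    # This is the bound that the reported IndexError implicitly violates.
    assert 0 <= token < last_token_logits.shape[-1]
    return token
```

Failing at the first non-finite logit would make the worker report the real cause instead of an opaque out-of-bounds index.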
python==3.10.15, cuda==11.6, torch==2.4.0, latest version of the code, GPU: 4 × A6000 48G
Run `bash reason/llm_service/create_service_llama3_8b_instruct_hf.sh` in my forked repo to start the service for eval (NUM_LM_WORKER=2, NUM_RM_WORKER=2; the bug still occurs with both workers set to 1), then run `bash scripts/eval/beam_search_MATH_llama3_8b_instruct.sh` in my forked repo to evaluate on the MATH dataset.
System Info
python==3.10.15, cuda==11.8-8.8.1, torch==2.4.0, latest version of the code, GPU: 8 × A100 40G
Who can help?
@ziyuwan @Gebro13 @mengfn @gzqaq @YanSong97 @i
Reproduction
Run `cot_greedy.sh` to evaluate on the MATH dataset.
With NUM_LM_WORKER=1 and NUM_RM_WORKER=1 it runs successfully:

```
91%|███████████████████████████████▊ | 454/500 [1:33:18<06:55, 9.04s/it]
```

But with NUM_LM_WORKER=2 and NUM_RM_WORKER=2 it fails:

```
0%|▏ | 1/500 [00:10<1:28:42, 10.67s/it]
Traceback (most recent call last):
  ...
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
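The client-side `JSONDecodeError` is likely just a symptom: a worker that crashes mid-request returns an empty or non-JSON body, and `json.loads` then fails at character 0. A hypothetical client-side guard that surfaces the underlying failure (the helper name, URL, and payload are placeholders, not the repo's API):

```python
import requests

def post_worker(url: str, payload: dict) -> dict:
    # Placeholder helper: url/payload are illustrative, not the repo's API.
    resp = requests.post(url, json=payload, timeout=600)
    resp.raise_for_status()
    try:
        return resp.json()  # raises a ValueError subclass on non-JSON bodies
    except ValueError:
        raise RuntimeError(
            f"worker returned non-JSON body: {resp.text[:200]!r}"
        )
```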
To locate the error, I printed some debug messages near `llm_service/workers/inference.py` line 190.
In all four experiments, with different prompts, I got corrupted values (the out-of-index error):
```
2024-11-09 12:21:48 | INFO | stdout | prompt:<|im_start|>system
2024-11-09 12:21:48 | INFO | stdout | Please reason step by step, and put your final answer within \boxed{{}}.<|im_end|>
2024-11-09 12:21:48 | INFO | stdout | <|im_start|>user
2024-11-09 12:21:48 | INFO | stdout | If $f(x) = \frac{3x-2}{x-2}$, what is the value of $f(-2) +f(-1)+f(0)$? Express your answer as a common fraction.<|im_end|>
2024-11-09 12:21:48 | INFO | stdout | <|im_start|>assistant
2024-11-09 12:21:48 | INFO | stdout |
2024-11-09 12:21:48 | INFO | stdout | tmp:tensor([nan, nan], device='cuda:0', dtype=torch.float16)
2024-11-09 12:21:48 | INFO | stdout | last_token_logits:tensor([nan, nan, nan, ..., nan, nan, nan], device='cuda:0',
2024-11-09 12:21:48 | INFO | stdout | dtype=torch.float16) torch.Size([151936])
2024-11-09 12:21:48 | INFO | stdout | indices:tensor([9223231297218904063, 9223231297218904063], device='cuda:0')
2024-11-09 12:21:48 | INFO | stdout | tokens:[9223231297218904063, 9223231297218904063]
```

```
2024-11-09 12:23:17 | INFO | stdout | prompt:<|im_start|>system
2024-11-09 12:23:17 | INFO | stdout | Please reason step by step, and put your final answer within \boxed{{}}.<|im_end|>
2024-11-09 12:23:17 | INFO | stdout | <|im_start|>user
2024-11-09 12:23:17 | INFO | stdout | If $f(x) = \frac{3x-2}{x-2}$, what is the value of $f(-2) +f(-1)+f(0)$? Express your answer as a common fraction.<|im_end|>
2024-11-09 12:23:17 | INFO | stdout | <|im_start|>assistant
2024-11-09 12:23:17 | INFO | stdout |
2024-11-09 12:23:17 | INFO | stdout | tmp:tensor([ 0.3206, -0.0068], device='cuda:0', dtype=torch.float16)
2024-11-09 12:23:17 | INFO | stdout | last_token_logits:tensor([nan, nan, nan, ..., nan, nan, nan], device='cuda:0',
2024-11-09 12:23:17 | INFO | stdout | dtype=torch.float16) torch.Size([151936])
2024-11-09 12:23:17 | INFO | stdout | indices:tensor([571746046575616, 580542139599872], device='cuda:0')
2024-11-09 12:23:17 | INFO | stdout | tokens:[571746046575616, 580542139599872]
```

```
2024-11-09 12:25:45 | INFO | stdout | prompt:<|im_start|>system
2024-11-09 12:25:45 | INFO | stdout | Please reason step by step, and put your final answer within \boxed{{}}.<|im_end|>
2024-11-09 12:25:45 | INFO | stdout | <|im_start|>user
2024-11-09 12:25:45 | INFO | stdout | What is the smallest positive perfect cube that can be written as the sum of three consecutive integers?<|im_end|>
2024-11-09 12:25:45 | INFO | stdout | <|im_start|>assistant
2024-11-09 12:25:45 | INFO | stdout |
2024-11-09 12:25:45 | INFO | stdout | tmp:tensor([nan, nan], device='cuda:0', dtype=torch.float16)
2024-11-09 12:25:45 | INFO | stdout | last_token_logits:tensor([nan, nan, nan, ..., nan, nan, nan], device='cuda:0',
2024-11-09 12:25:45 | INFO | stdout | dtype=torch.float16) torch.Size([151936])
2024-11-09 12:25:45 | INFO | stdout | indices:tensor([9223231297218904063, 9223231297218904063], device='cuda:0')
2024-11-09 12:25:45 | INFO | stdout | tokens:[9223231297218904063, 9223231297218904063]
```

```
2024-11-09 12:27:39 | INFO | stdout | prompt:<|im_start|>system
2024-11-09 12:27:39 | INFO | stdout | Please reason step by step, and put your final answer within \boxed{{}}.<|im_end|>
2024-11-09 12:27:39 | INFO | stdout | <|im_start|>user
2024-11-09 12:27:39 | INFO | stdout | What is the smallest positive perfect cube that can be written as the sum of three consecutive integers?<|im_end|>
2024-11-09 12:27:39 | INFO | stdout | <|im_start|>assistant
2024-11-09 12:27:39 | INFO | stdout |
2024-11-09 12:27:39 | INFO | stdout | tmp:tensor([nan, nan], device='cuda:0', dtype=torch.float16)
2024-11-09 12:27:39 | INFO | stdout | last_token_logits:tensor([nan, nan, nan, ..., nan, nan, nan], device='cuda:0',
2024-11-09 12:27:39 | INFO | stdout | dtype=torch.float16) torch.Size([151936])
2024-11-09 12:27:39 | INFO | stdout | indices:tensor([9223231297218904063, 9223231297218904063], device='cuda:0')
2024-11-09 12:27:39 | INFO | stdout | tokens:[9223231297218904063, 9223231297218904063]
```
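Since every failing sample shows `last_token_logits` as all-NaN fp16 before sampling, one way to localize the problem is to trap the first module whose output turns non-finite. This is a generic PyTorch debugging sketch, not project code; `register_nan_hooks` is a name invented here:

```python
import torch

def register_nan_hooks(model: torch.nn.Module) -> None:
    # Raise at the first module whose output contains NaN/Inf, to narrow
    # down where the fp16 forward pass starts emitting non-finite values.
    def make_hook(name: str):
        def hook(module, args, output):
            out = output[0] if isinstance(output, tuple) else output
            if torch.is_tensor(out) and not torch.isfinite(out).all():
                raise RuntimeError(f"first non-finite output at module: {name}")
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))
```

Calling `register_nan_hooks(model)` in the worker before serving would show whether the NaNs originate in a particular layer only when two workers share GPUs.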
Expected behavior
The expectation is that the scripts run properly with multiple workers.