Closed blackblue9 closed 1 month ago
What would happen if you sent all the prompts to the generate function in one call, instead of calling it 1000 times?
Do you mean setting the parameter `n` in `SamplingParams` to a larger value, such as n=2000? I have tried larger values, such as n=400, but the model produces no output for a long time and GPU utilization stays at 0.
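For clarity, the two knobs are different: `n` in `SamplingParams` requests `n` completions of each *single* prompt, while the suggestion above is to pass a list of prompts to one `generate()` call and let vLLM schedule them. A minimal sketch (the chat template is illustrative, and the GPU-dependent vLLM calls are commented out):

```python
# SamplingParams(n=...) requests n completions *per prompt*;
# batching means passing a list of prompts to a single generate() call.
prompts = [
    f"<|im_start|>user\nprompt {i}<|im_end|>\n<|im_start|>assistant\n"
    for i in range(1000)
]

# The actual vLLM calls need a GPU, so they are shown commented out:
# from vllm import LLM, SamplingParams
# llm = LLM(model="/mnt/model/qwen2_72B_chat/", tensor_parallel_size=8)
# outputs = llm.generate(prompts, SamplingParams(max_tokens=256))
print(len(prompts))  # 1000 prompts handed to one call
```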
```python
llm = LLM(model="/mnt/model/qwen2_72B_chat/", tensor_parallel_size=8)
prompts = ["<|im_start|>user\n" for _ in range(100)]
pre_query_template = "<|im_start|>user\n"
outputs = llm.generate(prompts, sampling_params)
```
It has been running for more than 10 minutes and seems to be OK. Thank you for your reply.
10 minutes is not enough. I hit the same problem after 2 hours of running.
I'm having the same problem. I'm trying to run inference on 50K items at a time, but I get this error after 648 inference calls. Do you mean I should put all 50K prompts into the list at once and then call the generate function?
My code:

```python
def main(args):
    total_lines = sum(1 for _ in open(args.input_file))
    stop_tokens = args.stop_tokens.split(",")
    with open(args.input_file, 'r') as input_file:
        with open(args.output_file, 'a') as output_file:
            llm = LLM(model="/home/disk1/LLMs/Meta-Llama-3___1-8B-Instruct", tensor_parallel_size=8)
            sampling_params = SamplingParams(temperature=0.75, top_p=0.95, max_tokens=2048, stop=stop_tokens)
            for i, line in enumerate(tqdm(input_file, total=total_lines, desc="Extract objects from the description")):
                if i < args.start_line:
                    continue
                if i >= args.end_line:
                    break
                json_obj = json.loads(line)
                image_name = json_obj.get("image")
                description = json_obj.get("description")
                extr_prompt = f"""..."""
                prompt = args.prompt_structure.format(input=extr_prompt)
                response = llm.generate(prompt, sampling_params)
                obj_extr = response[0].outputs[0].text
                extr_obj = obj_extr.split(". ")
                output_data = {
                    "image": image_name,
                    "extr_obj_fr_desc": extr_obj,
                    "description": description
                }
                output_file.write(json.dumps(output_data) + '\n')
```
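For what it's worth, a hedged sketch of how a loop like this could be restructured to batch prompts: collect the `(record, prompt)` pairs first, then make a single `generate()` call instead of one per line. The helper below is illustrative only (the elided f-string is replaced by a placeholder field), and the vLLM calls are commented out since they need a GPU:

```python
import json

def build_prompts(lines, prompt_structure, start_line, end_line):
    """Collect (record, prompt) pairs instead of calling generate per line."""
    pairs = []
    for i, line in enumerate(lines):
        if i < start_line:
            continue
        if i >= end_line:
            break
        json_obj = json.loads(line)
        # Placeholder for the elided f-string in the original code:
        extr_prompt = json_obj.get("description")
        pairs.append((json_obj, prompt_structure.format(input=extr_prompt)))
    return pairs

# records, prompts = zip(*build_prompts(open(args.input_file),
#                                       args.prompt_structure,
#                                       args.start_line, args.end_line))
# outputs = llm.generate(list(prompts), sampling_params)  # one batched call
# for rec, out in zip(records, outputs):
#     ...  # write rec plus out.outputs[0].text, as in the original loop
```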
I think there might be some scheduling bugs there. You can process 1k prompts each time, if it works for you.
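The suggestion above (process 1k prompts per call) can be sketched with a small chunking helper; `llm`, `prompts`, and `sampling_params` are assumed to be set up as in the earlier snippets, so the generate call itself is commented out:

```python
def chunked(items, size):
    """Yield successive batches of at most `size` items from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# for batch in chunked(prompts, 1000):
#     outputs = llm.generate(batch, sampling_params)  # one batched call per chunk
#     ...  # consume outputs before moving to the next chunk
```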
Thank you. It was solved after switching to batch inference.
Your current environment
🐛 Describe the bug
When running the following code, an error is raised after it has run two or three times. The code is as follows:
The error message is as follows:
How should I solve this problem?