Tejaswgupta opened this issue 6 months ago
I also encountered the same issue: when I loaded the model locally, vLLM became extremely slow. Here is my code:
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
from pydantic import BaseModel
from outlines.integrations.vllm import JSONLogitsProcessor

llm = LLM(
    model="Qwen/Qwen1.5-32B-Chat-GPTQ-Int4",
    dtype="float16",
    quantization="gptq",
    max_model_len=32768,
    gpu_memory_utilization=0.9,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-32B-Chat-GPTQ-Int4")

class Person(BaseModel):
    name: str = ""
    age: int = 0

default_sampling_params = SamplingParams(temperature=0, max_tokens=1000, logits_processors=[])
default_sampling_params.logits_processors.append(JSONLogitsProcessor(llm=llm, schema=Person))

# `messages` is defined earlier in the script
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
result = llm.generate([prompt], default_sampling_params, use_tqdm=False)
```
My vLLM version is 0.4.0.
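For context on what `JSONLogitsProcessor` plugs into: a vLLM logits processor is just a callable that takes the token ids generated so far plus the logits for the next token, and returns modified logits. A pure-Python sketch of that mechanism (toy vocabulary, plain lists instead of torch tensors; `make_mask_processor` is illustrative, not part of outlines or vLLM):

```python
import math

def make_mask_processor(allowed_token_ids):
    """Return a processor that forbids every token outside `allowed_token_ids`
    by setting its logit to -inf, the same masking idea guided decoding uses."""
    allowed = set(allowed_token_ids)

    def processor(token_ids, logits):
        # token_ids: ids generated so far (unused in this toy example)
        # logits: one score per vocabulary entry
        return [
            logit if tok in allowed else -math.inf
            for tok, logit in enumerate(logits)
        ]

    return processor

proc = make_mask_processor({0, 2})
masked = proc([], [0.1, 0.9, 0.3, 0.5, 0.2])
# Only token ids 0 and 2 remain sampleable.
```

The per-token cost of this masking (and of compiling the schema into a token-level automaton up front) is where guided decoding can get slow.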
Is it any faster with `--guided-decoding-backend lm-format-enforcer`?
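For reference, that flag is passed when launching vLLM's OpenAI-compatible server. A command-line sketch, reusing the model and settings from the comment above (adjust to your deployment):

```shell
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen1.5-32B-Chat-GPTQ-Int4 \
    --quantization gptq \
    --dtype float16 \
    --max-model-len 32768 \
    --guided-decoding-backend lm-format-enforcer
```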
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
🐛 Describe the bug
While calling the model without `guided_json` it works fine and returns the response quickly; however, since we want structured output, that is a hit or miss. As soon as I pass the Pydantic model in `guided_json`, the script just hangs and no request is received on the server. You can try the following code (with `guided_json` commented out).

CODE:

```python
from pydantic import BaseModel

class RFPModel(BaseModel):
    rfp_title: str
    issuing_agency: str
    rfp_release_date: str
    submission_deadline: str
    primary_point_of_contact: str
    contract_type: str = None
    industry_sector: str = None
    eligibility_criteria: str = None
    scope_of_work_services_required: str = None
    budget_or_funding_amount: str = None
    place_of_performance: str = None
    procurement_method: str = None
    award_criteria: str = None
    additional_requirements: str = None
```
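The value `guided_json` expects is a JSON schema; with pydantic v1 that is what `RFPModel.schema()` returns (v2: `RFPModel.model_json_schema()`). A stdlib-only, hand-written fragment of that payload's shape, covering two of the fields above (an assumption of the emitted schema, not its exact output):

```python
import json

# Hand-written fragment approximating what pydantic would emit for RFPModel;
# only two fields shown, field names taken from the model above.
rfp_schema = {
    "title": "RFPModel",
    "type": "object",
    "properties": {
        "rfp_title": {"type": "string"},
        "contract_type": {"type": "string"},
    },
    "required": ["rfp_title"],
}

# This serialized schema is what the guided decoding backend receives.
payload = json.dumps(rfp_schema)
```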
```python
for c in all_splits:
    ex_prompt = f'''Analyze the text excerpt from a government Request for Proposal (RFP) provided below and extract the relevant metadata according to the predefined fields. Format the extracted information as a JSON object, where each field is represented as a key with its corresponding value. If a certain piece of information is not present in the current text chunk, but was included in previous chunks, incorporate that information as well. Use 'null' for any fields where data is not available in the current or previous chunks.

Extracted Metadata up until the current chunk:
{metadata}

Text Excerpt:
{c}

Please update the JSON object with the following structure, filling in each field with the extracted information or 'null' if the information is not available:

Ensure that the JSON keys are consistent with the metadata fields, and the values are accurately extracted from the RFP text. If the text chunk implies details that may relate to these fields without directly stating them, use inference to populate the fields appropriately. Your response should only be in English. Extracted JSON:'''
    out = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "You are an AI assistant that helps people extract relevant information from RFPs and structure it in JSON format. You should always respond in an accurate and honest manner.",
            },
            {"role": "user", "content": ex_prompt},
        ],
        extra_body={
```
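The snippet above is cut off at `extra_body=`. As a sketch only (an assumption about the omitted part, not the poster's actual code): with vLLM's OpenAI-compatible server, guided decoding options travel in `extra_body`, with the target JSON schema under the `"guided_json"` key.

```python
# Illustrative only: the dict that would typically be passed as extra_body
# for vLLM guided decoding. The schema fragment here is hand-written.
request_extra_body = {
    "guided_json": {
        "type": "object",
        "properties": {"rfp_title": {"type": "string"}},
        "required": ["rfp_title"],
    },
}
```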