
How to instruct the model to return proper key-value pairs in JSON format, without any other text #154


Dineshkumar-Anandan-ZS0367 commented 6 months ago

I need to get JSON results from a paragraph that contains key-value pairs, but the Llama 3 Instruct model returns the JSON along with some unwanted text. How can I get a clean answer from the Llama 3 model?

Or is there any option in code, or a parameter, that would give that result?

aqib-mirza commented 6 months ago

If you specify the "format" parameter and set it to "json", you will get your desired results.

Dineshkumar-Anandan-ZS0367 commented 6 months ago

For the Llama 3 8B Instruct model, how do I use this format parameter? Can you share an example or the related prompt documentation?

aqib-mirza commented 6 months ago

Here is example code:

```python
import torch
import transformers

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.float16},
    device="cuda",
    token="HF-Token",  # replace with your Hugging Face access token
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak! and return every answer in JSON format"},
    {"role": "user", "content": "Who are you?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    format="JSON",  # extra kwargs are forwarded to the chat template; Llama 3's template likely ignores this
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```

Dineshkumar-Anandan-ZS0367 commented 6 months ago

Thanks a ton sir! I will check this.

Dineshkumar-Anandan-ZS0367 commented 6 months ago

With the same prompt and the same OCR text from the image, the LLM gives different results on each request. How can I keep the results consistent?

Is there an option for this? I understand this is an LLM.

Can you suggest some prompt ideas for extracting key-value pairs from a paragraph?

Dineshkumar-Anandan-ZS0367 commented 6 months ago

I'm getting the same result as before in spite of using:

```python
prompt = pipeline.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, format="JSON"
)
```

LDelPinoNT commented 3 months ago

Having the same problem. Any update on this? Or any prompt hint?

Dineshkumar-Anandan-ZS0367 commented 3 months ago

> Having the same problem. Any update on this? Or any prompt hint?

You need to explicitly specify your JSON structure in the prompt. It's the only way to get the expected JSON format. If you get any other tokens in the output, add post-processing logic to your code.
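A minimal sketch of that kind of post-processing, assuming the model wraps the JSON in extra prose; the `extract_json` helper, the example schema, and the sample output below are illustrative, not from this thread:

```python
import json
import re

# Illustrative system prompt that pins down the exact JSON structure expected.
SYSTEM_PROMPT = (
    "Extract key-value pairs from the user's text. "
    'Respond with ONLY a JSON object of the form {"key": "value", ...}. '
    "Do not add explanations, markdown fences, or any other text."
)

def extract_json(raw_output: str) -> dict:
    """Pull the first {...} block out of the model output and parse it."""
    match = re.search(r"\{.*\}", raw_output, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in model output")
    return json.loads(match.group(0))

# Example: the model answered with extra chatter around the JSON.
raw = 'Sure! Here is the result:\n{"invoice_no": "INV-001", "total": "99.50"}'
print(extract_json(raw))  # {'invoice_no': 'INV-001', 'total': '99.50'}
```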

YanJiaHuan commented 3 months ago

> With the same prompt and the same OCR text from the image, the LLM gives different results on each request. How can I keep the results consistent?
>
> Is there an option for this? I understand this is an LLM.
>
> Can you suggest some prompt ideas for extracting key-value pairs from a paragraph?

You can try lowering the temperature hyperparameter, @Dineshkumar-Anandan-ZS0367.
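A minimal sketch of making the generation reproducible, reusing the `pipeline`, `prompt`, and `terminators` from the example above (the seed value is arbitrary; greedy decoding trades output diversity for repeatability):

```python
import transformers

# Fix all random seeds so repeated runs start from the same RNG state.
transformers.set_seed(42)

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=False,  # greedy decoding: identical input always yields identical output
)
print(outputs[0]["generated_text"][len(prompt):])
```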

Dineshkumar-Anandan-ZS0367 commented 3 months ago

Thanks a lot for the response, William.