Open sev777 opened 1 year ago
Hi @sev777 , as long as you use in-context learning with a few demonstrations in the prompt, the output should follow the demonstrations' format in most cases. See the prompts we used in our experiments. For `max_length`, you may just use something long enough (e.g., `max_length=300` for CoT cases) and then extract the actual prediction from the output.
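If it helps, here is a minimal sketch of that generate-then-extract step, assuming a Hugging Face `transformers` causal LM; the `cur_prompt` placeholder and the `"Question:"` stop marker are assumptions for illustration, not the exact prompt format from the repo:

```python
# Minimal sketch (not the repo's exact code): generate with a generous max_length,
# then keep only the continuation and cut it at the demonstrations' delimiter.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-7b-v1.3"  # the model discussed below; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).cuda()

cur_prompt = "..."  # your few-shot prompt ending with the new question

inputs = tokenizer(cur_prompt, return_tensors="pt").to("cuda")
generate_ids = model.generate(
    **inputs,
    max_length=inputs.input_ids.shape[1] + 300,  # "long enough" for CoT-style outputs
)

# Decode only the newly generated tokens, not the prompt itself.
continuation = tokenizer.batch_decode(
    generate_ids[:, inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)[0]

# Truncate where the model starts an unwanted extra demonstration;
# "Question:" is an assumed delimiter -- use whatever your prompt format actually uses.
prediction = continuation.split("Question:")[0].strip()
```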
Thank you for the reply.
But when I use the model from https://huggingface.co/lmsys/vicuna-7b-v1.3, my generation code is as follows:

```python
inputs = tokenizer(cur_prompt, return_tensors="pt")
# Generate up to 100 new tokens beyond the prompt.
generate_ids = model.generate(inputs.input_ids.cuda(), max_length=inputs.input_ids.shape[1] + 100)
# Decode only the newly generated tokens (everything after the prompt).
res = tokenizer.batch_decode(
    generate_ids[:, inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]
```
The `cur_prompt` is the same as the `task_prompt` in `run_mello.ipynb`.
But the `res` is: `##รভ secretaryகष [unused327] answer :รভ secretaryகष ...`
Should I use Vicuna-chat?
Thanks again!
The Vicuna model generates some unrelated output, so how should I set `max_length` in `model.generate(inputs.input_ids.cuda(), max_length=??)`?