meta-llama / llama

Inference code for Llama models

Output includes input #563

Open · Mega4alik opened 1 year ago

Mega4alik commented 1 year ago

Here's the code that I'm running

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf", cache_dir=cache_dir)
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", cache_dir=cache_dir)
    system_message = "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."
    prompt = "Tell me about AI"
    prompt_template = "[INST] <<SYS>>%s<</SYS>>\n\n %s [/INST]" % (system_message, prompt)
    inputs = tokenizer(prompt_template, return_tensors="pt")
    generate_ids = model.generate(inputs.input_ids, max_length=200) #, temperature=0.8
    resp = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    print(resp)

The result:

[INST] <<SYS>>You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>

 Tell me about AI [/INST]  Of course! I'd be happy to help you understand more about AI.
Artificial intelligence (AI) refers to the development of computer systems that can perform tasks that typically require human intelligence, such as learning, problem-solving, and decision-making. AI systems use algorithms

Why does the output always include the input prompt? Am I missing some special token?

Thanks!

lazycat2991 commented 1 year ago

You can try the code below.

from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


def textCompletions(prompt: str,
                    model_id: str,
                    top_p,
                    top_k,
                    max_new_tokens,
                    temperature) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    generate_text = pipeline(
        model=model,
        tokenizer=tokenizer,
        return_full_text=True,  # LangChain's HuggingFacePipeline strips the prompt prefix itself
        task='text-generation',
        temperature=float(temperature),
        max_new_tokens=max_new_tokens,
        do_sample=True,
        num_beams=1,
        top_p=top_p,
        top_k=top_k,
        repetition_penalty=1.1  # without this the output begins repeating
    )
    # The returned response contains only the completion, without the prompt.
    response = HuggingFacePipeline(pipeline=generate_text)(prompt)
    return response
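
An alternative that stays with plain `model.generate`: `generate` always returns the prompt tokens followed by the newly generated tokens, so the usual fix is to slice the prompt off before decoding. A minimal sketch reusing the variables from Mega4alik's snippet above:

    # `generate` returns prompt + completion; drop the first
    # input_ids.shape[1] tokens so only the completion is decoded.
    prompt_len = inputs.input_ids.shape[1]
    generate_ids = model.generate(inputs.input_ids, max_length=200)
    new_tokens = generate_ids[:, prompt_len:]
    resp = tokenizer.batch_decode(new_tokens, skip_special_tokens=True,
                                  clean_up_tokenization_spaces=False)[0]
    print(resp)  # completion only, without the echoed prompt
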
Mega4alik commented 1 year ago

@lazycat2991 thanks! One more question: do you know if there's a way to stop generating at a certain token like '\n' (similar to the GPT-3 API)?
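
One way to do this with `transformers` is a custom `StoppingCriteria` passed to `generate`. A rough sketch (`StopOnString` is just an illustrative helper, not a library class; decoding the generated tail and checking for the string is more robust than comparing token ids, since a string like "\n" may not map to a single token):

    from transformers import StoppingCriteria, StoppingCriteriaList

    class StopOnString(StoppingCriteria):
        """Stop generation once the decoded completion contains `stop`."""
        def __init__(self, stop, tokenizer, prompt_len):
            self.stop = stop
            self.tokenizer = tokenizer
            self.prompt_len = prompt_len

        def __call__(self, input_ids, scores, **kwargs):
            # Decode only the newly generated tokens and look for the stop string.
            text = self.tokenizer.decode(input_ids[0, self.prompt_len:])
            return self.stop in text

    stop_criteria = StoppingCriteriaList(
        [StopOnString("\n", tokenizer, inputs.input_ids.shape[1])]
    )
    generate_ids = model.generate(inputs.input_ids, max_length=200,
                                  stopping_criteria=stop_criteria)

Note that the stop string itself is still part of the generated text and has to be trimmed off after decoding.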

CreamyLong commented 10 months ago

> (quoting Mega4alik's original code, output, and question above)

I'm facing the same problem and wondering why this happens. Did you find the reason?

hans1120proP commented 1 month ago

Hi, I have a similar problem. I would like to generate only the completion, but I get the prompt + input + source code + comments in the output. Can you give me any solutions?
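
Two approaches from earlier in this thread should help here: run the model through the `transformers` text-generation pipeline with `return_full_text=False`, which drops the prompt from the returned text, or slice the prompt tokens off the `generate` output before decoding, as in the sketch after lazycat2991's snippet.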