meta-llama / codellama

Inference code for CodeLlama models

What is the max length that codellama-2-7B can generate? #170

Closed Uestc-Young closed 7 months ago

Uestc-Young commented 10 months ago

I was doing inference work using codellama-2-7B.

Here is my code:

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(self.device)
generate_ids = model.generate(input_ids, max_new_tokens=1024, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
output = tokenizer.decode(generate_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)

I want to know the maximum value that max_new_tokens can be set to.

humza-sami commented 9 months ago

The base model can produce 4096 tokens in total. You can set max_new_tokens up to 4096, but note that this 4096-token limit includes the tokens of the prompt as well.
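
For reference, here is a minimal sketch of capping max_new_tokens against the prompt length, assuming the Hugging Face transformers API used in the question (the checkpoint name and prompt are illustrative, and the 4096 budget is the limit mentioned above):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint name; substitute your own local path or model id.
model_name = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "def fibonacci(n):"
max_total_tokens = 4096  # total budget: prompt tokens + generated tokens

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
prompt_len = input_ids.shape[1]

# The generation budget is whatever remains after the prompt.
max_new_tokens = max_total_tokens - prompt_len
generate_ids = model.generate(
    input_ids,
    max_new_tokens=max_new_tokens,
    num_return_sequences=1,
    pad_token_id=tokenizer.eos_token_id,
)
output = tokenizer.decode(generate_ids[0], skip_special_tokens=True)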

jgehring commented 9 months ago

Hi @Uestc-Young, please note that since generation is auto-regressive, the maximum generation length is the maximum supported sequence length minus the length of the prompt. There is a max_seq_len argument that you can specify when you build the model; you can set it as high as 100000, but depending on your GPU you may run into memory issues and therefore want a lower value.
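
A minimal sketch of setting max_seq_len with this repository's own loader, modeled on the example scripts in the repo; the checkpoint paths, the 16384 value, and the generation settings are illustrative, and the script is normally launched with torchrun:

from llama import Llama

# Build the model with a larger context window; memory use grows with max_seq_len.
generator = Llama.build(
    ckpt_dir="CodeLlama-7b/",                          # illustrative path
    tokenizer_path="CodeLlama-7b/tokenizer.model",     # illustrative path
    max_seq_len=16384,
    max_batch_size=1,
)

# Generation length is then bounded by max_seq_len minus the prompt length.
results = generator.text_completion(
    ["def fibonacci(n):"],
    max_gen_len=512,
    temperature=0.2,
    top_p=0.95,
)
print(results[0]["generation"])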