Open shubhamagarwal92 opened 1 year ago
You probably meant someone else, not @pirj
Ahh sorry for that! Your name was getting recommended by Github!
cc @pcuenca and @philschmid as well here
If we need to control the length of input sequences, should we initialize the tokenizer with `model_max_length=X, truncation=True`?
Yes.
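A minimal sketch of that, assuming a Llama 2 chat checkpoint (the repo id and `512` are placeholders):

```python
from transformers import AutoTokenizer

# model_max_length is the cap that truncation will enforce on inputs
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    model_max_length=512,
)

# truncation is applied when the text is tokenized
inputs = tokenizer("a very long prompt ...", truncation=True, return_tensors="pt")
```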
Shouldn't we then also pass the tokenizer when defining pipeline as above?
`pipeline` automatically picks the tokenizer of the corresponding model, so specifying the tokenizer is not needed.
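In other words (model id is a placeholder): when `model` is a string, the matching tokenizer is loaded from the same repo, so something like this is enough. You only need to pass `tokenizer=` explicitly if you hand the pipeline an already-instantiated model object or want a different tokenizer.

```python
from transformers import pipeline

# The tokenizer for this checkpoint is resolved and loaded automatically.
pipe = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")
```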
If we need to also control the length of output sequences, should we pass `max_new_tokens=X` to pipeline?
You can pass generation params as you said (but at inference time, not when loading). I recommend checking the generation docs https://huggingface.co/docs/transformers/main/main_classes/text_generation to dive into the parameters.
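For example, something along these lines at call time, reusing the `pipe` object from the sketch above (all parameter values are illustrative):

```python
prompt = "Explain the difference between model_max_length and max_new_tokens."

# Generation parameters go into the call, not into pipeline() or from_pretrained()
outputs = pipe(
    prompt,
    max_new_tokens=256,   # limits only the generated tokens, not the input length
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(outputs[0]["generated_text"])
```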
In the code above, do we need to pass `system_prompt` or `text` when calling pipeline?
Yes, although you can get ok results without it. If you want to pass the system prompt to the chat llamas, you need to configure the prompt format as suggested in the blog post :)
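As a rough sketch of that format (the system prompt and user message are placeholders, and `pipe` is the pipeline from the sketch above):

```python
system_prompt = "You are a helpful, concise assistant."
user_message = "Summarize what a tokenizer does."

# Llama 2 chat format: the system prompt goes inside <<SYS>> tags within the [INST] block.
# Depending on tokenizer settings, the leading <s> (BOS) may already be added for you.
prompt = f"""<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_message} [/INST]"""

outputs = pipe(prompt, max_new_tokens=128, return_full_text=False)
print(outputs[0]["generated_text"])
```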
I suggest posting questions in the forum too, so it's easier for others to find! https://discuss.huggingface.co/
Hi,
It is not clear whether we need to follow the prompt template for inference with `pipeline` as mentioned here, or whether we should follow the pipeline code without special tokens as defined here.
Let's say we use the modified example code here:
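For context, the setup in question was presumably along the lines of the `pipeline` example from the Hugging Face Llama 2 blog post; the following is only a sketch (not the exact snippet from this issue, and the prompt is just a placeholder):

```python
from transformers import AutoTokenizer
import transformers
import torch

model = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    "I liked 'Breaking Bad'. Do you have recommendations of other shows I might like?\n",
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```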
Questions:

1. If we need to control the length of input sequences, should we initialize the tokenizer with `model_max_length=X, truncation=True`? Shouldn't we then also pass the tokenizer when defining the pipeline as above?
2. If we need to also control the length of output sequences, should we pass `max_new_tokens=X` to the pipeline? Is `model_max_length` independent of `max_new_tokens`, or is it `model_max_length = input_length + max_new_tokens`?
3. In the code above, do we need to pass `system_prompt` or `text` when calling the pipeline? Does this differ for the `7B-chat/13B-chat/70B-chat` models compared to the `7B/13B/70B` models?

Related issues here: https://github.com/huggingface/transformers/issues/4501 https://github.com/facebookresearch/llama-recipes/issues/114
Thanks in advance!
cc @pirj @osanseviero