This PR updates the falcon-7b service to use the falcon-7b-instruct model instead, a fine-tuned version of the base falcon-7b model that performs better in conversation.
It also adds support for returning only the generated response from the model, controllable via the return_full_text (bool) parameter.
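As a rough sketch of what return_full_text implies (the helper name and structure here are illustrative, not the PR's actual code), the post-processing step might look like:

```python
def postprocess_generation(prompt: str, decoded_output: str,
                           return_full_text: bool = True) -> str:
    """Illustrative helper: causal LMs echo the prompt in their decoded
    output, so when return_full_text is False we strip the prompt prefix
    and return only the newly generated text."""
    if return_full_text:
        return decoded_output
    if decoded_output.startswith(prompt):
        return decoded_output[len(prompt):]
    return decoded_output
```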
It adds support for token-based StoppingCriteria as well, which enables patterns like the one below with langchain:
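A minimal sketch of a token-based stopping criterion using the transformers StoppingCriteria API (class name, variable names, and the usage comment are illustrative assumptions; the PR's actual implementation may differ):

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList


class StopOnTokens(StoppingCriteria):
    """Illustrative sketch: stop generation as soon as the sequence ends
    with any of the configured stop token-id sequences."""

    def __init__(self, stop_token_ids):
        self.stop_token_ids = stop_token_ids  # list of lists of token ids

    def __call__(self, input_ids: torch.LongTensor, scores, **kwargs) -> bool:
        # Compare the tail of the generated sequence against each stop sequence.
        for stop_ids in self.stop_token_ids:
            if input_ids[0, -len(stop_ids):].tolist() == stop_ids:
                return True
        return False


# Hypothetical usage with a langchain-style conversational prompt: stop when
# the model starts a new "Human:" turn.
# stop_ids = tokenizer("\nHuman:", add_special_tokens=False).input_ids
# model.generate(**inputs,
#                stopping_criteria=StoppingCriteriaList([StopOnTokens([stop_ids])]))
```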