This PR updates the falcon-7b service to use the falcon-7b-instruct model instead, a fine-tuned version of the base falcon-7b model that performs better in conversation.
It also adds support for returning only the generated response from the model, controllable via the return_full_text (bool) parameter.
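As a rough sketch of what return_full_text implies (the helper name and structure here are illustrative, not the PR's actual code), the post-processing step might look like:

```python
def postprocess_generation(prompt: str, decoded_output: str,
                           return_full_text: bool = True) -> str:
    """Illustrative helper: causal LMs echo the prompt in their decoded
    output, so when return_full_text is False we strip the prompt prefix
    and return only the newly generated text."""
    if return_full_text:
        return decoded_output
    if decoded_output.startswith(prompt):
        return decoded_output[len(prompt):]
    return decoded_output
```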
It adds support for token-based StoppingCriteria as well, which enables patterns like the one below with langchain:
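A minimal sketch of a token-based stopping criterion using the transformers StoppingCriteria API (class name, variable names, and the usage comment are illustrative assumptions; the PR's actual implementation may differ):

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList


class StopOnTokens(StoppingCriteria):
    """Illustrative sketch: stop generation as soon as the sequence ends
    with any of the configured stop token-id sequences."""

    def __init__(self, stop_token_ids):
        self.stop_token_ids = stop_token_ids  # list of lists of token ids

    def __call__(self, input_ids: torch.LongTensor, scores, **kwargs) -> bool:
        # Compare the tail of the generated sequence against each stop sequence.
        for stop_ids in self.stop_token_ids:
            if input_ids[0, -len(stop_ids):].tolist() == stop_ids:
                return True
        return False


# Hypothetical usage with a langchain-style conversational prompt: stop when
# the model starts a new "Human:" turn.
# stop_ids = tokenizer("\nHuman:", add_special_tokens=False).input_ids
# model.generate(**inputs,
#                stopping_criteria=StoppingCriteriaList([StopOnTokens([stop_ids])]))
```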