Add an example showing how to use the HF API (setting top_p to 0.01 instead of 0).
Change the OpenAI client to stop generating tokens once max_tokens has been reached; tokens are counted using the provided tokenizer.
TODO: investigate whether there's a workaround that doesn't depend on the tokenizer.
TODO: consider an option for HF to automatically clamp top_p to the range 0 < x < 1.0.
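The max_tokens change above could be sketched as follows. This is a Python illustration only (kernel-memory itself is C#): the streamed chunks, the `whitespace_tokens` counter, and the helper name are all hypothetical stand-ins for the real client and tokenizer.

```python
from typing import Callable, Iterable, Iterator

def cap_stream_by_tokens(
    chunks: Iterable[str],
    count_tokens: Callable[[str], int],
    max_tokens: int,
) -> Iterator[str]:
    """Yield streamed text chunks, stopping once the token budget is spent.

    `count_tokens` stands in for the caller-provided tokenizer; any callable
    returning a token count works for this sketch.
    """
    used = 0
    for chunk in chunks:
        used += count_tokens(chunk)
        yield chunk
        if used >= max_tokens:
            break  # stop consuming the stream once the budget is reached

# Naive whitespace "tokenizer", for illustration only.
def whitespace_tokens(text: str) -> int:
    return len(text.split())

stream = ["one two", "three four", "five six", "seven"]
capped = list(cap_stream_by_tokens(stream, whitespace_tokens, max_tokens=3))
# capped == ["one two", "three four"]: the chunk that crosses the budget
# is still emitted, then the stream is cut off.
```

Note this mirrors the trade-off in the TODO above: the cutoff only works if a tokenizer compatible with the model is available to count tokens client-side.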
Motivation and Context (Why the change? What's the scenario?)
The HuggingFace API doesn't support top_p == 0 and doesn't correctly stop generating tokens after max_tokens.
See https://github.com/microsoft/kernel-memory/issues/388
High level description (Approach, Design)
The OpenAI client counts generated tokens with the provided tokenizer and stops the stream once max_tokens is reached. For HuggingFace, the example works around the top_p == 0 limitation by setting top_p to 0.01.
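The top_p TODO could be addressed with a clamp like the one below. This is a hypothetical Python sketch, not the repo's implementation; the function name and the 0.01 epsilon (matching the value used in the example) are assumptions.

```python
def clamp_top_p(top_p: float, epsilon: float = 0.01) -> float:
    """Map a requested top_p into the open interval (0, 1).

    The HF API rejects top_p == 0 (and top_p == 1), so values at or
    outside the bounds are nudged inward by epsilon.
    """
    if top_p <= 0.0:
        return epsilon          # near-greedy sampling instead of 0
    if top_p >= 1.0:
        return 1.0 - epsilon    # just under full nucleus sampling
    return top_p

clamp_top_p(0.0)   # -> 0.01
clamp_top_p(1.0)   # -> 0.99
clamp_top_p(0.5)   # -> 0.5 (in-range values pass through unchanged)
```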