`stop_conditions` needs to be a list, so if you're passing a single string, you should pass it as `stop_conditions = [eos_token_str]`.

Be aware that tokens like EOS are usually defined as "special tokens" by the model. So they're not decoded by default and don't become strings (and therefore couldn't trigger a stop condition) unless you specify `decode_special_tokens = True`. However, if you do have a stop token, it's usually better to provide its ID as a stop condition instead. This eliminates any ambiguity, since the EOS string could also appear naturally in the output text stream. You can get the ID from the string with `eos_token_id = tokenizer.single_id(eos_token_str)` and then pass it with `stop_conditions = [eos_token_id]`.
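A minimal sketch of both approaches, assuming a `generator` and `tokenizer` already set up as in the snippets later in this thread; `prompt`, `eos_token_str`, and the `max_new_tokens` value are placeholders:

```python
# Option 1: stop on the EOS string. Special tokens aren't decoded to text
# by default, so the string form only works with decode_special_tokens = True.
output = generator.generate(
    prompt = prompt,
    max_new_tokens = 200,
    stop_conditions = [eos_token_str],   # must be a list, even for one entry
    decode_special_tokens = True,
)

# Option 2 (preferred): stop on the token ID, which is unambiguous since it
# can't collide with naturally occurring text in the output stream.
eos_token_id = tokenizer.single_id(eos_token_str)
output = generator.generate(
    prompt = prompt,
    max_new_tokens = 200,
    stop_conditions = [eos_token_id],
)
```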
Thank you for the advice. After setting `decode_special_tokens = True` as well as the `stop_conditions`, the issue stopped.

Since I have you here, can I ask if there is any way to access this result from the non-stream generator? Or should I just use the stream generator for non-stream use? Is there a speed difference between stream and non-stream mode?
```
{'job': ExLlamaV2DynamicJob #2, 'stage': 'streaming', 'eos': True, 'serial': 2, 'eos_reason': 'stop_token', 'full_completion': 'The current President of the United States is Joe Biden. He assumed office on January 20, 2021.', 'new_tokens': 27, 'prompt_tokens': 42, 'time_enqueued': 7.724761962890625e-05, 'time_prefill': 0.040158987045288086, 'time_generate': 0.1675558090209961, 'cached_pages': 0, 'cached_tokens': 0}
```
The `generate` function just uses streaming under the hood, so there's not going to be any speed difference.

You can get the last result dict returned if you add `return_last_result = True` to the call to `generate`. The return value will become a tuple: `(completion: str, last_result: dict)` if you're passing a single input string, or `(completions: list[str], last_results: list[dict])` if the input is a list.
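A minimal sketch of that, reusing the keys from the result dict shown above (the prompt and printed fields are illustrative):

```python
# Sketch: non-streaming call that also returns the last result dict.
completion, last_result = generator.generate(
    prompt = "Who is the current President of the United States?",
    max_new_tokens = 100,
    return_last_result = True,   # return value becomes a (str, dict) tuple
)

print(completion)
print(last_result["eos_reason"])   # e.g. 'stop_token'
print(last_result["new_tokens"])   # e.g. 27
```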
Thank you very much. Somehow I am still encountering issues with non-stop generation here and there, so I ended up downgrading back to v0.0.21 for now. Maybe I will try again next version.
All the code you were using in v0.0.21 is still there in v0.1.1. You don't have to use the dynamic generator.
Thanks, I just upgraded to v0.1.3 and it seems like all my old code works without any modification. Will close this thread and re-explore the dynamic generator next time.
I have just updated to the latest version today, and all my models seem to generate non-stop garbage even though I have explicitly put in EOS. I am having this issue with all my models: Mistral, Mixtral, Llama 3.

I'd appreciate it if anyone can tell me what I did wrong; the same code worked fine before the upgrade.
Selected snippet from my code below:

```python
settings = ExLlamaV2Sampler.Settings()
settings.temperature = temperature
settings.top_k = 50
settings.top_p = 0.8
settings.min_p = 0.35
settings.token_repetition_penalty = 1.1

generate_text = generator.generate(
    prompt=prompt,
    max_new_tokens=self.max_tokens - len(input_ids[0]),
    gen_settings=settings,
    stop_conditions=self.eos_token_str if self.eos_token_str else None,
    completion_only=True,
)
```
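(A likely culprit, given the first reply in this thread: `stop_conditions` here receives a bare string rather than a list, so the EOS string can never match. A hedged one-line fix, assuming `self.eos_token_str` holds a single EOS string:)

```python
# Assumption: wrapping the single EOS string in a list, as the first
# reply above says stop_conditions requires.
stop_conditions=[self.eos_token_str] if self.eos_token_str else None,
```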
Sample of the non-stop generation: