gaspard-dv opened 7 months ago
Thank you so much for the detailed report! Will come back to you shortly.
These timing results contain significant non-inference setup steps (e.g. `json(model, dumps(output_format))`).
Yes indeed! `json(model, dumps(output_format))` takes a few seconds to complete and shouldn't be in the for-loop. But this is not the step that gets "stuck".
It would still be nice to have results without it in the loop, and to use cProfile to understand which step "gets stuck". To get to similar experimental conditions I would also use the `maxLength` field constraint.
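For reference, the `maxLength` field constraint goes on the string property itself; a sketch against the repro's schema (the 2000-character limit is an illustrative value, not taken from the original report):

```python
from json import dumps

# Same schema as in the repro, plus a maxLength constraint so the model
# cannot generate an unbounded "poem" string (2000 is an arbitrary limit).
output_format = {
    "type": "object",
    "properties": {
        "poem": {"type": "string", "maxLength": 2000},
    },
}

schema = dumps(output_format)
print(schema)
```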
Please try:

```python
from pydantic import BaseModel

class OutputModel(BaseModel):
    poem: str
```

And pass `OutputModel` instead of `output_format`. This schema ensures the `'required': ['poem']` attribute is included, so you don't get any generations missing the `poem` key.
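The difference matters because the hand-written schema in the repro has no `required` list, so an empty object `{}` is a valid completion; a plain-dict comparison of the two schemas (no pydantic needed):

```python
# Hand-written schema from the repro: no "required" list, so a
# generation of "{}" (missing the "poem" key) still satisfies the schema.
loose = {"type": "object", "properties": {"poem": {"type": "string"}}}

# Schema equivalent to the suggested pydantic OutputModel: "poem" is mandatory.
strict = {
    "type": "object",
    "properties": {"poem": {"type": "string"}},
    "required": ["poem"],
}

assert "required" not in loose
assert strict["required"] == ["poem"]
```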
Additionally, you will need to set `whitespace_pattern`, as explained here: https://github.com/outlines-dev/outlines/issues/690#issuecomment-2102291934

```python
json(model, dumps(output_format), whitespace_pattern=r"[ ]?")...
```
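Putting the two fixes together, the generator is built once outside the loop and takes the `whitespace_pattern` argument. A sketch against the outlines 0.0.13 API used in the repro (the `run_repro` wrapper is hypothetical; it needs outlines and a CUDA GPU to actually run, so the import is deferred):

```python
from json import dumps

def run_repro(model, prompt, rng, output_format, n=20):
    # Requires outlines==0.0.13 and a CUDA GPU; the import is deferred so
    # this sketch can be defined without outlines installed.
    from outlines.text.generate import json as generate_json

    # Build the constrained generator ONCE, outside the loop, with the
    # whitespace_pattern fix from the linked comment.
    generate = generate_json(model, dumps(output_format),
                             whitespace_pattern=r"[ ]?")

    results = []
    for _ in range(n):
        results.append(generate(prompt, rng=rng))
    return results
```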
With these changes your script works for me and doesn't have any slow or failed inference.
Issue description
The issue was raised by other people on Discord too.
To quote one of them:
their screenshot
![image](https://github.com/outlines-dev/outlines/assets/72385491/47c59310-da9d-4c05-8d7a-9de082ab4393)

Repro
I made a reproduction code snippet that can run in Google Colab (w/ free T4 GPU):
💻 Code snippet
```bash
pip install outlines==0.0.13 transformers datasets optimum auto-gptq accelerate
```

```python
from outlines import models
from outlines.text.generate import json, continuation
from json import dumps
from time import perf_counter
import torch

prompt = """<|system|>
You are a friendly AI assistant. You're specialized in mathematics and open source Github repositories. Your answers must be concise and factual.
<|user|>
Write a very long poem
<|assistant|>
"""

output_format = {
    "type": "object",
    "properties": {
        "poem": {"type": "string"}
    }
}

model = models.transformers("TheBloke/zephyr-7B-beta-GPTQ", device="cuda")
```

```python
rng = torch.Generator(device="cuda")
rng.manual_seed(789001)

errors = []
for i in range(20):
    start_time = perf_counter()
    try:
        sequence = json(model, dumps(output_format))(prompt, rng=rng)
        poem = sequence.get('poem')
        elapsed_time = round(perf_counter() - start_time)
        n_characters_per_second = len(poem) // elapsed_time
        print(f"{i}\t{elapsed_time}\t{n_characters_per_second}\t{poem[:30]}..")
    except Exception as e:
        errors.append(e)
        print(f"{i}\t{elapsed_time}\tINFERENCE FAILED")
```

📃 Output
```
0	14	76	In the vastness of cosmic spac..
1	14	INFERENCE FAILED
2	769	0	In this universe, a vast expan..
3	389	0	In ancient lands, where skies ..
4	16	67	In the depths of the cosmos, w..
5	35	70	In the stillness of the mornin..
6	32	60	In a universe vast and unceasi..
7	13	77	75000 lines of blank verse, hi..
8	22	69	In a land of purest light, Who..
9	34	59	A cosmic dance of stars, a sym..
10	49	68	In the land of the digit, wher..
11	34	78	In a world vast and unknown, ..
12	43	68	There was a time when words we..
13	54	70	In a world where chaos reigns..
14	12	62	Let the words unfurl like the ..
15	330	0	Infinity beckons from the far ..
16	31	60	In the depths of the universe,..
17	137	0	In this vast expanse of time a..
18	32	81	in this universe vast and unfa..
```

💥 Exceptions raised
```python
import traceback

for error in errors:
    try:
        raise error
    except Exception as e:
        traceback.print_exc()
```

```
Traceback (most recent call last):
  File "
```

Results
Outlines/Python version information: