gaspard-dv opened 7 months ago
Thank you so much for the detailed report! Will come back to you shortly.
These timing results contain significant non-inference setup steps (e.g. `json(model, dumps(output_format))`).
Yes indeed! `json(model, dumps(output_format))` takes a few seconds to complete and shouldn't be in the for-loop. But this is not the step that gets "stuck".
It would still be nice to have results without it in the loop, and to use cProfile to understand which step "gets stuck". To get to similar experimental conditions I would also use the `maxLength` field constraint.
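For reference, the `maxLength` field constraint goes on the string property itself; a sketch against the repro's schema (the 2000-character limit is an illustrative value, not taken from the original report):

```python
from json import dumps

# Same schema as in the repro, plus a maxLength constraint so the model
# cannot generate an unbounded "poem" string (2000 is an arbitrary limit).
output_format = {
    "type": "object",
    "properties": {
        "poem": {"type": "string", "maxLength": 2000},
    },
}

schema = dumps(output_format)
print(schema)
```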
Please try:

```python
from pydantic import BaseModel

class OutputModel(BaseModel):
    poem: str
```

And pass `OutputModel` instead of `output_format`. This schema ensures the `'required': ['poem']` attribute is included, so you don't get any generations missing the `poem` key.
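The difference matters because the hand-written schema in the repro has no `required` list, so an empty object `{}` is a valid completion; a plain-dict comparison of the two schemas (no pydantic needed):

```python
# Hand-written schema from the repro: no "required" list, so a
# generation of "{}" (missing the "poem" key) still satisfies the schema.
loose = {"type": "object", "properties": {"poem": {"type": "string"}}}

# Schema equivalent to the suggested pydantic OutputModel: "poem" is mandatory.
strict = {
    "type": "object",
    "properties": {"poem": {"type": "string"}},
    "required": ["poem"],
}

assert "required" not in loose
assert strict["required"] == ["poem"]
```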
Additionally, you will need to set `whitespace_pattern`, as explained here: https://github.com/outlines-dev/outlines/issues/690#issuecomment-2102291934

```python
json(model, dumps(output_format), whitespace_pattern=r"[ ]?")...
```
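Putting the two fixes together, the generator is built once outside the loop and takes the `whitespace_pattern` argument. A sketch against the outlines 0.0.13 API used in the repro (the `run_repro` wrapper is hypothetical; it needs outlines and a CUDA GPU to actually run, so the import is deferred):

```python
from json import dumps

def run_repro(model, prompt, rng, output_format, n=20):
    # Requires outlines==0.0.13 and a CUDA GPU; the import is deferred so
    # this sketch can be defined without outlines installed.
    from outlines.text.generate import json as generate_json

    # Build the constrained generator ONCE, outside the loop, with the
    # whitespace_pattern fix from the linked comment.
    generate = generate_json(model, dumps(output_format),
                             whitespace_pattern=r"[ ]?")

    results = []
    for _ in range(n):
        results.append(generate(prompt, rng=rng))
    return results
```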
With these changes your script works for me and doesn't have any slow or failed inference.
Issue description
The issue was raised by other people on Discord too.
To quote one of them:
their screenshot
![image](https://github.com/outlines-dev/outlines/assets/72385491/47c59310-da9d-4c05-8d7a-9de082ab4393)

Repro
I made a reproduction code snippet that can run in Google Colab (w/ free T4 GPU):
💻 Code snippet
```bash
pip install outlines==0.0.13 transformers datasets optimum auto-gptq accelerate
```

```python
from outlines import models
from outlines.text.generate import json, continuation
from json import dumps
from time import perf_counter
import torch

prompt = """<|system|>
You are a friendly AI assistant. You're specialized in mathematics and open source Github repositories. Your answers must be concise and factual.
<|user|>
Write a very long poem
<|assistant|>
"""

output_format = {
    "type": "object",
    "properties": {
        "poem": {"type": "string"}
    }
}

model = models.transformers("TheBloke/zephyr-7B-beta-GPTQ", device="cuda")
```

```python
rng = torch.Generator(device="cuda")
rng.manual_seed(789001)

errors = []
for i in range(20):
    start_time = perf_counter()
    try:
        sequence = json(model, dumps(output_format))(prompt, rng=rng)
        poem = sequence.get('poem')
        elapsed_time = round(perf_counter() - start_time)
        n_characters_per_second = len(poem) // elapsed_time
        print(f"{i}\t{elapsed_time}\t{n_characters_per_second}\t{poem[:30]}..")
    except Exception as e:
        errors.append(e)
        print(f"{i}\t{elapsed_time}\tINFERENCE FAILED")
```

📃 Output
```
0	14	76	In the vastness of cosmic spac..
1	14	INFERENCE FAILED
2	769	0	In this universe, a vast expan..
3	389	0	In ancient lands, where skies ..
4	16	67	In the depths of the cosmos, w..
5	35	70	In the stillness of the mornin..
6	32	60	In a universe vast and unceasi..
7	13	77	75000 lines of blank verse, hi..
8	22	69	In a land of purest light, Who..
9	34	59	A cosmic dance of stars, a sym..
10	49	68	In the land of the digit, wher..
11	34	78	In a world vast and unknown, ..
12	43	68	There was a time when words we..
13	54	70	In a world where chaos reigns..
14	12	62	Let the words unfurl like the ..
15	330	0	Infinity beckons from the far ..
16	31	60	In the depths of the universe,..
17	137	0	In this vast expanse of time a..
18	32	81	in this universe vast and unfa..
```

💥 Exceptions raised
```python
import traceback

for error in errors:
    try:
        raise error
    except Exception as e:
        traceback.print_exc()
```

```
Traceback (most recent call last):
  File "
```

Results
Outlines/Python version information: