outlines-dev / outlines

Structured Text Generation
https://outlines-dev.github.io/outlines/
Apache License 2.0

JSON generated with invalid escape characters #759

Closed pmbaumgartner closed 4 months ago

pmbaumgartner commented 5 months ago

Describe the issue as clearly as possible:

Occasionally when I use outlines it returns a string that is not valid JSON. This happens most often when the model generates an invalid escape character.

This is fairly hard to replicate because the frequency of this issue depends on the model and the prompt. The example code I have below generated this error on the 3rd iteration of the loop when I ran it, but now I'm trying to replicate it again and can't get it to happen.

I monkey-patched models/llamacpp.py to print out the offending string when there's a parse error. Here is an example of JSON that fails to parse:

{"answer":[{"statement":"Privacy preferences allow limiting sharing of creditworthiness data with other banks, insurance companies, and service providers.","reason":"The context does not mention anything about privacy preferences, but it does say you cannot limit sharing with other banks/insurance companies/service providers so that you won\\\'t get offers based on the data shared by the bank, but you can limit sharing with them","verdict":1},{"statement":"You cannot limit access to credit reports themselves.","reason":"The context explicitly states: \\"You cannot limit the credit reports themselves\\", which confirms this statement","verdict":1},{"statement":"Check credit reports on websites like annualcreditreport.com.","reason":"The context mentions using \\"http://www.annualcreditreport.com/\\" to look at your credit report","verdict":1},{"statement":"Dispute adverse items found on credit reports.","reason":"The context advises that \\"you always suggest that people dispute everything adverse\\" to put the onus on other parties to prove the adverse item is valid","verdict":1},{"statement":"Consider purchasing credit protection to receive notifications about new credit taken in your name.","reason":"The context says: \\"it does not hurt to ask\\" and \\"get credit protection so you will be notified when new credit is taken in your name\\"","verdict":1}]}

Using the traceback and an online JSON parser, I believe the issue is the generated substring \\\' starting around character 343.
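
You can also confirm this locally instead of using an online parser, since the decode error exposes the offending position. A small sketch - bad_output stands for the string printed by my monkey-patched llamacpp.py:

import json

# bad_output is the invalid string printed by the monkey-patched llamacpp.py
try:
    json.loads(bad_output)
except json.JSONDecodeError as e:
    print(e.msg, "at char", e.pos)              # e.g. "Invalid \escape at char 342"
    print(bad_output[e.pos - 10 : e.pos + 10])  # shows the surrounding "...won\'t..." context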

You can replicate this specific example (with the models in the code snippet below) by attempting to parse an object like this:

FaithfulnessStatementInput.parse_raw('{"question":"Would you do anything for love?","answer":"\\"I won\\\'t do that\\""}')

This results in the same validation error I get with the escape character.

ValidationError: 1 validation error for FaithfulnessStatementInput
__root__
  Invalid \escape: line 1 column 64 (char 63) [type=value_error.jsondecode, input_value='{"question":"Would you d...I won\\\'t do that\\""}', input_type=str]

And a valid version, just for reference:

FaithfulnessStatementInput.parse_raw('{"question":"Would you do anything for love?","answer":"\\"I won\'t do that\\""}')

My apologies for the long reproduction code below - obviously not all of it is required to trigger this specific issue, but I wanted to include everything I am doing in the instance that generates invalid JSON.

Steps/code to reproduce the bug:

from typing import List

import outlines
from datasets import load_dataset
from pydantic import BaseModel, validate_call

data = load_dataset("explodinggradients/fiqa", "ragas_eval")
a = data["baseline"].select([16]).to_dict()

class FaithfulnessStatementInput(BaseModel):
    question: str
    answer: str

class FaithfulnessStatementOutput(BaseModel):
    statements: List[str]

class FaithfulnessStatementExample(
    FaithfulnessStatementInput, FaithfulnessStatementOutput
):
    @property
    def statements_json(self):
        return self.model_dump_json(include=["statements"])

STATEMENTS_EXAMPLE_1 = FaithfulnessStatementExample(
    question="Who was Albert Einstein and what is he best known for?",
    answer="He was a German-born theoretical physicist, widely acknowledged to be one of the greatest and most influential physicists of all time. He was best known for developing the theory of relativity, he also made important contributions to the development of the theory of quantum mechanics.",
    statements=[
        "Albert Einstein, a German-born theoretical physicist, is renowned for being one of the most influential physicists in history.",
        "Albert Einstein was best known for his theory of relativity.",
        "Einstein's contributions significantly advanced the field of quantum mechanics",
        "Recognized globally, Einstein's work has profoundly impacted the scientific community",
        "Einstein's groundbreaking theories continue to shape our understanding of physics today.",
    ],
)
STATEMENTS_EXAMPLE_2 = FaithfulnessStatementExample(
    question="Cadmium Chloride is slightly soluble in this chemical, it is also called what?",
    answer="alcohol",
    statements=["Cadmium Chloride is slightly soluble in alcohol."],
)
STATEMENTS_EXAMPLE_3 = FaithfulnessStatementExample(
    question="Were Were Hitler and Benito Mussolini of the same nationality?",
    answer="Sorry, I can't provide answer to that question.",
    statements=[],
)

DEFAULT_STATEMENTS_EXAMPLES = [
    STATEMENTS_EXAMPLE_1,
    STATEMENTS_EXAMPLE_2,
    STATEMENTS_EXAMPLE_3,
]

@outlines.prompt
@validate_call
def faithfulness_statements(
    input: FaithfulnessStatementInput,  # noqa
    examples: List[FaithfulnessStatementExample] = DEFAULT_STATEMENTS_EXAMPLES,  # noqa
):
    # This is a combination of the RAGAS template and the DeepEval template
    """
    Analyze the provided question and answer pairs and identify one or more statements from each sentence in the given answer. \
    A statement is a claim or informational point that is present in the answer or can be inferred from the answer and question.

    ## Examples:
    {% for example in examples %}
    Question: {{example.question}}
    Answer: {{example.answer}}
    Statements: {{example.statements_json}}

    {% endfor %}

    ## Actual:
    Question: {{input.question}}
    Answer: {{input.answer}}
    Statements:
    """

model_path = "models/openhermes-2.5-neural-chat-v3-3-slerp.Q4_K_M.gguf"

model = outlines.models.llamacpp(model_path, n_ctx=0, max_tokens=0, n_gpu_layers=-1)

generator = outlines.generate.json(model, FaithfulnessStatementOutput)

st = FaithfulnessStatementInput(
    question=a["question"][0],
    answer=a["answer"][0],
)

statements_result = generator(
    faithfulness_statements(input=st, examples=DEFAULT_STATEMENTS_EXAMPLES)
)

class FaithfulnessNLIInput(BaseModel):
    context: List[str]
    statements: List[str]

class FaithfulnessNLIAnswer(BaseModel):
    statement: str
    reason: str
    verdict: int

class FaithfulnessNLIOutput(BaseModel):
    answer: List[FaithfulnessNLIAnswer]

class FaithfulnessNLIExample(FaithfulnessNLIInput, FaithfulnessNLIOutput):
    @property
    def answer_json(self):
        return self.model_dump_json(include=["answer"])

NLI_EXAMPLE_1 = FaithfulnessNLIExample(
    context=[
        "John is a student at XYZ University. He is pursuing a degree in Computer Science. He is enrolled in several courses this semester, including Data Structures, Algorithms, and Database Management. John is a diligent student and spends a significant amount of time studying and completing assignments. He often stays late in the library to work on his projects."
    ],
    statements=[
        "John is majoring in Biology.",
        "John is taking a course on Artificial Intelligence.",
        "John is a dedicated student.",
        "John has a part-time job.",
    ],
    answer=[
        FaithfulnessNLIAnswer(
            statement="John is majoring in Biology.",
            reason="John's major is explicitly mentioned as Computer Science. There is no information suggesting he is majoring in Biology.",
            verdict=0,
        ),
        FaithfulnessNLIAnswer(
            statement="John is taking a course on Artificial Intelligence.",
            reason="The context mentions the courses John is currently enrolled in, and Artificial Intelligence is not mentioned. Therefore, it cannot be deduced that John is taking a course on AI.",
            verdict=0,
        ),
        FaithfulnessNLIAnswer(
            statement="John is a dedicated student.",
            reason="The context states that he spends a significant amount of time studying and completing assignments. Additionally, it mentions that he often stays late in the library to work on his projects, which implies dedication.",
            verdict=1,
        ),
        FaithfulnessNLIAnswer(
            statement="John has a part-time job.",
            reason="There is no information given in the context about John having a part-time job.",
            verdict=0,
        ),
    ],
)
NLI_EXAMPLE_2 = FaithfulnessNLIExample(
    context=[
        "Photosynthesis is a process used by plants, algae, and certain bacteria to convert light energy into chemical energy."
    ],
    statements=["Albert Einstein was a genius."],
    answer=[
        FaithfulnessNLIAnswer(
            statement="Albert Einstein was a genius.",
            reason="The context and statement are unrelated",
            verdict=0,
        ),
    ],
)
NLI_EXAMPLE_3 = FaithfulnessNLIExample(
    context=[
        "Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time."
    ],
    statements=[],
    answer=[
        FaithfulnessNLIAnswer(
            statement="",
            reason="No statements were provided",
            verdict=-1,
        ),
    ],
)

DEFAULT_NLI_EXAMPLES = [NLI_EXAMPLE_1, NLI_EXAMPLE_2, NLI_EXAMPLE_3]

@outlines.prompt
def faithfulness_nli(
    input: FaithfulnessNLIInput,  # noqa: ARG001
    examples: List[FaithfulnessNLIExample] = DEFAULT_NLI_EXAMPLES,  # noqa: ARG001
):
    """
    You are an expert in natural language inference. \
    The goal is to determine if a given statement can be logically inferred or deduced from the provided context.
    Use only 'Yes' (1), 'No' (0) and 'Null' (-1) as verdict.

    'Yes' (1) means the statement can be inferred from the context with certainty.
    'No' (0) means the statement cannot be inferred from the context or contradicts the information provided.
    'Null' (-1) means there is insufficient information in the context to determine if the statement is true or false.

    ## Examples:
    {% for example in examples %}
    Context: {{example.context}}
    Statements: {{example.statements}}
    Answer: {{example.answer_json}}

    {% endfor %}

    ## Actual:
    Context: {{input.context}}
    Statements: {{input.statements}}
    Answer:
    """

nli_task = FaithfulnessNLIInput(
    context=a["contexts"][0], statements=statements_result.statements
)

generator_nli = outlines.generate.json(model, FaithfulnessNLIOutput)

requests_batched = [
    faithfulness_nli(input=nli_task, examples=DEFAULT_NLI_EXAMPLES) for _ in range(100)
]
# Run until ValidationError is generated
nli_results_batched = generator_nli(requests_batched)

Expected result:

Valid JSON with no invalid escapes that can be parsed back into the pydantic model.

Error message:

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
File ~/.pyenv/versions/3.10.10/envs/my-env/lib/python3.10/site-packages/pydantic/main.py:1097, in BaseModel.parse_raw(cls, b, content_type, encoding, proto, allow_pickle)
   1096 try:
-> 1097     obj = parse.load_str_bytes(
   1098         b,
   1099         proto=proto,
   1100         content_type=content_type,
   1101         encoding=encoding,
   1102         allow_pickle=allow_pickle,
   1103     )
   1104 except (ValueError, TypeError) as exc:

File ~/.pyenv/versions/3.10.10/envs/my-env/lib/python3.10/site-packages/pydantic/deprecated/parse.py:49, in load_str_bytes(b, content_type, encoding, proto, allow_pickle, json_loads)
     48         b = b.decode(encoding)
---> 49     return json_loads(b)  # type: ignore
     50 elif proto == Protocol.pickle:

File ~/.pyenv/versions/3.10.10/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File ~/.pyenv/versions/3.10.10/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File ~/.pyenv/versions/3.10.10/lib/python3.10/json/decoder.py:353, in JSONDecoder.raw_decode(self, s, idx)
    352 try:
--> 353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:

JSONDecodeError: Invalid \escape: line 1 column 343 (char 342)

During handling of the above exception, another exception occurred:

ValidationError                           Traceback (most recent call last)
File ~/.pyenv/versions/3.10.10/envs/my-env/lib/python3.10/site-packages/outlines/models/llamacpp.py:78, in LlamaSequenceGenerator.__call__(self, prompts, max_tokens, stop_at, rng, **model_kwargs)
     77 try:
---> 78     formatted = [self.format_sequence(sequence) for sequence in results]
     79 except Exception as e:

File ~/.pyenv/versions/3.10.10/envs/my-env/lib/python3.10/site-packages/outlines/models/llamacpp.py:78, in <listcomp>(.0)
     77 try:
---> 78     formatted = [self.format_sequence(sequence) for sequence in results]
     79 except Exception as e:

File ~/.pyenv/versions/3.10.10/envs/my-env/lib/python3.10/site-packages/outlines/generate/json.py:50, in json.<locals>.<lambda>(x)
     49     generator = regex(model, regex_str, sampler)
---> 50     generator.format_sequence = lambda x: schema_object.parse_raw(x)
     51 elif callable(schema_object):

File ~/.pyenv/versions/3.10.10/envs/my-env/lib/python3.10/site-packages/pydantic/main.py:1124, in BaseModel.parse_raw(cls, b, content_type, encoding, proto, allow_pickle)
   1118     error: pydantic_core.InitErrorDetails = {
   1119         # The type: ignore on the next line is to ignore the requirement of LiteralString
   1120         'type': pydantic_core.PydanticCustomError(type_str, str(exc)),  # type: ignore
   1121         'loc': ('__root__',),
   1122         'input': b,
   1123     }
-> 1124     raise pydantic_core.ValidationError.from_exception_data(cls.__name__, [error])
   1125 return cls.model_validate(obj)

ValidationError: 1 validation error for FaithfulnessNLIOutput
__root__
  Invalid \escape: line 1 column 343 (char 342) [type=value_error.jsondecode, input_value='{"answer":[{"statement":...name\\"","verdict":1}]}', input_type=str]

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[8], line 332
    326 generator_nli = outlines.generate.json(model, FaithfulnessNLIOutput)
    327 nli_result = generator_nli(
    328     faithfulness_nli(input=nli_task, examples=DEFAULT_NLI_EXAMPLES)
    329 )
--> 332 nli_results_2 = [
    333     generator_nli(faithfulness_nli(input=nli_task, examples=DEFAULT_NLI_EXAMPLES))
    334     for _ in range(3)
    335 ]
    337 requests_batched = [
    338     faithfulness_nli(input=nli_task, examples=DEFAULT_NLI_EXAMPLES) for _ in range(100)
    339 ]
    340 nli_results_batched = generator_nli(requests_batched)

Cell In[8], line 333, in <listcomp>(.0)
    326 generator_nli = outlines.generate.json(model, FaithfulnessNLIOutput)
    327 nli_result = generator_nli(
    328     faithfulness_nli(input=nli_task, examples=DEFAULT_NLI_EXAMPLES)
    329 )
    332 nli_results_2 = [
--> 333     generator_nli(faithfulness_nli(input=nli_task, examples=DEFAULT_NLI_EXAMPLES))
    334     for _ in range(3)
    335 ]
    337 requests_batched = [
    338     faithfulness_nli(input=nli_task, examples=DEFAULT_NLI_EXAMPLES) for _ in range(100)
    339 ]
    340 nli_results_batched = generator_nli(requests_batched)

File ~/.pyenv/versions/3.10.10/envs/my-env/lib/python3.10/site-packages/outlines/models/llamacpp.py:81, in LlamaSequenceGenerator.__call__(self, prompts, max_tokens, stop_at, rng, **model_kwargs)
     79 except Exception as e:
     80     print(results)
---> 81     raise ValueError(f"Error formatting sequences: {e}")
     83 return formatted if len(formatted) > 1 else formatted[0]

ValueError: Error formatting sequences: 1 validation error for FaithfulnessNLIOutput
__root__
  Invalid \escape: line 1 column 343 (char 342) [type=value_error.jsondecode, input_value='{"answer":[{"statement":...name\\"","verdict":1}]}', input_type=str]

Outlines/Python version information:

python -c "import sys; print('Python', sys.version)" pip freeze 0.0.36 Python 3.10.10 (main, Jun 19 2023, 11:34:34) [Clang 14.0.0 (clang-1400.0.29.202)] accelerate==0.28.0 aiohttp==3.9.3 aiosignal==1.3.1 altair==5.2.0 annotated-types==0.6.0 anyio==4.3.0 appdirs==1.4.4 appnope==0.1.4 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 arrow==1.3.0 asttokens==2.4.1 async-lru==2.0.4 async-timeout==4.0.3 attrs==23.2.0 Babel==2.14.0 beautifulsoup4==4.12.3 bleach==6.1.0 boto3==1.34.63 botocore==1.34.63 bpemb==0.3.4 certifi==2024.2.2 cffi==1.16.0 charset-normalizer==3.3.2 click==8.1.7 cloudpickle==3.0.0 comm==0.2.2 conllu==4.5.3 contourpy==1.2.0 cycler==0.12.1 dataclasses-json==0.6.4 datasets==2.18.0 debugpy==1.8.1 decorator==5.1.1 defusedxml==0.7.1 Deprecated==1.2.14 dill==0.3.8 diskcache==5.6.3 distro==1.9.0 docstring-parser==0.15 exceptiongroup==1.2.0 executing==2.0.1 fastapi==0.110.0 fastjsonschema==2.19.1 filelock==3.13.1 flair==0.13.1 fonttools==4.49.0 fqdn==1.5.1 frozenlist==1.4.1 fsspec==2024.2.0 ftfy==6.1.3 gdown==5.1.0 gensim==4.3.2 h11==0.14.0 httpcore==1.0.4 httpx==0.27.0 huggingface-hub==0.21.4 idna==3.6 instructor==0.6.4 interegular==0.3.3 ipykernel==6.29.3 ipython==8.22.2 ipywidgets==8.1.2 isoduration==20.11.0 Janome==0.5.0 jedi==0.19.1 Jinja2==3.1.3 jmespath==1.0.1 joblib==1.3.2 json5==0.9.24 jsonpatch==1.33 jsonpointer==2.4 jsonschema==4.21.1 jsonschema-specifications==2023.12.1 jupyter==1.0.0 jupyter-console==6.6.3 jupyter-events==0.9.1 jupyter-lsp==2.2.4 jupyter_client==8.6.1 jupyter_core==5.7.2 jupyter_server==2.13.0 jupyter_server_terminals==0.5.3 jupyterlab==4.1.5 jupyterlab_pygments==0.3.0 jupyterlab_server==2.25.4 jupyterlab_widgets==3.0.10 kiwisolver==1.4.5 langchain==0.1.12 langchain-community==0.0.28 langchain-core==0.1.32 langchain-openai==0.0.8 langchain-text-splitters==0.0.1 langdetect==1.0.9 langsmith==0.1.27 lark==1.1.9 llama_cpp_python==0.2.56 llvmlite==0.42.0 lxml==5.1.0 markdown-it-py==3.0.0 MarkupSafe==2.1.5 marshmallow==3.21.1 matplotlib==3.8.3 matplotlib-inline==0.1.6 mdurl==0.1.2 mistune==3.0.2 more-itertools==10.2.0 mpld3==0.5.10 mpmath==1.3.0 multidict==6.0.5 multiprocess==0.70.16 mypy-extensions==1.0.0 nbclient==0.10.0 nbconvert==7.16.2 nbformat==5.10.3 nest-asyncio==1.6.0 networkx==3.2.1 notebook==7.1.2 notebook_shim==0.2.4 numba==0.59.0 numpy==1.26.4 openai==1.13.3 orjson==3.9.15 outlines==0.0.36 overrides==7.7.0 packaging==23.2 pandas==2.2.1 pandocfilters==1.5.1 parso==0.8.3 pexpect==4.9.0 pillow==10.2.0 platformdirs==4.2.0 pptree==3.1 prometheus_client==0.20.0 prompt-toolkit==3.0.43 protobuf==5.26.0 psutil==5.9.8 ptyprocess==0.7.0 pure-eval==0.2.2 pyarrow==15.0.1 pyarrow-hotfix==0.6 pycparser==2.21 pydantic==2.6.3 pydantic-settings==2.2.1 pydantic_core==2.16.3 Pygments==2.17.2 pyparsing==3.1.2 pysbd==0.3.4 PySocks==1.7.1 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 python-json-logger==2.0.7 pytorch_revgrad==0.2.0 pytz==2024.1 PyYAML==6.0.1 pyzmq==25.1.2 qtconsole==5.5.1 QtPy==2.4.1 ragas==0.1.4 referencing==0.33.0 regex==2023.12.25 requests==2.31.0 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rich==13.7.1 rpds-py==0.18.0 ruff==0.3.2 s3transfer==0.10.1 safetensors==0.4.2 scikit-learn==1.4.1.post1 scipy==1.12.0 segtok==1.5.11 semver==3.0.2 Send2Trash==1.8.2 sentence-transformers==2.5.1 sentencepiece==0.1.99 seqeval==1.2.2 six==1.16.0 smart-open==7.0.1 sniffio==1.3.1 soupsieve==2.5 SQLAlchemy==2.0.28 sqlitedict==2.1.0 sse-starlette==2.0.0 stack-data==0.6.3 starlette==0.36.3 starlette-context==0.3.6 sympy==1.12 tabulate==0.9.0 tenacity==8.2.3 
terminado==0.18.1 threadpoolctl==3.3.0 tiktoken==0.6.0 tinycss2==1.2.1 tokenizers==0.15.2 tomli==2.0.1 toolz==0.12.1 torch==2.2.1 tornado==6.4 tqdm==4.66.2 traitlets==5.14.1 transformer-smaller-training-vocab==0.3.3 transformers==4.38.2 typer==0.9.0 types-python-dateutil==2.9.0.20240316 typing-inspect==0.9.0 typing_extensions==4.10.0 tzdata==2024.1 uri-template==1.3.0 urllib3==1.26.18 uvicorn==0.28.0 wcwidth==0.2.13 webcolors==1.13 webencodings==0.5.1 websocket-client==1.7.0 widgetsnbextension==4.0.10 Wikipedia-API==0.6.0 wrapt==1.16.0 xxhash==3.4.1 yarl==1.9.4

Context for the issue:

No response

pmbaumgartner commented 5 months ago

Just adding one more piece of context here: I notice the characters won\'t are included in the prompt through the template. The phrase that gets injected is: "While you can limit the sharing with other banks/insurance companies/service providers so that you won\'t get offers from them based on the data shared by the bank, you cannot limit the credit reports themselves." I'm guessing the LLM replicates this sequence of characters verbatim and then something fails at the JSON generation step.

In case it's of additional help, here is the output of json.dumps on this phrase:

In [27]: print(json.dumps(r"While you can limit the sharing with other banks/insurance companies/service providers so that you won\'t get offer
    ...: s from them based on the data shared by the bank, you cannot limit the credit reports themselves."))
"While you can limit the sharing with other banks/insurance companies/service providers so that you won\\'t get offers from them based on the data shared by the bank, you cannot limit the credit reports themselves."
gautierdag commented 5 months ago

I'm also seeing the same issue (mostly with Mistral models). I think this is a hard one to solve, but ideally the JSON decoding should prevent incorrect use of escape characters.

pmbaumgartner commented 5 months ago

I also see this the most with Mistral models. Others I've evaluated are Hermes-2-Pro-Mistral-7B, alphamonarch-7b, openhermes-2.5-neural-chat-v3-3-slerp, and mistral-7b-instruct-v0.2 - the last of which has this issue most frequently.

rlouf commented 5 months ago

Have you tried different whitespace patterns?

pmbaumgartner commented 5 months ago

I haven't with this specific problem, but I will give it a shot. Though I have to say it's not clear to me how that would help here: it would modify the whitespace but not prevent the generation of JSON with invalid escape characters - unless I'm missing something.
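
For reference, here is what I understand the suggestion to be. Note this assumes the installed outlines version exposes a whitespace_pattern argument on generate.json (newer releases do; 0.0.36 may not):

# Hypothetical variant of the generator above with constrained
# whitespace between JSON tokens:
generator_nli = outlines.generate.json(
    model,
    FaithfulnessNLIOutput,
    whitespace_pattern=r"[\n ]?",
)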

pmbaumgartner commented 5 months ago

Here is a smaller reproducible example.

import outlines
from pydantic import BaseModel

class Input(BaseModel):
    value: str

kwargs = {"n_ctx": 0, "max_tokens": 0, "n_gpu_layers": -1, "verbose": False}
model = outlines.models.llamacpp(
    "models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    **kwargs,
)

generator = outlines.generate.json(model, Input)

prompt = r"""You are a helpful assistant. Your task is to return a given input word in JSON format.

Return the following value in JSON:

{"value": "won\\'t"}
"""
for _ in range(20):
    result = generator(prompt)

Should fail with the following exception:

ValueError: Error formatting sequences: 1 validation error for Input
__root__
  Invalid \escape: line 2 column 16 (char 17) [type=value_error.jsondecode, input_value='{\n  "value": "won\\\'t"\n}', input_type=str]
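
If it helps quantify things, here is a small counting harness I wrapped around the same generator (just a sketch; it relies on outlines surfacing the parse failure as the ValueError shown above):

failures = 0
for i in range(20):
    try:
        result = generator(prompt)
    except ValueError as e:  # outlines wraps the pydantic ValidationError
        failures += 1
        print(f"iteration {i}: {e}")
print(f"{failures}/20 generations produced invalid JSON")
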
rlouf commented 4 months ago

I think this is a more general problem with the regexes we use.
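
To illustrate the failure mode with simplified stand-ins (not the exact patterns outlines builds, but the same shape): a string-character regex that allows any character after a backslash accepts \', while one restricted to the escapes JSON actually defines rejects it:

import re

# Permissive: any character may follow a backslash, so \' slips through
PERMISSIVE = r'(?:[^"\\\x00-\x1f]|\\.)*'
print(re.fullmatch(PERMISSIVE, r"won\'t"))  # <re.Match ...> -> invalid JSON allowed

# Strict: only the escape sequences defined by the JSON spec
STRICT = r'(?:[^"\\\x00-\x1f]|\\["\\/bfnrt]|\\u[0-9a-fA-F]{4})*'
print(re.fullmatch(STRICT, r"won\'t"))      # None -> \' could never be generated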

AndreasGiersch commented 4 months ago

Is there any update on this yet? I've also encountered this problem with structured generation using pydantic. All of the models we use ("mistral-7b-instruct-v0.2", "mistralai/Mixtral-8x7B-v0.1" quantized, and llama-2-7Bf, llama-2-13Bf, llama-2-70Bf quantized) are affected. So far I have not been able to track down any inputs that reliably lead to faulty output.

umbe95 commented 4 months ago

I'm seeing the same error as well.

psykhi commented 4 months ago

Same error for me. Any non-trivial generation is likely to fail with Mistral 7B Instruct v0.2.

Here's an example that failed for me after 5 retries.

I'm using outlines via the vLLM OpenAI server.

rlouf commented 4 months ago

The regex used to describe valid string characters allows the generation of an odd number of escape characters. This should be fixed by #829

psykhi commented 4 months ago

Thanks a lot @rlouf! Do you know when you'll release this? I want to open a PR in vLLM to update the deps.

rlouf commented 4 months ago

Did you try the code in main?

psykhi commented 4 months ago

No I haven't; we use outlines via the vLLM OpenAI server. I can set up a repro script with outlines directly, but that might have to wait until next week. I'll report back when I've done that.