guidance-ai / guidance

A guidance language for controlling large language models.

substring() not working #499

Open wiiiktor opened 9 months ago

wiiiktor commented 9 months ago

I tried guidance.substring on Google Colab with a GPT-3.5 model and it loops without end; the error message is below. I also had another problem with a Llama2 model from Huggingface (it returned "the " instead of some longer string), but that may be unrelated, so below is only the GPT error (Error code: 400 - {'error': {'message': "[] is too short):

from guidance import models, substring

gpt = models.OpenAI("gpt-3.5-turbo")
text = 'guidance is awesome. guidance is so great. guidance is the best thing since sliced bread.'
gpt + f'Here is a true statement about the guidance library: "{substring(text)}"'

COLAB OUTPUT:

Here
Exception in thread Thread-10 (_start_generator_stream):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/guidance/models/_remote.py", line 100, in _start_generator_stream
    for chunk in generator:
  File "/usr/local/lib/python3.10/dist-packages/guidance/models/_openai.py", line 201, in _generator
    raise e
  File "/usr/local/lib/python3.10/dist-packages/guidance/models/_openai.py", line 191, in _generator
    generator = self.client.chat.completions.create(
  File "/usr/local/lib/python3.10/dist-packages/openai/_utils/_utils.py", line 301, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/openai/resources/chat/completions.py", line 598, in create
    return self._post(
  File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1096, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 856, in request
    return self._request(
  File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 908, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "[] is too short - 'messages'", 'type': 'invalid_request_error', 'param': None, 'code': None}}

Harsha-Nori commented 9 months ago

Hey @wiiiktor, copy+pasting part of my response from https://github.com/guidance-ai/guidance/issues/471#issuecomment-1837369683 as I believe you have the same issue. TL;DR is that you are using a Chat based model and likely need to use role tags (with system(), user(), assistant(), etc.) to work with GPT-3.5-turbo, or switch to a text completion model from OpenAI like gpt-3.5-turbo-instruct. Previous response below:


Our error message could be more helpful here. Most OpenAI models -- including gpt-3.5-turbo -- are Chat based models, which means you need to use role tags to specify and structure your prompts. Guidance does this with context managers, e.g. this example from our README:

from guidance import models, gen, user, assistant

gpt = models.OpenAI("gpt-3.5-turbo")

with user():
    lm = gpt + "What is the capital of France?"

with assistant():
    lm += gen("capital")

with user():
    lm += "What is one short surprising fact about it?"

with assistant():
    lm += gen("fact")

Chat based models won't work without role assignments, but your prompt works fine if you add them in:

[screenshot: the same prompt completing successfully once wrapped in role tags]
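
For reference, a minimal sketch of what that looks like with the original substring prompt (the user instruction wording here is illustrative, not taken from the screenshot):

from guidance import models, substring, user, assistant

gpt = models.OpenAI("gpt-3.5-turbo")
text = 'guidance is awesome. guidance is so great. guidance is the best thing since sliced bread.'

with user():
    # The prompt now lives inside a user role block
    lm = gpt + f"Complete the quote using an exact substring of this text: {text}"

with assistant():
    # The constrained generation happens inside the assistant role block
    lm += f'Here is a true statement about the guidance library: "{substring(text)}"'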

OpenAI still supports some standard completion based models including text-davinci-003. The most capable text completion model they have -- which works without needing role tags -- is gpt-3.5-turbo-instruct (which is both cheaper and more capable than text-davinci-003). Consider using that model instead if you don't want to use Chat based models.
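
For example, the original one-liner should run as-is against the instruct model (a sketch, reusing the prompt from the issue, and subject to the remote-endpoint caveats discussed later in this thread):

from guidance import models, substring

gpt = models.OpenAI("gpt-3.5-turbo-instruct")  # completion model, no role tags needed
text = 'guidance is awesome. guidance is so great. guidance is the best thing since sliced bread.'
lm = gpt + f'Here is a true statement about the guidance library: "{substring(text)}"'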

Harsha-Nori commented 9 months ago

Just pushed a better error message for this going forward.

Harsha-Nori commented 9 months ago

On this:

I also had some other problem with a Llama2 model from Huggingface (it returned "the " instead of some longer string),

With substring, you do have to be careful, as "" and short phrases like "The " are still valid substrings of the input text. The grammar doesn't impose any additional constraints beyond roughly satisfying:

substring in source_string == True

There is value in also prompting with explicit instructions ahead of the substring call to help condition the model appropriately (e.g., to pick longer or more relevant substrings).
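
For example (a sketch; the model id and instruction wording are placeholders, not a tested recipe):

from guidance import models, substring

# Placeholder model id -- any local Transformers model works here
llama2 = models.Transformers("meta-llama/Llama-2-7b-hf")

text = 'guidance is awesome. guidance is so great. guidance is the best thing since sliced bread.'

# The explicit instruction before the substring() call conditions the model
# toward longer, more informative substrings instead of trivial ones like "the ".
lm = llama2 + (
    f"Text: {text}\n"
    "Quote the single most informative complete sentence from the text above.\n"
    f'Quote: "{substring(text)}"'
)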

wiiiktor commented 9 months ago

I am still trying to run substring with GPT, with both chat and instruct models. I put my code here: https://colab.research.google.com/drive/1HwRFSN9e3jhCQEkY0c0YE5FSnvWftL0W?usp=sharing

Harsha-Nori commented 9 months ago

Unfortunately, substring and other complex grammars do not (yet) work efficiently on remote/cloud-hosted models -- like OpenAI or VertexAI models -- due to the limited API surface guidance has to work with them. From our README (https://github.com/guidance-ai/guidance#vertex-ai):

Remote endpoints that don't have explicit guidance integration are run "optimistically". This means that all the text that can be forced is given to the model as a prompt (or chat context), and then the model is run in streaming mode without hard constraints (since the remote API doesn't support them). If the model ever violates the constraints, the model stream is stopped and we optionally try it again at that point. This means that all the API-supported controls work as expected, while more complex controls/parsing that are not supported by the API work only if the model stays consistent with the program.

For this functionality to work best, we currently recommend using models from guidance.models.LlamaCpp and guidance.models.Transformers.
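
For instance (a sketch; the GGUF path is a placeholder for whatever weights you have locally):

from guidance import models, substring

# Placeholder path -- point this at a local GGUF model file
lm = models.LlamaCpp("path/to/model.gguf")

text = 'guidance is awesome. guidance is so great. guidance is the best thing since sliced bread.'
lm += f'Here is a true statement about the guidance library: "{substring(text)}"'

Because local models enforce the grammar token-by-token, the substring constraint is guaranteed rather than checked optimistically as with remote endpoints.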