langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

let LLM return JSON response #2385

Closed · badbye closed this 4 months ago

badbye commented 5 months ago

1. Is this request related to a challenge you're experiencing?

Yes. For optimal integration with other systems, it's crucial for large models to provide outputs in JSON format.

2. Describe the feature you'd like to see

Let users specify the output format on the web page, and have the server transform it into a Pydantic model. A Pydantic parser can then add format instructions to the prompt and parse the model's response into structured output.
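
For example, a rough sketch of the idea (the model class, fields, and prompt wording here are hypothetical, not an actual dify API):

import json

from pydantic import BaseModel

# Hypothetical output structure a user might define on the web page.
class Answer(BaseModel):
    title: str
    score: float

# Server side: derive format instructions from the model's JSON schema
# and inject them into the prompt.
instructions = ("Return ONLY a JSON object that conforms to this JSON schema:\n"
                + json.dumps(Answer.schema()))
prompt = f"Summarize the document.\n\n{instructions}"

# After the LLM responds, parse its output back into the structure.
llm_output = '{"title": "Example", "score": 0.9}'  # stand-in for the model's reply
answer = Answer.parse_raw(llm_output)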

3. How will this feature improve your workflow or experience?

The API would deliver responses in a preset JSON format that can be integrated seamlessly with any other system.

4. Additional context or comments

No response

5. Can you help us with this feature?

dosubot[bot] commented 5 months ago

🤖

Thank you for your feature request and your interest in contributing to the dify project. Your suggestion to let large language models provide outputs in JSON format using a Pydantic parser is a valuable one, and it would improve dify's integration with other systems.

The current implementation of the dify platform handles the responses from the large language model in the _handle_generate_response and _handle_generate_stream_response methods in the OAIAPICompatLargeLanguageModel class. These methods return the responses as instances of the LLMResult or LLMResultChunk classes, which are defined in the core.model_runtime.entities.llm_entities module.

To implement your feature, we would need to modify these methods to return the responses in JSON format. This could be achieved by using the json module's dumps function to convert the LLMResult or LLMResultChunk instances to JSON strings. However, since these classes may contain non-serializable types, we may need to define custom serialization for them.
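
For instance, a minimal sketch of such a fallback serializer (this helper is illustrative, and it assumes nested objects expose a usable __dict__):

import json

def to_json(obj) -> str:
    # json.dumps raises TypeError for types it doesn't know; the default
    # hook falls back to each object's __dict__ so nested message and
    # usage objects are serialized field by field.
    return json.dumps(obj, default=lambda o: o.__dict__)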

To use a Pydantic parser, we could define Pydantic models that mirror the structure of the LLMResult and LLMResultChunk classes, and use the parse_obj method of these Pydantic models to parse the JSON strings.

Here is an example of how we could modify the _handle_generate_response method to return the response in a JSON format:

import requests
from pydantic import BaseModel

# Assumed import paths, based on the modules referenced above.
from core.model_runtime.entities.llm_entities import LLMResult
from core.model_runtime.entities.message_entities import (
    AssistantPromptMessage,
    PromptMessage,
)

class LLMResultPydantic(BaseModel):
    model: str
    prompt_messages: list[PromptMessage]
    message: AssistantPromptMessage
    usage: dict
    # Define any other fields that are present in the LLMResult class.

def _handle_generate_response(self, model: str, credentials: dict, response: requests.Response,
                              prompt_messages: list[PromptMessage]) -> str:

    # Existing code that builds assistant_message and usage from the response
    # ...
    response_json = response.json()

    result = LLMResult(
        model=response_json["model"],
        prompt_messages=prompt_messages,
        message=assistant_message,
        usage=usage,
    )

    # Validate against the mirror Pydantic model. Using parse_obj on
    # __dict__ avoids a json.dumps round trip, which would fail on
    # fields that are not directly JSON-serializable.
    result_pydantic = LLMResultPydantic.parse_obj(result.__dict__)

    # Return the validated result as a JSON string.
    return result_pydantic.json()

Please note that this is a simplified example and you may need to adjust it according to the actual structure of the LLMResult and LLMResultChunk classes and the types of their fields.

You can find the current structure of the LLMResult and LLMResultChunk classes in the core.model_runtime.entities.llm_entities module in the dify repository.

I hope this information is helpful. If you have any further questions or need more clarification, please don't hesitate to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


badbye commented 5 months ago

BTW, I already implemented it. I just need a bit more time to test it properly. [image]

takatost commented 5 months ago

So you want the API response to be an object instead of a JSON string when the OpenAI model parameter response_format is set to json_object, right?

badbye commented 5 months ago

No. Only a few models (gpt-4-1106-preview and gpt-3.5-turbo-1106) support JSON mode. However, given an instruction, almost all models are able to return JSON.

To elaborate: writing a prompt that reliably makes the model return a specific JSON structure is a challenge; by comparison, defining the JSON structure itself is simple.

So, my idea is to let users define the structure on the web page, and we inject the corresponding prompt on the server side. Even for a model that does not support JSON mode, if it has a strong ability to follow instructions, it can return the expected format.
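
A sketch of what the server-side injection could look like (the helper name and prompt wording are only illustrative, not my actual implementation):

import json

from pydantic import BaseModel

# Built from the structure the user defines on the web page.
class Reply(BaseModel):
    sentiment: str
    confidence: float

def build_prompt(user_prompt: str, schema: dict, supports_json_mode: bool) -> str:
    if supports_json_mode:
        # For models like gpt-4-1106-preview, response_format enforces JSON;
        # only the desired schema needs to be described.
        return f"{user_prompt}\n\nReturn JSON matching this schema: {json.dumps(schema)}"
    # For every other model, rely on instruction following alone.
    return (f"{user_prompt}\n\n"
            "Respond with ONLY a JSON object matching this schema, no extra text:\n"
            f"{json.dumps(schema)}")

prompt = build_prompt("Classify the sentiment of: 'Great product!'",
                      Reply.schema(), supports_json_mode=False)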

takatost commented 4 months ago

Since the model is able to return a JSON string following the instructions, why do we still need Pydantic conversion? What is your scenario and purpose? 😊

badbye commented 4 months ago

As I said, writing the instructions that make the model return a specific JSON structure is the challenge; defining the JSON structure is easy.

takatost commented 4 months ago

I understand what you mean: Pydantic can make the JSON output more standardized and stable. What I don't quite understand is when we would need JSON results during a Chat conversation. I'm more inclined to implement this feature in the current Text Generate App. BTW, the Text Generate App will be upgraded to a Workflow format in the future, and the JSON output functionality can become a feature of its LLM component.

takatost commented 4 months ago

Btw, we need to provide custom prompts for this feature to be compatible with models other than OpenAI. Anyway, thanks a lot for the idea! If you're interested, we can continue the conversation on WeChat: crazyphage.

badbye commented 4 months ago

I see. Looking forward to the workflow.