Closed badbye closed 4 months ago
🤖
Thank you for your feature request and your interest in contributing to the dify project. Your suggestion to allow large language models to provide outputs in JSON format using a Pydantic parser is a valuable one; it would improve the integration of dify with other systems.
The current implementation of the dify platform handles responses from the large language model in the `_handle_generate_response` and `_handle_generate_stream_response` methods of the `OAIAPICompatLargeLanguageModel` class. These methods return the responses as instances of the `LLMResult` or `LLMResultChunk` classes, which are defined in the `core.model_runtime.entities.llm_entities` module.
To implement your feature, we would need to modify these methods to return the responses in JSON format. This could be achieved by using the `json` module's `dumps` function to convert the `LLMResult` or `LLMResultChunk` instances to JSON strings. However, since these classes may contain non-serializable types, we may need to define custom serialization methods for them.
To use a Pydantic parser, we could define Pydantic models that mirror the structure of the `LLMResult` and `LLMResultChunk` classes, and use the `parse_obj` method of these Pydantic models to parse the JSON strings.
Here is an example of how we could modify the `_handle_generate_response` method to return the response in JSON format:

```python
from pydantic import BaseModel
import json

class LLMResultPydantic(BaseModel):
    model: str
    prompt_messages: list[PromptMessage]
    message: AssistantPromptMessage
    usage: dict
    # Define any other fields that are present in the LLMResult class

def _handle_generate_response(self, model: str, credentials: dict,
                              response: requests.Response,
                              prompt_messages: list[PromptMessage]) -> str:
    # Existing code to generate the LLMResult instance
    # ...
    result = LLMResult(
        model=response_json["model"],
        prompt_messages=prompt_messages,
        message=assistant_message,
        usage=usage,
    )
    # Convert the LLMResult instance to a JSON string. Nested objects in
    # __dict__ may not serialize directly; a custom encoder (e.g. a
    # default= hook passed to json.dumps) may be needed.
    result_json = json.dumps(result.__dict__)
    # Parse the JSON string using the Pydantic model
    result_pydantic = LLMResultPydantic.parse_raw(result_json)
    return result_pydantic.json()
```
Please note that this is a simplified example; you may need to adjust it according to the actual structure of the `LLMResult` and `LLMResultChunk` classes and the types of their fields. You can find the current structure of these classes in the `core.model_runtime.entities.llm_entities` module in the dify repository.
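As a hedged illustration of the custom serialization mentioned above, here is one way to handle nested, non-serializable fields with a `default=` hook. The `FakeResult` and `FakeUsage` classes are stand-ins for illustration, not the real `LLMResult` definition:

```python
import json
from dataclasses import dataclass, asdict, is_dataclass

# Stand-ins for a result object with nested, non-serializable fields;
# the real LLMResult lives in core.model_runtime.entities.llm_entities.
@dataclass
class FakeUsage:
    prompt_tokens: int
    completion_tokens: int

@dataclass
class FakeResult:
    model: str
    usage: FakeUsage

def to_json(obj) -> str:
    """Serialize objects that json.dumps rejects, via a default hook."""
    def default(o):
        if is_dataclass(o):
            return asdict(o)  # recursively converts nested dataclasses
        raise TypeError(f"Not serializable: {type(o).__name__}")
    return json.dumps(obj, default=default)

result = FakeResult(model="gpt-3.5-turbo", usage=FakeUsage(10, 20))
print(to_json(result))  # a plain JSON string with the nested usage inlined
```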
I hope this information is helpful. If you have any further questions or need more clarification, please don't hesitate to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
BTW, I already implemented it; I just need time to test it more thoroughly.
So you want the API response to be an object instead of a JSON string when using the OpenAI model parameter `response_format` as `json_object`, right?
No. Only a few models (gpt-4-1106-preview and gpt-35-turbo-1106) support JSON mode. However, given an instruction, almost all models are able to return JSON.
To elaborate: writing a prompt that makes the model return a specific JSON structure is a challenge, while defining the JSON structure itself is simpler.
So my idea is to let users define the structure on the web page, and we inject the corresponding prompt on the server side. Even a model that does not support `json` mode can return the expected format if it has a strong ability to follow instructions.
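The idea above can be sketched with plain Pydantic. The `MovieReview` model and the `format_instructions` helper are hypothetical names for illustration, not dify's actual implementation:

```python
import json
from pydantic import BaseModel

# A structure the user might define on the web page.
class MovieReview(BaseModel):
    title: str
    rating: int
    summary: str

def format_instructions(model_cls) -> str:
    # Derive a JSON schema from the model and turn it into instruction
    # text that the server injects into the prompt.
    schema = model_cls.schema()  # model_json_schema() in pydantic v2
    return ("Respond ONLY with a JSON object matching this schema:\n"
            + json.dumps(schema["properties"], indent=2))

prompt = "Review the movie 'Dune'.\n" + format_instructions(MovieReview)

# Parsing the model's reply back into the defined structure:
reply = '{"title": "Dune", "rating": 9, "summary": "Epic."}'
review = MovieReview.parse_raw(reply)  # model_validate_json() in pydantic v2
```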
Since the model is able to return a JSON string following the instructions, why do we still need Pydantic conversion? What is your scenario and purpose? 😊
I already said that: writing the instructions for the model to return a specific JSON structure is a challenge; defining the JSON structure is easy.
I understand what you mean. Pydantic can make JSON return more standardized and stable output. What I don't quite understand is when we need to return JSON results during a Chat conversation. I'm more inclined to implement this feature in the current Text Generate App. BTW, the Text Generate App will be upgraded to a Workflow format in the future, and the JSON output functionality can be a feature of the LLM component in it.
Btw, we need to provide custom prompts for this feature to be compatible with models other than OpenAI. Anyway, thanks a lot for the idea! If you're interested, we can continue the conversation on WeChat: crazyphage.
I see, looking forward to the workflow.
1. Is this request related to a challenge you're experiencing?
Yes. For optimal integration with other systems, it's crucial for large models to provide outputs in JSON format.
2. Describe the feature you'd like to see
Specify the output format on the web page; the server transforms it into a Pydantic model, uses a Pydantic parser to add instructions to the prompt, and interprets the results into structured output.
3. How will this feature improve your workflow or experience?
Via API requests, it delivers a preset JSON format that can be seamlessly integrated with any other system.
4. Additional context or comments
No response
5. Can you help us with this feature?
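The flow described in item 2 could be sketched with Pydantic's `create_model`. The `user_spec` wire format and all names here are assumptions for illustration, not the proposed implementation:

```python
from pydantic import create_model

# What the web page might send to the server: field name -> type name.
user_spec = {"name": "str", "age": "int", "tags": "list"}
type_map = {"str": (str, ...), "int": (int, ...), "list": (list, ...)}

# Build a Pydantic model dynamically from the user-defined structure.
OutputModel = create_model(
    "OutputModel",
    **{field: type_map[type_name] for field, type_name in user_spec.items()},
)

# Validate the LLM's JSON reply against that structure.
llm_reply = '{"name": "Alice", "age": 30, "tags": ["a", "b"]}'
parsed = OutputModel.parse_raw(llm_reply)  # model_validate_json() in v2
```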