andreadimaio opened 1 month ago
I think it does
cc @langchain4j
@andreadimaio do you want to work on this?
Yes, I'll open a new PR
🙏🏽
The implementation is a little more complex than what I had in mind, for the simple reason that OpenAI also has another `response_format` option called `json_schema` (watsonx.ai only supports `json_object`). If the `json_schema` option is enabled, the API also takes the schema of the object as input. In this case it is useful to create this schema at build time.
I'm not an expert on the OpenAI APIs, but I think that if `response_format` is equal to `json_schema`, it makes no sense to inject the message "You must answer strictly in the following JSON format: ..." into the prompt, because OpenAI will enforce the schema on its side. This message can still be injected for the other types, `TEXT` and `JSON_OBJECT`, but this is a detail that can be overlooked for now.
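To make the difference concrete, here is an illustrative sketch (plain Java, not LangChain4j code) of the two OpenAI-style `response_format` payloads being discussed. The schema contents are hypothetical; the point is that `json_object` only constrains the model to emit valid JSON, while `json_schema` additionally carries the schema the output must follow, which is why it can be generated once at build time.

```java
// Sketch of the two OpenAI-style response_format payloads.
// watsonx.ai only accepts the first form.
public class ResponseFormatExamples {

    // json_object: the model is only constrained to emit valid JSON.
    static String jsonObjectFormat() {
        return """
                {"response_format": {"type": "json_object"}}""";
    }

    // json_schema: the request also carries the schema the output must
    // follow (schema shown here is a hypothetical example).
    static String jsonSchemaFormat() {
        return """
                {"response_format": {
                   "type": "json_schema",
                   "json_schema": {
                     "name": "user",
                     "schema": {
                       "type": "object",
                       "properties": {
                         "firstName": {"type": "string"},
                         "lastName":  {"type": "string"}
                       }
                     }
                   }
                 }}""";
    }

    public static void main(String[] args) {
        System.out.println(jsonObjectFormat());
        System.out.println(jsonSchemaFormat());
    }
}
```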
@andreadimaio it works like this in vanilla LC4j. If a schema can be passed, we do not append extra instructions. But the schema is currently supported only by OpenAI and Gemini.
There's something that's not clear to me. Looking at the class DefaultAiServices.java, there are these lines:

```java
Response<AiMessage> response;
if (supportsJsonSchema && jsonSchema.isPresent()) {
    ChatRequest chatRequest = ChatRequest.builder()
            .messages(messages)
            .toolSpecifications(toolSpecifications)
            .responseFormat(ResponseFormat.builder()
                    .type(JSON)
                    .jsonSchema(jsonSchema.get())
                    .build())
            .build();

    ChatResponse chatResponse = context.chatModel.chat(chatRequest);

    response = new Response<>(
            chatResponse.aiMessage(),
            chatResponse.tokenUsage(),
            chatResponse.finishReason()
    );
} else {
    // TODO migrate to new API
    response = toolSpecifications == null
            ? context.chatModel.generate(messages)
            : context.chatModel.generate(messages, toolSpecifications);
}
```
The `chat` method is invoked only when the provider supports `json_schema`, but what about `json_object`? I would like to force the use of the `chat` method in this case too. Maybe the Capability class should also contain this type.
Another note is about the default implementation of the `chat` method. It has all the parameters needed to call the `generate` method when the provider doesn't support `response_format`. Isn't it better to have this kind of default implementation instead of throwing an exception? That way it should be easier to implement the `chat` method for all model providers (maybe I'm missing something?!).
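The fallback being suggested could look roughly like the following sketch. The types here are simplified stand-ins, not the real LangChain4j interfaces, and the delegation is deliberately minimal; it only illustrates the idea of a default `chat` that falls back to `generate` instead of throwing.

```java
import java.util.List;

// Hypothetical, simplified stand-in for a chat model interface.
interface SketchChatModel {

    // Legacy method that every provider already implements.
    String generate(List<String> messages);

    // Hypothetical default: providers that don't support response_format
    // still get a working chat method instead of an exception. A real
    // implementation would also carry tool specifications, parameters, etc.
    default String chat(List<String> messages, String responseFormat) {
        return generate(messages);
    }
}

public class DefaultChatSketch {
    public static void main(String[] args) {
        // A provider that only implements the legacy generate method...
        SketchChatModel model = messages -> "echo: " + messages.get(0);
        // ...can still be called through chat.
        System.out.println(model.chat(List.of("hi"), "json_object"));
    }
}
```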
@langchain4j
Or is your idea to use `RESPONSE_FORMAT_JSON_SCHEMA` for both values (`json_object`, `json_schema`)? Maybe yes, because in the end the logic inside the `chat` method can handle the variable passed to make the correct call to the endpoint.
I was planning to add another Capability for json_object. It should be easy as we know which providers support Json mode.
Regarding the default implementation of the chat method, you're right, it should call the generate methods. I actually implemented it this way initially, but then rolled back because I had some doubts about it. This is work in progress, I plan to get back to this new API soon. Eventually generate methods will be deprecated and providers will need to implement only one method: chat.
Chat method is used only when Json capability is present because I had to rush this new chat API in order to enable structured outputs. Otherwise there was no way to pass the schema. WIP...
Thank you!
@geoand what do you suggest doing regarding the implementation of this functionality in quarkus-langchain4j? I could go ahead and implement what is there today, or wait for a new release.
I could go ahead and implement what is there today
You can go ahead and do that here, and when the feature lands in LangChain4j we can utilize it
I've been thinking about it and I am considering using tools (function calling) instead of JSON mode when return type is POJO and Structured Outputs feature is not supported (e.g. when LLM provider is not OpenAI or Gemini):
This is how it can work:
```java
if (isStructuredOutputType(methodReturnType)) { // e.g. POJO, enum, List<T>/Set<T>, etc.
    if (chatModel.supportedCapabilities().contains(RESPONSE_FORMAT_JSON_SCHEMA)) {
        // Proceed with generating a JSON schema and passing it to the model using the
        // structured outputs feature. This will work for OpenAI and Gemini.
    } else if (chatModel.supportedCapabilities().contains(TOOLS)) {
        // Create a synthetic tool "answer" and generate a JSON schema for it.
        if (configuredTools.isEmpty()) {
            // The "answer" tool is the only tool, so we will *force* the model to call it
            // using the tool_mode LLM parameter (will be available in the new ChatModel API).
        } else {
            // There are other tools that the user has configured. This means the LLM
            // could/should use one or more of them before providing the final answer.
            // I am not sure yet what the best solution is in this case. For example, we could
            // add "final_answer" to the list of tools and hope that the LLM will use it to
            // provide the answer.
            // We could also append a hint to the prompt (e.g. "Use the final_answer tool to
            // provide a final answer").
            // Or we could call the LLM in a loop (if the LLM decides to call tools) until it
            // returns a final answer in plain text, and then call it again with only the
            // "answer" tool available and force it to call that tool with the tool_mode parameter.
            // There can be multiple strategies, and we could make this configurable for the user.
            // Please note that this is probably a pretty rare use case (needing both
            // structured outputs and tools).
        }
    } else {
        // Fallback to appending "You must answer strictly in the following format..." to the prompt.
    }
}
```
WDYT?
We can also make "what to use to get structured output from the LLM" a configurable strategy that the user can specify explicitly (e.g. `USE_STRUCTURED_OUTPUTS`, `USE_TOOLS`, `USE_JSON_MODE`, `USE_PROMPTING`, etc.)
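The proposed best-to-worst selection could be sketched as a small enum plus a capability check, mirroring the if/else chain above. The enum constants come from the discussion; the `pick` helper and string-based capability set are hypothetical simplifications for illustration.

```java
import java.util.Set;

// Strategy names taken from the discussion; not an existing LangChain4j API.
enum StructuredOutputStrategy {
    USE_STRUCTURED_OUTPUTS, USE_TOOLS, USE_JSON_MODE, USE_PROMPTING
}

public class StrategySketch {

    // Picks the best available strategy, from best to worst,
    // based on the model's declared capabilities.
    static StructuredOutputStrategy pick(Set<String> capabilities) {
        if (capabilities.contains("RESPONSE_FORMAT_JSON_SCHEMA")) {
            return StructuredOutputStrategy.USE_STRUCTURED_OUTPUTS;
        } else if (capabilities.contains("TOOLS")) {
            return StructuredOutputStrategy.USE_TOOLS;
        } else if (capabilities.contains("RESPONSE_FORMAT_JSON")) {
            return StructuredOutputStrategy.USE_JSON_MODE;
        }
        return StructuredOutputStrategy.USE_PROMPTING;
    }

    public static void main(String[] args) {
        System.out.println(pick(Set.of("TOOLS")));       // falls back to tools
        System.out.println(pick(Set.of()));              // last resort: prompting
    }
}
```

An explicit user-specified strategy would simply bypass `pick` and be validated against the same capability set.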
I've been thinking about it and I am considering using tools (function calling) instead of JSON mode when return type is POJO and Structured Outputs feature is not supported
I am not entirely sure how the tools would enforce valid JSON generation from the LLM in this context. Is the primary role of the tools to generate the schema, or is it used to handle response formatting after the model has generated the output?
- Tools are supported by 14 LLM providers, JSON mode only by 7: https://docs.langchain4j.dev/integrations/language-models/
I think we need to be cautious about tool functionality. Some model providers, like Ollama, support tools, but not for all the hosted models. In these cases, `chatModel.supportedCapabilities().contains(TOOLS)` could introduce issues when certain models do not fully support tools.
I've been thinking about it and I am considering using tools (function calling) instead of JSON mode
Regarding JSON mode, I think that combined with the "You must answer strictly in the following format..." message it can give good results even for "small" models. Today, models tend to return the JSON structure given in the prompt, but this does not mean that the desired structure is 100% present.
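For context, the kind of instruction being discussed could be derived mechanically from the return type. The sketch below is illustrative only (not LangChain4j's actual schema generator): it uses reflection over a record's components to build a "You must answer strictly in the following JSON format: ..." hint. The `User` record and the skeleton syntax are assumptions for the example.

```java
import java.lang.reflect.RecordComponent;
import java.util.StringJoiner;

public class PromptSketch {

    // Hypothetical return type of an AI Service method.
    record User(String firstName, String lastName) {}

    // Builds a format instruction from the record's components via reflection.
    static String jsonSkeleton(Class<? extends Record> type) {
        StringJoiner fields = new StringJoiner(", ", "{", "}");
        for (RecordComponent c : type.getRecordComponents()) {
            fields.add("\"" + c.getName() + "\": (" + c.getType().getSimpleName() + ")");
        }
        return "You must answer strictly in the following JSON format: " + fields;
    }

    public static void main(String[] args) {
        System.out.println(jsonSkeleton(User.class));
        // → You must answer strictly in the following JSON format:
        //   {"firstName": (String), "lastName": (String)}
    }
}
```

As the comment above notes, even with such a hint plus JSON mode, nothing guarantees the model actually reproduces this structure.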
We can also make "what to use to get structured output from LLM" as a configurable strategy that user can specify explicitly (e.g. `USE_STRUCTURED_OUTPUTS`, `USE_TOOLS`, `USE_JSON_MODE`, `USE_PROMPTING`, etc.)
I agree. Having this as a configurable option is ideal from my perspective. However, we should be careful with `USE_TOOLS`.
If the provider returns an error when trying to use a model that does not support tools, then the concern I raised about `USE_TOOLS` can be ignored.
I am not entirely sure how the tools would enforce valid JSON generation from the LLM in this context. Is the primary role of the tools to generate the schema, or is it used to handle response formatting after the model has generated the output?
When an LLM supports tools, you can provide a JSON schema and the LLM will generate valid JSON that follows the schema (in roughly 95% of cases, depending on the complexity of the schema). LC4j generates a JSON schema from the parameters of a `@Tool`-annotated method automatically. Most modern LLMs are explicitly trained for the "tool calling" use case to produce valid JSON that follows the provided schema. Does this answer your question?
I think we need to be cautious about tools' functionality. Some model providers, like Ollama, support tools, but not for all the hosted models. In these cases, the chatModel.supportedCapabilities().contains(TOOLS) could introduce issues when certain models do not fully support tools.
Good point, this is why we should make this behavior configurable.
Regarding the JSON mode, I think that combined with the "You must answer strictly in the following format..." can give good results even for "small" models. Today, models tend to return the JSON structure given in the prompt, but this does not mean that the desired structure is 100% present.
I agree that JSON mode works pretty well, but tools are more reliable than JSON mode by design. The JSON mode feature just "guarantees" (95% of the time) that the returned text is valid JSON. One can provide a JSON schema in free form in the prompt, but there is no guarantee that the LLM will follow it. Tools, on the other hand, "guarantee" (again, 95%) that the returned text is not only valid JSON but also follows the specified schema. And in this case the schema is specified in a standardized way (as a separate LLM request parameter) and not appended as free-form text to the user message. Since tool-calling LLMs are tuned to follow the schema, and there is only a single way to specify it, this is more reliable than appending the schema as free-form text to the user prompt.
I think we need to be cautious about tools' functionality. Some model providers, like Ollama, support tools, but not for all the hosted models. In these cases, the chatModel.supportedCapabilities().contains(TOOLS) could introduce issues when certain models do not fully support tools.
If the provider returns an error when trying to use a model that does not support tools, the consideration I made about using USE_TOOLS can be ignored.
Good point! I guess this concern applies mostly to Ollama, as all other LLM providers that support tools usually support them for all their models (at least I see this trend lately). Ollama throws an error in case tools are not supported by a specific model: `{"error":"tinydolphin does not support tools"}`, so we should be safe.
When LLM support tools, you can provide a JSON schema and LLM will generate a valid JSON that follows the schema (in like 95% of cases, depending on the complexity of schema). LC4j generates JSON schema from a `@Tool`-annotated method parameters automatically. Most modern LLMs are explicitly trained for "tool calling" use case to produce a valid JSON that follows the provided schema. Does this answer your Q?
In part. I want to understand the actual use of tools to solve this problem. I have something in mind, but I don't know if we're on the same page. Suppose I have an LLM that needs to extract some user info, and this is the output POJO:

```java
record User(String firstName, String lastName) {}
```

Is your idea to have a tool method like this to generate the correct JSON?

```java
@Tool("Generates a response in the required JSON format.")
public User answer(String firstName, String lastName) {
    return new User(firstName, lastName);
}
```
Good point! I guess this concern is applicable mostly for Ollama, as all other LLM providers that support tools, usually support them for all their models (at least I see this trend lately). Ollama throws an error in case tools are not supported by specific model: `{"error":"tinydolphin does not support tools"}`, so we should be safe.
:+1:
@andreadimaio no, the idea is to automatically create

```java
ToolSpecification.builder()
        .name("answer")
        .addParameter("firstName")
        .addParameter("lastName")
        .build()
```

under the hood of the AI Service, inject it in the request to the LLM, and force the LLM to use this tool.
In this case the user does not have to do anything; the LLM will be forced to reply by calling the `answer` tool and provide valid JSON, which we will deserialize into a `User` object and return to the user.
Python version of LC is actually using tool calling as a primary way for structured outputs: https://python.langchain.com/docs/how_to/structured_output/#the-with_structured_output-method
In this case tools are kind of "misused" for returning structured output
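The last step of that flow, turning the forced tool call's arguments back into the declared return type, could be sketched as below. This is illustrative only: the parsing is deliberately naive (stdlib regex instead of a real JSON library such as Jackson), and the `User` record and argument payload are assumptions for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToolAnswerSketch {

    // Hypothetical return type of the AI Service method.
    record User(String firstName, String lastName) {}

    // Deserializes the arguments of the forced "answer" tool call.
    // Naive flat-JSON string-field parsing; a real implementation
    // would use a proper JSON library.
    static User deserialize(String toolArgumentsJson) {
        Pattern p = Pattern.compile("\"(\\w+)\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(toolArgumentsJson);
        String first = null, last = null;
        while (m.find()) {
            if (m.group(1).equals("firstName")) first = m.group(2);
            if (m.group(1).equals("lastName"))  last  = m.group(2);
        }
        return new User(first, last);
    }

    public static void main(String[] args) {
        // Arguments as the LLM would return them for the forced tool call.
        User u = deserialize("{\"firstName\": \"Ada\", \"lastName\": \"Lovelace\"}");
        System.out.println(u);
    }
}
```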
Yes, it was just an example, of course everything will be automatic. So we are on the same page :) Now it is clear to me, thanks!
Just to document this explicitly, here is the order (from best to worst) of strategies to get structured outputs in AI services (this is not implemented yet, just a plan):
User should be able to override this logic and explicitly specify which strategy to use.
cc @glaforge @jdubois @agoncal
The `ChatLanguageModel` interface provides a new method that can be implemented to force the use of `response_format` to `json` when an AiService method returns a POJO. This is something that can be done automatically by Quarkus. This should be a simple change to the `AiServiceMethodImplementationSupport` class, but all current providers will need to be updated to manage this new method. Does this make sense?