spring-projects / spring-ai

An Application Framework for AI Engineering
https://docs.spring.io/spring-ai/reference/index.html
Apache License 2.0

Tika Document Reader issue (OpenAI and Mistral) #1476

Open kursatufukcoskun opened 1 month ago

kursatufukcoskun commented 1 month ago

Bug description

Adding the Tika Document Reader dependency causes a response type exception.

Environment

Spring Boot version: 3.3.4
Spring AI version: 1.0.0-M2
Java version: OpenJDK 22

Steps to reproduce

Add the Tika dependency to the pom.xml:

<dependency>
     <groupId>org.springframework.ai</groupId>
     <artifactId>spring-ai-tika-document-reader</artifactId>
</dependency>

Then call the OpenAI chat model:


PromptTemplate promptTemplate = new PromptTemplate(getCapitalPrompt);
Prompt prompt = promptTemplate.create(Map.of("stateOrCountry", getCapitalRequest.stateOrCountry(), "format", format));
ChatResponse response = chatModel.call(prompt);
return parser.parse(response.getResult().getOutput().getContent());

Expected behavior

The call should return a normal response, but when the Tika dependency is in my project it fails with a response type exception.

Exception

Error while extracting response for type [org.springframework.ai.openai.api.OpenAiApi$ChatCompletion] and content type [application/json]
    at com.patikaglobal.auditnow.ai.service.generator.AIDelegateService.generateAiResponse(AIDelegateService.kt:62)
    at com.patikaglobal.auditnow.ai.controller.AIController.generate(AIController.kt:22)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at kotlin.reflect.jvm.internal.calls.CallerImpl$Method.callMethod(CallerImpl.kt:97)
    at kotlin.reflect.jvm.internal.calls.CallerImpl$Method$Instance.call(CallerImpl.kt:113)
    at kotlin.reflect.jvm.internal.KCallableImpl.callDefaultMethod$kotlin_reflection(KCallableImpl.kt:207)
    at kotlin.reflect.jvm.internal.KCallableImpl.callBy(KCallableImpl.kt:112)
    at org.springframework.web.method.support.InvocableHandlerMethod$KotlinDelegate.invokeFunction(InvocableHandlerMethod.java:334)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:252)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:188)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:926)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:831)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:590)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:658)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:195)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:140)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:164)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:140)
    at org.springframework.web.filter.CompositeFilter$VirtualFilterChain.doFilter(CompositeFilter.java:108)

By the way, this doesn't only happen with OpenAI. I tried it with Mistral, and it has the same problem. I didn't have this problem when I tried it with Bedrock. However, with Bedrock, the response is text instead of JSON. When I remove the Tika dependency, the response turns into JSON again.

ThomasVitale commented 1 month ago

@kursatufukcoskun thanks for reporting this issue. I tried to reproduce the error, but couldn't. Could you share some more details about your setup, or a small example app that reproduces the problem? Also, could you check whether the problem is still there when using version 1.0.0-SNAPSHOT? Thanks!

csisanyi commented 1 month ago

I have encountered the same problem with Spring AI version 1.0.0-M3. The source of the error seems to be somewhere in tika-parsers-standard-package: excluding it resolves the error, but of course document parsing then no longer happens.

The following log might also be useful; it is the underlying cause of the originally reported error.

Caused by: org.springframework.http.converter.HttpMessageNotReadableException: JSON parse error: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (org.springframework.util.StreamUtils$NonClosingInputStream); line: 1, column: 1])
    at org.springframework.http.converter.json.AbstractJackson2HttpMessageConverter.readJavaType(AbstractJackson2HttpMessageConverter.java:406)
    at org.springframework.http.converter.json.AbstractJackson2HttpMessageConverter.read(AbstractJackson2HttpMessageConverter.java:354)
    at org.springframework.web.client.DefaultRestClient.readWithMessageConverters(DefaultRestClient.java:213)
    ... 179 common frames omitted

Caused by: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (org.springframework.util.StreamUtils$NonClosingInputStream); line: 1, column: 1]) at [Source: (org.springframework.util.StreamUtils$NonClosingInputStream); line: 1, column: 2]
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:697)
    at com.fasterxml.jackson.core.base.ParserBase._handleEOF(ParserBase.java:512)
    at com.fasterxml.jackson.core.base.ParserBase._eofAsNextChar(ParserBase.java:529)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:3103)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:757)
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:181)
    at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
    at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:2105)
    at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1481)
    at org.springframework.http.converter.json.AbstractJackson2HttpMessageConverter.readJavaType(AbstractJackson2HttpMessageConverter.java:395)
    ... 181 common frames omitted

csisanyi commented 1 month ago

@kalcifield debugged the issue further. Declaring the dependency as follows resolves the error, and communication with OpenAI works:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-tika-document-reader</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-parser-pkg-module</artifactId>
        </exclusion>
    </exclusions>
</dependency>

The source of the bug seems to be in org.brotli:dec:jar:0.1.2:compile
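To check whether the Brotli decoder mentioned above actually ends up on the application's classpath, a small probe like the following might help. This is only a diagnostic sketch: `ClasspathProbe` and its methods are hypothetical names made up for illustration, and the probe assumes that `org.brotli.dec.BrotliInputStream` is the class shipped in the `org.brotli:dec` artifact. It does not confirm that Brotli is the cause, only whether the class is loadable and from which jar.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration; not part of Spring AI or Tika.
final class ClasspathProbe {

    // Assumption: this is the decoder class provided by org.brotli:dec.
    static final String BROTLI_DECODER = "org.brotli.dec.BrotliInputStream";

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isPresent(String className) {
        return locate(className) != null;
    }

    /** Returns where the class was loaded from, or null if it cannot be loaded. */
    static String locate(String className) {
        try {
            Class<?> type = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            // Classes from the JDK runtime itself have no code source.
            return source == null ? "(JDK runtime)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String location = locate(BROTLI_DECODER);
        System.out.println(location == null
                ? "Brotli decoder is not on the classpath"
                : "Brotli decoder loaded from " + location);
    }
}
```

Running this once with and once without the exclusion would show whether the exclusion removes the decoder from the classpath. `mvn dependency:tree` gives the same information at build time.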

ilayaperumalg commented 3 weeks ago

Hi @csisanyi Thanks for the investigation and suggested fix.

@csisanyi @kursatufukcoskun I am trying to reproduce the issue and would like to know which parser you use in the code snippet above that introduced the conflict. A sample repo that demonstrates this issue would be helpful. Thanks!

csisanyi commented 2 weeks ago

@ilayaperumalg I'm not sure about the specific parser that is used, but regardless of the file format, I can always reproduce the error by trying to add the parsed material to the OpenAI client request.

ilayaperumalg commented 2 weeks ago

@csisanyi Thanks for the response. It would be helpful if you can share a sample that reproduces the issue.

markpollack commented 2 weeks ago

From this report, we don't know what is going on inside parser.parse, so we can't reproduce it. It could be that different models return different responses, and that one response is not parsable. Please provide a more comprehensive example to reproduce.