spring-projects / spring-ai

An Application Framework for AI Engineering
https://docs.spring.io/spring-ai/reference/index.html
Apache License 2.0
3.3k stars 844 forks

Can ETL Pipeline support more ways to read the file content #1397

Open OnceCrazyer opened 1 month ago

OnceCrazyer commented 1 month ago

Current readers: JSON, Text, Markdown, PDF Page, PDF Paragraph, Tika (DOCX, PPTX, HTML…)

Can the ETL Pipeline support more ways to read file content, such as a byte[], an HTTP URL, or an uploaded MultipartFile, so that the file's text content can be read directly?

alexcheng1982 commented 1 month ago

You need to provide your own implementations of DocumentReader.
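
As a rough illustration of what such a custom reader could look like, here is a minimal JDK-only sketch. It assumes the reader contract is essentially "supply a list of documents". The class name `ParagraphBytesReader` is hypothetical, and plain strings stand in for Spring AI's `Document` type so that the sketch compiles without the library on the classpath. A real implementation would implement `DocumentReader` and return `Document` objects instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical stand-in for a custom reader. Plain strings replace Spring AI's
// Document type so that this sketch depends on the JDK only.
final class ParagraphBytesReader implements Supplier<List<String>> {

    private final byte[] content;

    ParagraphBytesReader(byte[] content) {
        this.content = content.clone(); // defensive copy of the caller's array
    }

    @Override
    public List<String> get() {
        // Assumption: the bytes are UTF-8 encoded text.
        String text = new String(content, StandardCharsets.UTF_8);
        // One entry per paragraph (separated by a blank line), skipping empty ones.
        return Arrays.stream(text.split("\\R\\s*\\R"))
                .map(String::trim)
                .filter(paragraph -> !paragraph.isEmpty())
                .collect(Collectors.toList());
    }
}
```

The point of writing the reader yourself is that the input type (here a byte[]) and the way content is split into documents are both under your control.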

markpollack commented 1 month ago

TextReader(Resource resource) takes a Spring Resource, so it could handle byte[] and URLs. Let me know, @OnceCrazyer, if that works for your use case.
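
To show why a single Resource-style constructor parameter covers several input kinds, here is a minimal JDK-only sketch of the idea. `StreamSource` and `PlainTextReader` are hypothetical stand-ins for Spring's `Resource` and for a text reader, not the real classes. The only assumption is that the reader needs nothing from its input beyond a way to open an `InputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the role Spring's Resource plays here:
// anything that can open an InputStream.
interface StreamSource {
    InputStream open() throws IOException;

    // Wraps an in-memory byte array.
    static StreamSource ofBytes(byte[] bytes) {
        return () -> new ByteArrayInputStream(bytes);
    }

    // Wraps a URL (http, https, file, ...); the stream is opened lazily.
    static StreamSource ofUrl(URL url) {
        return url::openStream;
    }
}

// Hypothetical reader that depends only on the abstraction, not on the input kind.
final class PlainTextReader {

    private final StreamSource source;

    PlainTextReader(StreamSource source) {
        this.source = source;
    }

    String readText() {
        try (InputStream in = source.open()) {
            // Assumption: the content is UTF-8 encoded text.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a Spring application the equivalent wrappers already exist: `ByteArrayResource` for a byte[], `UrlResource` for a URL, and `MultipartFile.getResource()` for an upload. Each of these should be usable as the argument to `TextReader(Resource resource)`, though that is worth checking against the Spring AI version in use.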