Open OnceCrazyer opened 1 month ago
You need to provide your own implementations of DocumentReader. TextReader(Resource resource) takes a Spring Resource, so it could handle byte[] and URLs. Let me know, @OnceCrazyer, if that works for your use case.
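As a rough, library-free sketch of the idea in this reply: a byte[], a URL, and an upload can all be reduced to something that opens an InputStream, which is the role a Spring Resource plays for TextReader. The TextSource interface and the fromBytes, fromUrl, and readText helpers below are hypothetical stand-ins invented for illustration; they are not Spring or Spring AI types.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TextSources {

    /** Hypothetical stand-in for a Spring Resource: anything that can open a stream. */
    @FunctionalInterface
    public interface TextSource {
        InputStream open() throws IOException;
    }

    /** Wraps in-memory bytes, for example the contents of an uploaded file. */
    public static TextSource fromBytes(byte[] bytes) {
        return () -> new ByteArrayInputStream(bytes);
    }

    /** Wraps a URL; the connection is only opened when open() is called. */
    public static TextSource fromUrl(String url) {
        return () -> URI.create(url).toURL().openStream();
    }

    /** Reads the whole source as text and closes the stream afterwards. */
    public static String readText(TextSource source, Charset charset) throws IOException {
        try (InputStream in = source.open()) {
            return new String(in.readAllBytes(), charset);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] uploaded = "hello from an upload".getBytes(StandardCharsets.UTF_8);
        System.out.println(readText(fromBytes(uploaded), StandardCharsets.UTF_8));
    }
}
```

In a Spring application the wrapping step would presumably use existing Resource implementations instead: ByteArrayResource for a byte[], UrlResource for an HTTP URL, and MultipartFile.getResource() for an upload, each passed to TextReader(Resource resource).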
JSON, Text, Markdown, PDF Page, PDF Paragraph, Tika (DOCX, PPTX, HTML…)
Can the ETL Pipeline support more ways to read file content, such as a byte[], an HTTP URL, or an uploaded MultipartFile, so that the text content can be read directly?