On a Debian install, I get a lot of errors when indexing PDFs, indicating that JPEG2000 images cannot be read:
java[1179714]: ERROR [qtp1335914322-72] 14:16:52,117 org.apache.pdfbox.contentstream.PDFStreamEngine Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
I put these libraries in /opt/solr-8.11.1/contrib/extraction/lib:
jai-imageio-core-1.4.0.jar jai-imageio-jpeg2000-1.4.0.jar
but I still get the errors. Has anyone succeeded in getting JPEG2000 images in PDFs read during extraction?
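One way to narrow this down is to check whether a JVM started with those jars on its classpath actually registers a JPEG2000 reader with ImageIO. The sketch below is only a diagnostic, not a fix. The class name `Jpeg2000Check` and the helper `hasReader` are made up for illustration, and it assumes that the "JPEG2000" format name is what the PDF extraction looks up. It also cannot tell you whether the process that logs the error uses the same classpath.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical diagnostic class: reports whether ImageIO can find a reader
// for a given format name on the current classpath.
class Jpeg2000Check {

    // Returns true if at least one ImageReader is registered for formatName.
    static boolean hasReader(String formatName) {
        // Re-scan the classpath so plugin jars are picked up.
        ImageIO.scanForPlugins();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Assumption: "JPEG2000" is the format name the extraction code asks for.
        System.out.println("JPEG2000 reader available: " + hasReader("JPEG2000"));
        // Sanity check: the JDK ships a PNG reader, so this should be true.
        System.out.println("PNG reader available: " + hasReader("png"));
    }
}
```

Compile it with `javac Jpeg2000Check.java` and run it with the jar directory on the classpath, for example `java -cp "/opt/solr-8.11.1/contrib/extraction/lib/*:." Jpeg2000Check`. If it reports that a JPEG2000 reader is available here while the errors persist, the process doing the extraction is probably not loading jars from that directory.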