Closed jamesvillarrubia closed 4 months ago
@ansukla As expected, there are some critical fixes in this version that I missed in the first one, if you get a sec to approve. Cheers!
FWIW, I've been testing some PDFs that are broken under current main branch. Three of them are working great with these changes.
Can this get published?
Very nice. If somebody can provide an example of a docker build command that builds from the main branch so that I can test, that would be wonderful.
Hi @jamesvillarrubia,
I hope you're doing well! I wanted to ask if the new v2 contains any changes other than those listed in the comparison you posted here. It seems unclear what the changes are inside the v2 Jar. I plan to make some RTL fixes over NLM 2.9.2 and would appreciate your confirmation. Also, what is the Apache Tika version or commit you have used?
Thank you!
Description of the change
X-Tika-PDFOcrStrategy and X-Tika-PDFExtractFontNames have been added to the TikaFileParser for better control of the OCR and PDF parsing behavior.
<p> tags. New jar resolves this bug.
Type of change
Testing
The changes have been tested manually to ensure the Tika server works correctly with the new configuration and headers. Additional testing is required for edge cases.
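For anyone who wants to repeat the manual check, here is a minimal sketch of a request that carries the two new headers to a Tika server. It is not the TikaFileParser code from this PR. The function names, the `no_ocr` value, and the `http://localhost:9998/tika` URL (Tika server's default port) are assumptions chosen for illustration; adjust them to your setup.

```python
import urllib.request


def tika_pdf_headers(ocr_strategy="no_ocr", extract_font_names=True):
    """Build the request headers (hypothetical helper, not part of the PR).

    The two X-Tika-* names come from the PR description; the values
    used here are assumptions.
    """
    return {
        "Accept": "text/html",
        "Content-Type": "application/pdf",
        "X-Tika-PDFOcrStrategy": ocr_strategy,
        "X-Tika-PDFExtractFontNames": "true" if extract_font_names else "false",
    }


def build_tika_request(pdf_bytes, server_url="http://localhost:9998/tika", **options):
    """Prepare (but do not send) a PUT request for the Tika server."""
    return urllib.request.Request(
        server_url,
        data=pdf_bytes,
        headers=tika_pdf_headers(**options),
        method="PUT",
    )


if __name__ == "__main__":
    # Assumes a Tika server is running locally and sample.pdf exists.
    with open("sample.pdf", "rb") as f:
        request = build_tika_request(f.read())
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8")[:500])
```

Building the request separately from sending it makes it easy to inspect the headers before pointing the script at a real server.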