Closed by aldemira 2 years ago
I've checked and it works fine. Please provide a sample PDF to test.
OK, let's do this: I can't do a fresh install of 6.3.12 right now, so I'll close this issue. Whenever I can, I'll install a fresh copy and test it. Thanks.
Sorry, I have to reopen this issue now. I've just installed 6.3.12 from scratch (with docker-compose), and here are the logs I'm getting:
2022-09-27 11:25:00,105 [Thread-181] INFO c.o.extractor.TextExtractorWorker - processSerial.Working on {docUuid=d5a22248-29ae-4d42-aadc-551b810049e4, docPath=/okm:root/Video/intro-linux.pdf, docVerUuid=22de99e1-cb51-42e4-a67f-ff3da8064686, date=Tue Sep 27 11:22:47 UTC 2022}
2022-09-27 11:25:00,854 [Thread-181] WARN c.o.extractor.CuneiformTextExtractor - Undefined OCR application
2022-09-27 11:25:00,855 [Thread-181] WARN com.openkm.dao.NodeDocumentDAO - There was a problem extracting text from '/okm:root/Video/intro-linux.pdf': Too few text extracted
2022-09-27 11:30:00,067 [Thread-208] INFO com.openkm.core.UserMailImporter - User mail importer activated
2022-09-27 11:30:00,085 [Thread-209] INFO c.o.extractor.TextExtractorWorker - processSerial.Working on {docUuid=8f5e2b68-cbd1-45b6-a5f9-68fa46855fce, docPath=/okm:root/14F-Intro to Python-3.3.pdf, docVerUuid=e08e3c13-43ab-449d-9b9a-1a3fa891f6ed, date=Tue Sep 27 11:27:57 UTC 2022}
2022-09-27 11:30:00,088 [Thread-209] WARN c.o.extractor.CuneiformTextExtractor - Undefined OCR application
2022-09-27 11:30:00,089 [Thread-209] WARN com.openkm.dao.NodeDocumentDAO - There was a problem extracting text from '/okm:root/14F-Intro to Python-3.3.pdf': Too few text extracted
The files I've tested are:
https://www.tug.ca/tec/Sessions/Handouts/PDF/14F-Intro%20to%20Python-3.3.pdf
https://tldp.org/LDP/intro-linux/intro-linux.pdf
6.3.9 doesn't have this problem.
I kinda feel ashamed, but I think I forgot to delete the local volume (tomcat), which was the issue this time. So I reinstalled again, and now search and text extraction work. Sorry for spamming your inbox (yet again).
Anyway, if you have this kind of problem again, check the list of text extractors, because you may have collisions.
Best regards.
I just reverted to 6.3.9 and it works flawlessly. On 6.3.12 I tried rebuilding the indexes etc., but I see errors saying that text extraction has failed, so the search doesn't return anything at all. 6.3.9 works fine.
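For anyone chasing the same symptom, one quick way to see which documents failed is to scan the log for the NodeDocumentDAO warning. This is only a minimal sketch in Python, assuming the log lines look like the ones quoted in this thread; the helper name `failed_extractions` is made up for illustration and is not part of OpenKM.

```python
import re

# Pattern for the warning text seen in the logs quoted in this thread.
# Assumption: other OpenKM versions may word this message differently.
FAILURE = re.compile(
    r"There was a problem extracting text from '(?P<path>.+?)': (?P<reason>.+)$"
)


def failed_extractions(lines):
    """Return (document path, reason) pairs for every extraction warning."""
    failures = []
    for line in lines:
        match = FAILURE.search(line)
        if match:
            failures.append((match.group("path"), match.group("reason").strip()))
    return failures


# Two lines copied from the log in this thread.
sample = [
    "2022-09-27 11:25:00,854 [Thread-181] WARN c.o.extractor.CuneiformTextExtractor - Undefined OCR application",
    "2022-09-27 11:25:00,855 [Thread-181] WARN com.openkm.dao.NodeDocumentDAO - "
    "There was a problem extracting text from '/okm:root/Video/intro-linux.pdf': Too few text extracted",
]

for path, reason in failed_extractions(sample):
    print(f"{path}: {reason}")
```

To run it against a real log, pass the open file object instead of `sample`; where that file lives depends on your installation.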