Open swissspidy opened 7 months ago
PDF.js can already extract text from PDFs, so OCR might be more useful for images.
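For reference, a rough sketch of turning PDF.js text items into plain text. In PDF.js, `page.getTextContent()` resolves to an object with an `items` array whose text entries carry `str` and `hasEOL`. The `TextItemLike` interface and the `joinTextItems` helper are hypothetical names for illustration, not part of PDF.js, and real documents may need smarter spacing.

```typescript
// Minimal shape of the text items returned by PDF.js `page.getTextContent()`.
// Only the two fields used here are declared (assumption: that is all we need).
interface TextItemLike {
  str?: string;
  hasEOL?: boolean;
}

// Hypothetical helper: join the items of one page into plain text,
// inserting a newline wherever PDF.js flagged an end of line.
function joinTextItems(items: TextItemLike[]): string {
  let out = "";
  for (const item of items) {
    if (typeof item.str !== "string") continue; // skip entries without text
    out += item.str;
    if (item.hasEOL) out += "\n";
  }
  return out.trim();
}
```

In practice the `items` array would come from `getTextContent()` for each page of the uploaded file.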
For images, it could be interesting to extract text during upload and store it as metadata, which would be useful for searching the media library.
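A minimal sketch of what the metadata side could look like, assuming the raw OCR string has already been produced (for example by Tesseract.js). The helper names `toSearchableText` and `matchesQuery`, and the 500-character cap, are assumptions for illustration only.

```typescript
// Hypothetical helper: normalize raw OCR output into a compact string
// suitable for storing as attachment metadata.
function toSearchableText(rawOcrText: string, maxLength = 500): string {
  // Collapse line breaks, tabs and repeated spaces into single spaces.
  const collapsed = rawOcrText.replace(/\s+/g, " ").trim();
  return collapsed.length > maxLength ? collapsed.slice(0, maxLength) : collapsed;
}

// Hypothetical helper: case-insensitive substring match against the stored text.
function matchesQuery(storedText: string, query: string): boolean {
  const needle = query.trim().toLowerCase();
  return needle.length > 0 && storedText.toLowerCase().includes(needle);
}
```

Storing the normalized string at upload time keeps the search itself cheap, since no OCR has to run when the media library is queried.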
Related: #647
PDF.js could be combined with Tesseract.js (https://tesseract.projectnaptha.com/) to do OCR on uploaded PDFs. It just needs a good use case.
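One candidate use case might be scanned PDFs that have no usable text layer. A rough sketch of the decision, assuming the text-layer string for a page was already extracted with PDF.js; `chooseTextSource` and the `minChars` threshold of 20 are hypothetical and would need tuning.

```typescript
type TextSource = "text-layer" | "ocr";

// Hypothetical heuristic: use the embedded text layer when it holds enough
// non-whitespace characters, otherwise fall back to OCR on the rendered page.
function chooseTextSource(textLayerText: string, minChars = 20): TextSource {
  const visible = textLayerText.replace(/\s/g, "");
  return visible.length >= minChars ? "text-layer" : "ocr";
}
```

With this split, OCR would only run on pages where PDF.js finds little or no text, which avoids duplicating what the text layer already provides.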
Apparently the underlying Tesseract models haven't been updated in a while, so we may need to look for alternatives.