hoarder-app / hoarder

A self-hostable bookmark-everything app (links, notes and images) with AI-based automatic tagging and full text search
https://hoarder.app
GNU Affero General Public License v3.0

[Feature Request] OCR search text in images #296

Open lethefrost opened 1 month ago

lethefrost commented 1 month ago

It would be especially helpful when you have a lot of screenshots, diagrams, photos of slides, etc., embedded in documents or as standalone image files. Text in images can contain a large amount of information, but it's not easy to retrieve it through traditional file management. It would be greatly appreciated if you could consider making it searchable.

MohamedBassem commented 1 month ago

hmmm, OCR is a cool idea indeed. My only concern is finding a good OCR tool that would work with different languages.

lethefrost commented 1 month ago

> hmmm, OCR is a cool idea indeed. My only concern is finding a good OCR tool that would work with different languages.

This might be helpful: we could let each user configure a list of the languages likely to occur in their hoard. These are usually the languages they know, so the list wouldn't be too long (for most people it might be 1-3). It seems that Tesseract.js supports recognizing multiple languages at the same time when you concatenate the language codes with "+".
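
To make the idea concrete, here is a minimal sketch of how a per-user language list could be turned into the "+"-joined string. The helper name `buildOcrLangParam`, the "eng" fallback, and the validation pattern are assumptions made up for illustration, not anything Hoarder or Tesseract.js defines.

```typescript
// Hypothetical helper (not part of Hoarder or Tesseract.js): turn a user's
// configured OCR languages into a "+"-joined string such as "eng+deu".
function buildOcrLangParam(langs: string[], fallback: string = "eng"): string {
  const seen = new Set<string>();
  for (const raw of langs) {
    const code = raw.trim().toLowerCase();
    // Assumed shape of a Tesseract language code: three letters, optionally
    // followed by "_suffix" parts (e.g. "eng", "chi_sim"). Anything else is skipped.
    if (/^[a-z]{3}(_[a-z]+)*$/.test(code)) {
      seen.add(code); // the Set drops duplicates and keeps first-seen order
    }
  }
  // Fall back to a single default language when nothing valid was configured.
  return seen.size > 0 ? Array.from(seen).join("+") : fallback;
}

// Example: a user who configured English and German, with a stray duplicate.
const langParam = buildOcrLangParam(["eng", " deu ", "eng"]);
console.log(langParam);
```

The resulting string would then be handed to whichever Tesseract.js call takes the language argument; the exact call differs between Tesseract.js versions, so that part is left out here.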

MohamedBassem commented 1 month ago

tesseract.js looks cool indeed. We can probably add it to the roadmap at some point.

akshara-tg commented 5 days ago

Without OCR (which allows for searching text within images), hoarding images becomes somewhat pointless.