slavabarkov / tidy

Offline semantic Text-to-Image and Image-to-Image search on Android powered by quantized state-of-the-art vision-language pretrained CLIP model and ONNX Runtime inference engine
GNU General Public License v3.0

Text recognition #9

Open test2a opened 8 months ago

test2a commented 8 months ago

Memes and photos often have text overlays, and the file name is usually not enough to find the right photo.

Would it be possible to recognize text and index that?

slavabarkov commented 8 months ago

@test2a In theory it should already work reasonably well with digitally rendered text, and in my experience it works quite well for that task on my device. Since the CLIP model used here was trained on a large set of image/text pairs scraped from the internet, it already has some semantic OCR-like capability and responds to the presence of text pretty well. A separate OCR model would obviously do even better, so I'll probably consider adding one in the future.

sowa705 commented 8 months ago

Hi, CLIP works well for short text snippets but fails on other languages or longer text. I think it might be a good idea to introduce multiple "sources" for similarity, such as CLIP, OCR, and potentially others. It might be good to keep that in mind for the future.
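
To illustrate the multiple-"sources" idea, here is a minimal sketch that blends a CLIP similarity with a simple OCR word-overlap score. It is not taken from the app: `ImageEntry`, `ocr_score`, `combined_score`, `search`, and the weights are hypothetical names and values, and the sketch assumes the CLIP similarity and the OCR text have already been computed for each image.

```python
from dataclasses import dataclass


@dataclass
class ImageEntry:
    path: str
    clip_score: float  # assumed precomputed similarity between the query and the image
    ocr_text: str      # assumed output of a separate (hypothetical) OCR step


def ocr_score(query: str, ocr_text: str) -> float:
    """Fraction of query words that appear in the OCR text (case-insensitive)."""
    words = query.lower().split()
    if not words:
        return 0.0
    found = set(ocr_text.lower().split())
    return sum(w in found for w in words) / len(words)


def combined_score(query: str, entry: ImageEntry,
                   clip_weight: float = 0.5, ocr_weight: float = 0.5) -> float:
    """Weighted sum of the two sources; the weights are illustrative, not tuned."""
    return clip_weight * entry.clip_score + ocr_weight * ocr_score(query, entry.ocr_text)


def search(query: str, entries: list[ImageEntry], **weights: float) -> list[ImageEntry]:
    """Return entries ordered from highest to lowest combined score."""
    return sorted(entries, key=lambda e: combined_score(query, e, **weights), reverse=True)
```

With this shape, adding another source would mean adding one more score function and one more weight, without touching the ranking step.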

waqqas31 commented 3 months ago

Hello @slavabarkov

I was directed to your app when I reported that my Samsung Gallery app was no longer performing text searches on images taken and/or stored on the phone.

My primary use case is searching for text, and I had some feedback for you.

  1. When searching for terms that seem to have no exact matches, the result set is "scattered" with lots of blank results in between actual pictures.
  2. Multi-digit numbers are treated as separate single-digit numbers. For example, "786" returns all results that include a "7", an "8", and a "6", but not necessarily together.
  3. Results do not seem to be sorted from best match to worst. Exact matches are scattered among partial matches.
  4. It would be really helpful to support exact matches only (using quotation marks).
  5. It would be helpful to have a "Refresh index" option within the app, instead of having to kill the app and relaunch it.
  6. If you can implement an OCR function to scan all text in all pictures, that would be EXTREMELY valuable.
  7. When a picture is opened, showing its path and filename would help us understand whether the search term matched the picture itself or part of the metadata.
  8. A button next to the "Share" button that opens the picture in the default gallery app would be very useful, too.

That's all my feedback for now.

Thanks for all your hard work!
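
Several of the points above (weak matches padding the result set, unsorted results, quoted exact matches, and numbers such as "786" being split into digits) could be handled by one ranking step. The sketch below is only an assumption about how that might look, not the app's behavior: `parse_query`, `rank`, and `min_score` are hypothetical, and it presumes each image already has a similarity score and OCR text, which the thread says the app does not extract yet.

```python
import re


def parse_query(query: str) -> tuple[list[str], str]:
    """Split a query into quoted exact phrases and the remaining free text."""
    phrases = re.findall(r'"([^"]+)"', query)
    free_text = re.sub(r'"[^"]*"', " ", query)
    return phrases, " ".join(free_text.split())


def rank(results, phrases, min_score=0.2):
    """Filter and order results.

    results: iterable of (path, score, ocr_text) tuples (hypothetical shape).
    Entries below min_score are dropped instead of padding the result set,
    and every quoted phrase must occur as a contiguous substring of the OCR text.
    """
    kept = []
    for path, score, ocr_text in results:
        if score < min_score:
            continue
        text = ocr_text.lower()
        if all(p.lower() in text for p in phrases):
            kept.append((path, score))
    # Best match first.
    kept.sort(key=lambda r: r[1], reverse=True)
    return kept
```

Because the phrase check is a plain substring test, a quoted "786" matches only where the digits appear together, although it would also match inside a longer number such as 7860.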