xelkano / redmine_xapian

This plugin allows searches across attachments with xapian search engine
GNU General Public License v2.0

Make an OCR function #134

Closed uhuntu closed 1 year ago

uhuntu commented 1 year ago

Hi there, this PR is an interesting idea to OCR the png/jpg files.

picman commented 1 year ago

I'm sorry, but in the meantime I've modified the devel branch due to the RuboCop tests. Please pull the updated devel branch and push your changes again.

uhuntu commented 1 year ago

Yes, I've redone it. Please help review it, thanks.

picman commented 1 year ago

As the final step, fix the errors in the RuboCop tests and we can merge it :-)

picman commented 1 year ago

I thought about this for a while and I think I've found a better approach: we can add a filter to the omindex command so that it parses image files directly. E.g.: _extra/xapianindexer.rb

...
OMINDEX = "/usr/bin/omindex --filter=image/png:'tesseract %f -'"
...
 # unless File.exist?(OMINDEX)
 #   my_log "#{OMINDEX} does not exist, exiting...", true
 #   exit 1
 # end
...

PNG files are then indexed too.
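To illustrate the idea for more than one format, here is a minimal sketch of how the command string could be assembled from a table of filters. The names `OMINDEX_BIN`, `OCR_FILTERS`, `build_omindex_command` and `omindex_available?` are hypothetical and not part of the plugin. The sketch assumes that omindex accepts repeated `--filter` options and that it maps the files in question to the listed MIME types, which is worth verifying on your installation. Keeping the binary path separate from its arguments would also let an existence check keep working.

```ruby
# Hypothetical sketch, not the plugin's actual code.
# Path to the omindex binary, kept separate from its arguments.
OMINDEX_BIN = '/usr/bin/omindex'

# MIME type => filter command. omindex replaces %f with the file name;
# "-" makes tesseract write the recognised text to standard output.
OCR_FILTERS = {
  'image/png' => 'tesseract %f -',
  'image/jpeg' => 'tesseract %f -'
}.freeze

# Builds the full command line: the binary followed by one --filter per entry.
def build_omindex_command(bin, filters)
  args = filters.map { |mime, cmd| "--filter=#{mime}:'#{cmd}'" }
  ([bin] + args).join(' ')
end

# True when the binary itself (not the whole command string) is executable.
def omindex_available?(bin = OMINDEX_BIN)
  File.executable?(bin)
end

OMINDEX = build_omindex_command(OMINDEX_BIN, OCR_FILTERS)
puts OMINDEX
# Prints:
# /usr/bin/omindex --filter=image/png:'tesseract %f -' --filter=image/jpeg:'tesseract %f -'
```

With this layout, a check such as `omindex_available?` tests only the path of the binary, so it would not fail just because the command string now carries filter options.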

Here is the output on my computer. I have just one PNG, with the word "Karel" in the image.

image

So, I suggest following this approach because:

  1. No extra cron job is needed for converting images.
  2. No duplicate txt files for the images.
  3. Support for any format, not just PNG or JPG.
  4. No changes in the code are needed. (I'd just mention this feature in the README and remove the condition that tests for the existence of the omindex command.)

uhuntu commented 1 year ago

Looks awesome, it is more elegant!

picman commented 1 year ago

I'm sorry for not accepting your pull request. I've just extended the documentation in #135. Thank you for your effort and contribution to the plugin.