tleyden / open-ocr

Run your own OCR-as-a-Service using Tesseract and Docker
Apache License 2.0

feature request: howto add custom preprocessor #17

Open evantill opened 9 years ago

evantill commented 9 years ago

I would like to add custom preprocessors (such as cropping, page rotation, or splitting a multi-page PDF).

Do you have any HOWTO documentation on adding custom preprocessors?

What comes to mind would be:

This way we could create custom preprocessors in any language (shell, Go, Ruby...)

tleyden commented 9 years ago

@evantill there's an example using Stroke Width Transform, which uses a docker container. The docs are still lacking though .. here's what is currently available:

https://github.com/tleyden/open-ocr/wiki/Stroke-Width-Transform

If you got that running first, it might all make sense and you could move forward with implementing your own pre-processor.

Let me know where you get stuck and I'll try to improve the docs.

evantill commented 9 years ago

@tleyden in the documentation you are running a new open-ocr-preprocessor using

$ docker run -d tleyden5iwx/open-ocr-preprocessor open-ocr-preprocessor -amqp_uri "${AMQP_URI}" -preprocessor "stroke-width-transform"

which is exactly what I would like to use for my custom preprocessor. But reading the source code, it seems that the preprocessor is hard-coded.

Actually, the open-ocr-preprocessor docker image is based on tleyden5iwx/stroke-width-transform. What I would like is a generic open-ocr-preprocessor image, plus a specialized preprocessor image for stroke-width-transform based on it.

tleyden commented 9 years ago

@evantill yeah, I just took a look, and I see what you mean. There does not seem to be a clear path to plug a custom preprocessor into the chain. I think it's a pretty shallow, easy-to-fix issue, because it was architected to make it easy to set up a "preprocessor pipeline". I'll have to re-read the code and get back to you on this.

evantill commented 9 years ago

:+1:

tleyden commented 9 years ago

@evantill I made some progress on this .. still untested, but if you want to see the in-development stuff, check out the feature/generic_preprocessor branch

evantill commented 9 years ago

Nice work. Go is still new to me, so I need more time to ramp up on the language.