houqp / leptess

Productive and safe Rust binding for leptonica and tesseract
https://houqp.github.io/leptess/leptess/index.html
MIT License

[REQ] setvariable eng.user-words #31

Closed: DartMen closed this issue 1 year ago

DartMen commented 3 years ago

A way to feed a list of words to supplement Tesseract would be very nice.

In a .NET implementation this seems possible by setting: Tess.setVariable("user_words_suffix", "user-words");

Also consider adding a test method to verify user supplied words are loaded and actually used by Tesseract.

ccouzens commented 3 years ago

It sounds like we should be adding a way to call set variable.

From there, you could use "user_words_suffix", "user-words" or any of the other variables

ccouzens commented 3 years ago

I thought I had commented on this yesterday. I'll re-comment as best as I remember.


To start, I'm going to try and generate methods for every single possible variable.

The list of variables can be found by running tesseract --print-parameters. Each variable is on its own line, and each line has the form: name, tab, example value, tab, comment.
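As a rough illustration of that line format, here is a minimal Rust sketch that splits each line on tabs into name, example and comment. The Parameter struct, the function names and the sample text are assumptions made for this example (the sample was not captured from a real tesseract run), and lines without two tab separators are simply skipped.

```rust
/// One row of `tesseract --print-parameters` output (hypothetical type).
#[derive(Debug, PartialEq)]
struct Parameter {
    name: String,
    example: String,
    comment: String,
}

/// Parse a single `name<TAB>example<TAB>comment` line.
/// Returns None for lines that do not contain two tab separators,
/// such as a heading line.
fn parse_parameter_line(line: &str) -> Option<Parameter> {
    let mut fields = line.splitn(3, '\t');
    let name = fields.next()?.trim();
    let example = fields.next()?;
    let comment = fields.next()?;
    if name.is_empty() {
        return None;
    }
    Some(Parameter {
        name: name.to_string(),
        example: example.to_string(),
        comment: comment.trim().to_string(),
    })
}

/// Parse the whole output, one parameter per line.
fn parse_parameters(output: &str) -> Vec<Parameter> {
    output.lines().filter_map(parse_parameter_line).collect()
}

fn main() {
    // Illustrative sample only; real output may differ between versions.
    let sample = "Tesseract parameters:\n\
                  user_words_suffix\t\tA suffix of user-provided words located in tessdata.\n\
                  classify_num_cp_levels\t3\tNumber of Class Pruner Levels\n";
    for p in parse_parameters(sample) {
        println!("{} = {:?} ({})", p.name, p.example, p.comment);
    }
}
```

An empty example field (two tabs in a row) is kept as an empty string rather than treated as an error, since string-valued variables may have no default.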

This will allow each variable to have a doc-comment generated for it. It will also allow us to forbid calling setVariable with an invalid variable name. In my observations, each time an invalid variable name is used, Tesseract leaks a small number of bytes of memory.
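To make the generation idea concrete, a small sketch of what emitting one doc-commented setter might look like. The function generate_setter, the shape of the emitted method, and the self.set_variable call inside the generated text are all hypothetical; this is not how leptess ended up doing it.

```rust
/// Return true if `name` is safe to splice into a Rust identifier:
/// ASCII lowercase letters, digits and underscores, not starting with a digit.
fn is_valid_variable_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_lowercase() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
}

/// Build the Rust source text of one setter method for a Tesseract variable.
/// The generated method delegates to a hypothetical `self.set_variable`.
fn generate_setter(name: &str, comment: &str) -> Option<String> {
    if !is_valid_variable_name(name) {
        return None;
    }
    Some(format!(
        "/// {comment}\npub fn set_{name}(&mut self, value: &str) -> bool {{\n    self.set_variable(\"{name}\", value)\n}}\n",
        comment = comment,
        name = name
    ))
}

fn main() {
    let source = generate_setter(
        "user_words_suffix",
        "A suffix of user-provided words located in tessdata.",
    )
    .expect("name is a valid identifier");
    print!("{}", source);
}
```

Because the variable name is baked into the generated method, a caller cannot pass a misspelled name at run time.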

Unfortunately I cannot see how to determine which variables are numbers and which are strings.

If this proves too difficult, I'll fall back to just exposing set_variable and be done with it.

houqp commented 3 years ago

Instead of generating methods, how about generating an enum? Then we can have a safe set_variable wrapper method that takes the enum as the name argument and the value as a string, just like set_variable does. This avoids the need to detect the value type.
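A minimal sketch of that enum-plus-wrapper shape, assuming a two-variant subset of the enum and a FakeApi stand-in that records values in a map instead of calling Tesseract. The variant names, FakeApi and raw_set_variable are illustrative assumptions, not the crate's actual code.

```rust
use std::collections::HashMap;
use std::ffi::{CStr, CString, NulError};

/// Hypothetical subset of a generated enum of Tesseract variables.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Variable {
    UserWordsSuffix,
    TesseditCharWhitelist,
}

impl Variable {
    /// Tesseract's name for the variable, NUL-terminated for a C API.
    fn as_cstr(self) -> &'static CStr {
        let bytes: &'static [u8] = match self {
            Variable::UserWordsSuffix => &b"user_words_suffix\0"[..],
            Variable::TesseditCharWhitelist => &b"tessedit_char_whitelist\0"[..],
        };
        CStr::from_bytes_with_nul(bytes).expect("literal ends with a single NUL")
    }
}

/// Stand-in for a low-level binding; it only records what was set.
struct FakeApi {
    values: HashMap<String, String>,
}

impl FakeApi {
    /// Low-level call taking C strings, as a raw binding would.
    fn raw_set_variable(&mut self, name: &CStr, value: &CStr) {
        self.values.insert(
            name.to_string_lossy().into_owned(),
            value.to_string_lossy().into_owned(),
        );
    }

    /// Safe wrapper: the name comes from the enum, so it is always valid;
    /// the value is an ordinary &str and is rejected if it contains a NUL byte.
    fn set_variable(&mut self, name: Variable, value: &str) -> Result<(), NulError> {
        let value = CString::new(value)?;
        self.raw_set_variable(name.as_cstr(), &value);
        Ok(())
    }
}

fn main() {
    let mut api = FakeApi { values: HashMap::new() };
    api.set_variable(Variable::UserWordsSuffix, "user-words")
        .expect("value has no interior NUL");
    println!("{:?}", api.values.get("user_words_suffix"));
}
```

The caller never builds a CStr by hand and cannot name a variable that is missing from the enum, and the value stays a string, so no type detection is needed.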

ccouzens commented 3 years ago

0.13.0 has support for set_variable https://crates.io/crates/leptess/0.13.0

yosefahab commented 1 year ago

> 0.13.0 has support for set_variable https://crates.io/crates/leptess/0.13.0

Is this through TessApi::raw::set_variable(), which requires the use of a CStr?

ccouzens commented 1 year ago

> > 0.13.0 has support for set_variable https://crates.io/crates/leptess/0.13.0
>
> is this through TessApi::raw::set_variable() ? which requires the use of a CStr ?

Hi, no.

Try LepTess::set_variable https://houqp.github.io/leptess/leptess/struct.LepTess.html#method.set_variable

That takes an enum https://houqp.github.io/leptess/leptess/enum.Variable.html

yosefahab commented 1 year ago

> > is this through TessApi::raw::set_variable() ? which requires the use of a CStr ?
>
> Hi, no.
>
> Try LepTess::set_variable https://houqp.github.io/leptess/leptess/struct.LepTess.html#method.set_variable
>
> That takes an enum https://houqp.github.io/leptess/leptess/enum.Variable.html

Oh, I see. I had used the LepTess struct before, but I felt a bit limited, so I started using the low-level API module.