houqp / leptess

Productive and safe Rust binding for leptonica and tesseract
https://houqp.github.io/leptess/leptess/index.html
MIT License
258 stars 29 forks

get_alto_text, get_tsv_text, get_lstm_box_text and get_word_str_box_text #29

Closed Gearme closed 3 years ago

Gearme commented 3 years ago

I've implemented get_alto_text, get_tsv_text, get_lstm_box_text and get_word_str_box_text; they work pretty much like get_hocr_text. I've also implemented tests for them, which adds regex to dev-dependencies for checking the output formats.
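As an illustration of the kind of output-format check mentioned above, here is a minimal sketch that validates TSV text using only the standard library. It assumes the usual Tesseract TSV layout of 12 tab-separated columns (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text). The names `TsvRow`, `parse_tsv_row` and `parse_tsv` are hypothetical helpers, not part of leptess, and this is not the test code from the PR.

```rust
/// One row of Tesseract-style TSV output.
/// Assumption: 12 tab-separated columns in the order listed below.
#[derive(Debug, PartialEq)]
pub struct TsvRow {
    pub level: u32,
    pub page_num: u32,
    pub block_num: u32,
    pub par_num: u32,
    pub line_num: u32,
    pub word_num: u32,
    pub left: i32,
    pub top: i32,
    pub width: i32,
    pub height: i32,
    /// Confidence; -1 on rows that do not describe a word.
    pub conf: f32,
    pub text: String,
}

/// Parse a single line. Returns None if the line does not have exactly
/// 12 fields or if a numeric field fails to parse (a header row, if
/// present, is rejected this way).
pub fn parse_tsv_row(line: &str) -> Option<TsvRow> {
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() != 12 {
        return None;
    }
    Some(TsvRow {
        level: fields[0].parse().ok()?,
        page_num: fields[1].parse().ok()?,
        block_num: fields[2].parse().ok()?,
        par_num: fields[3].parse().ok()?,
        line_num: fields[4].parse().ok()?,
        word_num: fields[5].parse().ok()?,
        left: fields[6].parse().ok()?,
        top: fields[7].parse().ok()?,
        width: fields[8].parse().ok()?,
        height: fields[9].parse().ok()?,
        conf: fields[10].parse().ok()?,
        text: fields[11].to_string(),
    })
}

/// Parse every well-formed row in the output, silently skipping the rest.
pub fn parse_tsv(output: &str) -> Vec<TsvRow> {
    output.lines().filter_map(parse_tsv_row).collect()
}
```

A test could pass the string returned by get_tsv_text to `parse_tsv` and assert on the rows. This checks the structure of every row, where a regex would only match the raw text.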

ccouzens commented 3 years ago

Thank you for your contribution Gearme.

Looks good to me, and I'm happy to merge in principle.

I'm looking into the CI failure. I suspect the version of tesseract used in the CI is too old to have the 4 functions in the C API (e.g. TessBaseAPIGetAltoText).

ccouzens commented 3 years ago

@Gearme, I hope to get https://github.com/houqp/leptess/pull/30 merged to fix your CI issue. Once that's merged, I'll look at getting this merged.

ccouzens commented 3 years ago

Hi @Gearme, I've merged #30. When you have a moment, could you please either merge with the master branch or rebase against it?

If you need help, I may be able to do it for you.

Gearme commented 3 years ago

I apologize for my tardiness - the changes have been merged now.

One side note: locally, I've bumped the leptonica-sys and tesseract-sys dependencies to their latest versions and they work fine. I didn't include them in this merge request though, since they're not required.

ccouzens commented 3 years ago

All merged and packaged, @Gearme.

0.11.0 should be good to go for you: https://crates.io/crates/leptess/0.11.0

Thanks for the PR :)

houqp commented 3 years ago

Thanks @Gearme for the new methods and tests indeed!