Open 4F2E4A2E opened 1 week ago
How can Tesseract recognize drop caps?
I am trying to train Tesseract to recognize drop caps in paragraphs. However, Tesseract v5 does not support multiline training. How can I achieve this?
Drop caps examples: https://support.microsoft.com/en-us/office/insert-a-drop-cap-817fd19f-40fe-4b73-95e8-f3c0f5e01278
Drop caps dataset examples: drop_caps_data_set_example.zip
tesseract --version
tesseract 5.5.0-1-g43b8d
 leptonica-1.85.1
  libgif 5.1.9 : libjpeg 6b (libjpeg-turbo 2.0.6) : libpng 1.6.37 : libtiff 4.2.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.4.0
 Found NEON
 Found OpenMP 201511
 Found libcurl/7.74.0 OpenSSL/1.1.1w zlib/1.2.11 brotli/1.0.9 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0) libssh2/1.9.0 nghttp2/1.43.0 librtmp/2.3