koreader / koreader

An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and Android devices
http://koreader.rocks/
GNU Affero General Public License v3.0

Chinese OCR doesn't work #11729

Open 719q opened 2 months ago

719q commented 2 months ago

Does your feature request involve difficulty completing a task? Please describe.
It is difficult to manually select more than one character to look up in the dictionary; two-character selections are the most problematic. A similar issue with Japanese was reported in #4091 and solved in #8270.

Describe the solution you'd like
Full Chinese language support, similar to the existing Japanese support, would be very welcome.

Describe alternatives you've considered
StarDict dictionary

Additional context
KOReader version: v2024.03.1
Device: Kindle PW4

719q commented 2 months ago

It looks like Chinese support works very well with EPUB files after installing a CC-CEDICT dictionary: it automatically selects multiple characters. The problem with PDF files persists, though. I updated KOReader to v2024.04, which should fix #11715, but with the document language set to Chinese I still can't select characters. I tried turning forced OCR on and toggling reflow on and off, but nothing seems to fix it. Should I open a new issue? If so, please close this one. I use the Tesseract 3.04 chi_sim trained data.