scambier / obsidian-text-extractor

A (companion) plugin to facilitate the extraction of text from images (OCR) and PDFs.
GNU General Public License v3.0
349 stars, 19 forks

[BUG] Does not support Simplified Chinese. #30

Closed (yongnianliu closed this issue 1 year ago)

yongnianliu commented 1 year ago

Input image: (image)

OCR result (Extract Text to clipboard): "E a 10F 1 招 HIERR a65 2 zz t 仁 3 RtSI0FETEN E 4 PEIHRSGRSKah E B s 沥"

My settings: (image) (image)

Problem description: Adding an image containing Simplified Chinese resulted in text recognition that does not match the content of the image. The recognized content appears more like garbled characters.

Your environment: Windows 11 Professional x64 22H2

yongnianliu commented 1 year ago

This is a fantastic plugin, I hope it can support Simplified Chinese.

scambier commented 1 year ago

Probably the same issue as #1. Is the behavior different if you remove "eng" from the list of OCR Languages, clear the cache, and re-extract?

yongnianliu commented 1 year ago

Thank you for your reply. I tried setting only Simplified Chinese and cleared the cache, but the problem still exists. It seems that it is not the same issue as #1.

Onkitova commented 1 year ago

@yongnianliu try this:

  1. Go to

your_vault_folder\.obsidian\plugins\text-extractor

  2. Open the data.json file and edit it to look like this: (image)
  3. Open Obsidian, clear the Text Extractor cache, and then try to OCR something into a text note.

The solution above seems to work for me with a similar issue: https://github.com/scambier/obsidian-text-extractor/issues/1.

And please share feedback here on whether this temporary workaround also works for your eng+chi_sim language combination, as it does for eng+rus.
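
Since the steps above involve hand-editing data.json, it may be worth backing the file up and printing its current contents first. The following is only a minimal sketch: the plugin folder path comes from the steps above, but the function names are hypothetical, and the keys inside data.json are not shown in this thread (the original was a screenshot), so the script makes no assumptions about them and simply returns whatever is stored.

```python
import json
import shutil
from pathlib import Path

# Relative location of the plugin folder, as given in the steps above.
PLUGIN_DIR = Path(".obsidian") / "plugins" / "text-extractor"


def settings_path(vault: Path) -> Path:
    """Return the location of the plugin's data.json inside a vault."""
    return vault / PLUGIN_DIR / "data.json"


def backup_and_load(vault: Path) -> dict:
    """Copy data.json to data.json.bak, then return its parsed contents.

    Hypothetical helper: it does not modify the settings themselves.
    """
    src = settings_path(vault)
    shutil.copy2(src, src.with_name(src.name + ".bak"))
    with src.open(encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `print(json.dumps(backup_and_load(Path("your_vault_folder")), indent=2, ensure_ascii=False))` would show the current settings before any manual edit, and the `.bak` copy can be restored if the edit goes wrong.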

scambier commented 1 year ago

@yongnianliu the problem addressed by Onkitova's workaround has been fixed in the latest release. Could you update the plugin and report whether this fixes your issue? Thanks :)

yongnianliu commented 1 year ago

Yes, it works. Sorry for the late reply. Thanks @scambier @Onkitova