Closed eyagarci closed 2 years ago
This is a duplicate of past reports from other users. See label:non spaced words
The issue is twofold: 1) Tesseract produces a space between Chinese glyphs. 2) Different PDF viewers can present the same file differently.
Currently, there is no solution to this issue.
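That applies to the PDF text layer itself. For text that has already been copied or extracted from the PDF, a post-processing step can at least collapse the spaces between adjacent Chinese characters. The following is only a rough sketch under assumptions: the function name strip_cjk_spaces and the chosen Unicode ranges are illustrative, not part of Tesseract or pdf.js, and it does not change how any viewer renders or selects text.

```python
import re

# Assumption: "CJK" here covers CJK punctuation, Extension A, the main
# Unified Ideographs block, and fullwidth forms. Extend the ranges if needed.
_CJK = r"\u3001-\u303f\u3400-\u4dbf\u4e00-\u9fff\uff00-\uffef"

# Match runs of ASCII spaces/tabs that sit directly between two CJK characters.
# Lookbehind/lookahead keep the neighbouring characters out of the match.
_GAP = re.compile(rf"(?<=[{_CJK}])[ \t]+(?=[{_CJK}])")


def strip_cjk_spaces(text: str) -> str:
    """Remove spaces between adjacent CJK characters; leave other spacing alone."""
    return _GAP.sub("", text)


if __name__ == "__main__":
    print(strip_cjk_spaces("你 好 世 界 hello world"))
```

Spaces between Latin words, and between a Chinese character and a Latin word, are left untouched, so mixed-language text keeps its word boundaries.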
Hello, I ran image recognition on Chinese-language images. I found that the resulting text has different spacing between its characters when viewed in pdf.js. I used preserve_interword_spaces=1 to remove the extra spaces, but I didn't see any difference.
I ran some tests with other viewers, such as Adobe Acrobat Reader and Chrome, and found that the results differ. Do you have any idea how to solve this problem with pdf.js?
pdf.js: [screenshot]
Adobe Acrobat Reader: [screenshot]
Chrome: [screenshot]
Environment:
Tesseract: 4.0.0
OS: Windows 10 (64 bit)