mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

PDF text missing #2884

Closed: vyv03354 closed this 6 years ago

vyv03354 commented 11 years ago

http://www.chotatsu.e-aichi.jp/portal/pdf/1/4.6.1.pdf This was broken by commit 78213e8. Why was this heuristic added?

vyv03354 commented 11 years ago

Hm, #1597.

yurydelendik commented 11 years ago

Type1 CMap parser needs to be refactored to avoid this heuristic.

timvandermeij commented 11 years ago

Still broken on Windows 7 x64, Firefox 22 (HWA on) and the latest PDF.js development build.

Snuffleupagus commented 11 years ago

This will be fixed partially by PR #3674.

Snuffleupagus commented 10 years ago

With #4259 all glyphs are now rendered, but some of them are placed on top of each other.

timvandermeij commented 10 years ago

That, and some glyphs have more space between them than they should. For example, compare the first lines in Acrobat and PDF.js.

timvandermeij commented 9 years ago

The spacing issues and overlapping glyphs seem to be resolved, but some glyphs are still incorrect: the digits are missing, especially in the header, and vertical bars appear throughout the document.

brendandahl commented 8 years ago

Looks to be the same issue as https://github.com/mozilla/pdf.js/issues/6397#issuecomment-164954332

Snuffleupagus commented 7 years ago

It seems that this is almost fixed by PR #8580; please see the comparison between Adobe Reader (on the left) and PDF.js (on the right):
