mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Text selection does not return the correct characters #2769

Closed xavier114fch closed 8 years ago

xavier114fch commented 11 years ago

http://www.ourfuturerailway.hk/doc/RDS2U_PE2_consultation_document_Chi.pdf

This is a Chinese PDF file. Go to page 4 of the file and you will see a series of paragraphs labelled 1.1, 1.2, etc. Select the first few characters after 1.1, "在2011年3月", and right-click to open the context menu. You will notice that the Search Google entry reads 'Search Google for " 年3月,"'. Copying and pasting the selection also yields " 年3月,".

The correct characters are selected using Acrobat XI 11.0.2.

One more observation: in PDF.js, selecting "1.1 | 在2011年3月" shows double highlighting for the space character after "|" and under "在2011".

waddlesplash commented 11 years ago

I've noticed double-selection a few times on English PDFs generated by GhostScript (and not on an identical PDF generated from the same file using OpenOffice PDF instead). However, this PDF is created by InDesign...

timvandermeij commented 9 years ago

The link to the PDF is dead. Do you have a new link, @xavier114fch ?

xavier114fch commented 9 years ago

This is a similar document from another source. http://www.legco.gov.hk/yr12-13/chinese/panels/tp/tp_rdp/papers/tp_rdp0301cb1-595-3-c.pdf

The original text selection problem appeared on page 7 of the above PDF, but it seems the issue can no longer be reproduced with the latest PDF.js version.

timvandermeij commented 9 years ago

The text selection is still a bit off for me, so we'll leave this open. Thank you for adding a new link!

timvandermeij commented 8 years ago

Text selection is now fixed after recent patches, so I'm closing this as fixed.