PDF text selection on iPad difficult and visually confusing

mattdricker commented 2 years ago

Bug report form

Reported by user: https://app.hubspot.com/contacts/6291320/ticket/1126006850

Steps to reproduce

Using an iPad, open a PDF Hypothesis reading assignment. Example here: https://hypothesis.instructure.com/courses/258/assignments/3609
Touch the screen to start selecting text.
Drag the selection handle over the text and note how the iPad highlights the selected text (first screenshot): the highlight spreads to cover the entire page. Also note that the Annotate icon in the adder is displayed as though no text is selected, although tapping it still opens the sidebar to create a new annotation as usual. Finally, note that the new annotation quotes the text that was actually selected, not the entire page as the highlight would suggest (second screenshot).

Expected behaviour

The text-selection highlight should cover only the selected text, rather than expanding to cover the entire document. The Annotate icon in the adder should also display as active when text is selected.

Browser/system information

Tested on iPad (8th generation), iPadOS 15.7. Tested both in the latest version of the Canvas app and by opening the assignment directly in Chrome.

lyzadanger commented 1 year ago

I can confirm that text selection on an iPad on PDFs in the browser with Hypothesis is terribly bad, nigh unusable, and that selecting and annotating text in other document types, e.g. HTML, is not delightful either. This is nothing new, but is becoming more problematic as more of our users want (reasonably) to use iPads to annotate content, and more of our content is PDF-based.

I’d like us to spend some time investigating these issues this coming cycle, with the understanding that there is no obvious low-effort solution and that we may not find any short-term mitigation. That is, there is some risk that we won't be able to make this better, or at least not without significant effort.

What we can look at:

How much worse is the experience in PDFs versus HTML? Is it an order of magnitude worse, or is HTML annotation bad enough as well that we need to find a fix for more than just PDFs?
Can the PDF.js side of this be mitigated? When a user selects text in a PDF, they are interacting with the text layer that PDF.js generates. It appears that the structure of that text layer (perhaps worse in documents containing images and other non-flowing text) can cause the huge-block-of-blue selection weirdness shown in this issue’s description. In those cases it’s impossible to tell what is actually selected, and what a mess! Investigate what selecting text in PDF.js documents on an iPad feels like when our application is not involved, and look for bug reports or discussions about touch selection in PDF.js.
Could we solve this by adapting annotation controls significantly on touch devices? Or is it the selection itself that’s at the root of the problem?
Could we adapt the selection after it is made to improve the experience, or tweak something in our selection-handling code?
Do we need to roll our own text-layer solution? For VitalSource we actually generate our own based on data we can get from them. This is no small feat, however.
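As a starting point for the "adapt the selection once it's made" question, here is a minimal sketch of a heuristic that flags the symptom described in this issue: a highlight covering most of the page while the quoted text is short. `Rect`, `looksOversized` and both thresholds are hypothetical and are not part of the Hypothesis client. It also assumes the oversized highlight shows up in the selection range's client rects, which would need to be verified on an actual iPad, since iPadOS may paint the highlight differently from what the range reports.

```typescript
// Hypothetical heuristic, not part of the Hypothesis client. The thresholds
// (maxCoverage, maxQuoteLength) are illustrative assumptions.

interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

/** Sum of rectangle areas. Overlaps are counted twice, which is fine for a rough check. */
function totalArea(rects: Rect[]): number {
  return rects.reduce(
    (sum, r) => sum + Math.max(0, r.width) * Math.max(0, r.height),
    0,
  );
}

/**
 * Return true if the selection's rectangles cover more than `maxCoverage`
 * of the page even though fewer than `maxQuoteLength` characters are
 * actually selected (i.e. the highlight is much larger than the quote).
 */
function looksOversized(
  selectionRects: Rect[],
  page: Rect,
  quoteLength: number,
  maxCoverage = 0.5,
  maxQuoteLength = 200,
): boolean {
  const pageArea = page.width * page.height;
  if (pageArea <= 0) {
    return false;
  }
  const coverage = totalArea(selectionRects) / pageArea;
  return coverage > maxCoverage && quoteLength < maxQuoteLength;
}
```

In a browser, `selectionRects` could come from `Array.from(document.getSelection()!.getRangeAt(0).getClientRects())` (each `DOMRect` already has the `left`/`top`/`width`/`height` fields used here) and `page` from the page element's `getBoundingClientRect()`. What to do once such a case is detected (for example, drawing our own highlight instead of relying on the native one) is left open, as in the discussion above.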

Over the next couple of weeks, our goal is to understand the problem and our options, and then to choose one of those options.

lyzadanger commented 1 year ago

This isn't a great sign: https://github.com/mozilla/pdf.js/issues/15691

robertknight commented 1 year ago

I started exploring whether we could also use, in PDF.js, the custom image text layer (ImageTextLayer) that we created for VitalSource PDF books. Compared to PDF.js's own text layer, ours has more structure, which can help guide the browser's selection interface.
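To illustrate what "more structure" could mean, here is a minimal sketch that groups positioned words into lines, so that a text layer could emit one element per line (with word elements inside it) instead of a flat list of independently positioned spans. This is not the actual ImageTextLayer code, which the thread does not show. `WordBox`, `Line` and `groupIntoLines` are hypothetical names, and the input is assumed to be per-word bounding boxes.

```typescript
// Hypothetical sketch, not the actual ImageTextLayer implementation.

interface WordBox {
  text: string;
  left: number;
  top: number;
  width: number;
  height: number;
}

interface Line {
  words: WordBox[];
  top: number;
  bottom: number;
}

/** True if the word overlaps the line vertically by at least half the smaller height. */
function verticallyOverlaps(line: { top: number; bottom: number }, word: WordBox): boolean {
  const overlap =
    Math.min(line.bottom, word.top + word.height) - Math.max(line.top, word.top);
  const minHeight = Math.min(line.bottom - line.top, word.height);
  return overlap >= minHeight / 2;
}

/** Group words into lines ordered top to bottom, with words ordered left to right. */
function groupIntoLines(words: WordBox[]): Line[] {
  const sorted = [...words].sort((a, b) => a.top - b.top || a.left - b.left);
  const lines: Line[] = [];
  for (const word of sorted) {
    const line = lines.find(l => verticallyOverlaps(l, word));
    if (line) {
      // Extend the existing line's vertical extent to include this word.
      line.words.push(word);
      line.top = Math.min(line.top, word.top);
      line.bottom = Math.max(line.bottom, word.top + word.height);
    } else {
      lines.push({ words: [word], top: word.top, bottom: word.top + word.height });
    }
  }
  for (const line of lines) {
    line.words.sort((a, b) => a.left - b.left);
  }
  return lines;
}
```

The assumption behind this design is that nesting words inside line elements gives the browser's selection logic line boundaries to follow, rather than letting a drag jump between unrelated absolutely positioned spans. Multi-column pages would need an extra step, since words at the same height in different columns end up on the same line here.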

robertknight commented 1 year ago

I created https://github.com/hypothesis/client/issues/5201 as a sub-task for exploring the aforementioned ImageTextLayer-based solution.

robertknight commented 1 year ago

I moved this to the backlog as I am not actively working on it.
