mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0
47.17k stars 9.82k forks source link

[Regression] Text-selection boxes visually overlap after PR 17533 #17561

Open Snuffleupagus opened 5 months ago

Snuffleupagus commented 5 months ago

Attach (recommended) or Link to PDF file here: https://github.com/mozilla/pdf.js/blob/master/test/pdfs/pdf.pdf.link

(This can be reproduce even with tracemonkey.pdf, but it's less pronounced since the lines don't overlap as much.)

Configuration:

Steps to reproduce the problem:

  1. Select text, and look at the intersection between different text-selection boxes.

What is the expected behavior? (add screenshot) The intersection should not be visible, as previously:

old

What went wrong? (add screenshot) The intersection is visible:

new

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension): N/A

/cc @calixteman

marco-c commented 5 months ago

This has not made its way to Firefox yet, right?

Snuffleupagus commented 5 months ago

This has not made its way to Firefox yet, right?

Correct.

calixteman commented 5 months ago

I think this regression is acceptable with regard to the improvement we've in HCM and it helps to see the caret to make highlighting possible with the keyboard. It's a visual annoyance, but I'd say that it doesn't avoid to use text selection and I checked few pdfs and in general there are no issues, so maybe this case is just a corner case. That said, as far as I know there is no way to fix it without reverting the regressor patch BUT the height of the selection depends of the font, its size, the line height, ... which aren't correct because the font we use (serif/sans serif) is very likely not the one we've in the pdf. So the correct fix should be to use the font from the pdf in the text layer (I still have a WIP locally to do that).

timvandermeij commented 5 months ago

Given the above, should we still mark this as a release blocker? I'm planning to make a new release tomorrow, and if this does block a new release I'll postpone that action.

Edit: Given the other release blocker ticket, I'll postpone the release with a week anyway and then see where we are.

Snuffleupagus commented 5 months ago

I think this regression is acceptable with regard to the improvement we've in HCM and it helps to see the caret to make highlighting possible with the keyboard.

I don't really agree with that assessment, since we have potential regressions for the majority of users this way.

It's a visual annoyance, but I'd say that it doesn't avoid to use text selection and I checked few pdfs and in general there are no issues, so maybe this case is just a corner case.

Unfortunately it can be a lot worse than just a minor annoyance in some cases, please see an example further below.

That said, as far as I know there is no way to fix it without reverting the regressor patch

Would it be feasible to only apply the new styles in HCM, to avoid affecting "normal" browsing?

So the correct fix should be to use the font from the pdf in the text layer (I still have a WIP locally to do that).

The problem is that it still won't fix these types of issues generally, since glyphs are often positioned absolutely and that means that we'll never get rid of all overlapping text-selection boxes. The example that I used in this issue contained vertical overlap, but it could just as well be horizontal overlap which can look a lot worse.

The following is an especially problematical example, since it uses Type3 fonts, but it does illustrate (much better) why we "need" to do something about it: https://github.com/mozilla/pdf.js/files/1274923/SMPembedded.2006.10.09b.pdf

overlap

calixteman commented 5 months ago

In general we can't ship something which isn't accessible (it used to be the opposite but nowadays it's the policy), so we need to make highlighting keyboard accessible if we want to ship it. The "annoyance" you mentioned here is less important for us than fixing a11y issues. About the pdf you mentioned in your comment: it was already awful (in term of text selection) and I fully agree that now it's worse but between "awful" and "slightly more awful" and I don't see that much difference, so I still consider that it's a minor annoyance in general. That said, I usually try to improve things instead of making them worse, so I've a WIP locally to draw the selection in using the svg layer we recently added , for example: image and it'll fix this bug and it'll let us really improve text selection in HCM (in using a filter) which is, right now, awful. In term of performance, I didn't see any noticeable regressions.

calixteman commented 5 months ago

And for your information, I have a potential fix for the selection issue in Chrome. I need to think about that globally but I've the feeling it should be possible to fix most of the selection issues we've.

marco-c commented 5 months ago

I think this regression is acceptable with regard to the improvement we've in HCM and it helps to see the caret to make highlighting possible with the keyboard.

I don't really agree with that assessment, since we have potential regressions for the majority of users this way.

I agree we should avoid shipping this regression to users. I have filed https://bugzilla.mozilla.org/show_bug.cgi?id=1878258 to track this regression so that we make sure it is fixed before we release Firefox 124 (which currently includes the regression).

That said, we are considering the regression less severe than the improvement because while it's true that the regression affects a larger number of users, the improvement has a huge impact for users that currently are completely unable to select text and/or highlight text in the reader.

timvandermeij commented 5 months ago

I agree we should avoid shipping this regression to users. I have filed https://bugzilla.mozilla.org/show_bug.cgi?id=1878258 to track this regression so that we make sure it is fixed before we release Firefox 124 (which currently includes the regression).

Thank you for the feedback. Given this information I'll hold off on releasing a new PDF.js version until this is fixed so there is enough time and we avoid any bug reports about this.

timvandermeij commented 3 months ago

Given that this has been a release blocker for a long time now and the last PDF.js release was in December 2023, the question was raised in https://github.com/mozilla/pdf.js/pull/17880#issuecomment-2034930669 if we should go ahead and create a release now and just put a note in the release notes about this bug? I agree with this because we have had many features and bugfixes land since that users would like to have, and if we wait longer the release will only get bigger while we would like to keep releases small (as was the purpose of the monthly release cycle).

@calixteman @marco-c What do you think about this?

marco-c commented 3 months ago

@calixteman if it's a quick and easy fix, I'd say let's wait on that. If it's a more cumbersome fix, I'd agree with shipping a version with the regression included.

timvandermeij commented 2 months ago

@calixteman What do you think of the above? If it's desired I can make a new release today or tomorrow, for the reasons mentioned above and so we can land some other PRs that would be safer to merge once a new release is made.

calixteman commented 2 months ago

Yes please make a release. I'll try to land a patch in the next weeks.

timvandermeij commented 2 months ago

I have just completed the release of PDF.js version 4.1.392, with this bug referenced as a known issue in the release notes at https://github.com/mozilla/pdf.js/releases/tag/v4.1.392. Landing of new patches should be unblocked now.

@calixteman The bot sadly doesn't seem to have pushed the release to NPM. Could you check the logs and push it manually so NPM is also in sync again (see https://github.com/mozilla/botio-files-pdfjs/blob/master/on_release.js#L21 for how the bots do it)? (This prompted me to look into porting the NPM part to GitHub Actions as part of #11851 again, and I think I now finally have an idea on how we can prepare and test this without actually having to push something to NPM.)

calixteman commented 2 months ago

It was a matter of cached ssh key. It should fine now: https://www.npmjs.com/package/pdfjs-dist

timvandermeij commented 1 month ago

I have removed the release-blocker label because we're already tracking this with the regression label and as high priority on the project board, plus we've had several releases by now and haven't received new complaints about this as far as I have seen (so hopefully for most documents the change isn't really visible). We should still fix this, but it doesn't seem severe enough anymore to warrant the release-blocker label or separate inclusion in the release notes.