mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0
47.36k stars 9.84k forks source link

[Bug]: Adding a `<base href>` tag to a page containing a pdfjs viewer breaks transparancy filters in Firefox. #18406

Closed Hypercubed closed 1 week ago

Hypercubed commented 2 weeks ago

Attach (recommended) or Link to PDF file

https://github.com/mozilla/pdf.js/files/11229069/test.pdf

Web browser and its version

Firefox 127.0.2

Operating system and its version

Mac and Windows

PDF.js version

4.5 (master)

Is the bug present in the latest PDF.js version?

Yes

Is a browser extension

No

Steps to reproduce the problem

  1. Running local copy of pdfjs using npx gulp server
  2. Add <base href="." /> to viewer.html
  3. Open test.pdf

What is the expected behavior?

The PDF should render the same regardless of <base href>

image

What went wrong?

When rendering a pdf in Firefox when a <base href> is defined the drop shadow is not rendered correctly.

image

Link to a viewer

No response

Additional context

This issue was discovered and discussed here: https://github.com/stephanrauh/ngx-extended-pdf-viewer/issues/2363

We tested accross multiple browsers, OSs, sites, and PDFs. The issue was eventually narrowed down to pdfjs v4.3-v4.5 in Firefox when the site defines a <base href>. Many SPAs require a base href to be defined. This issue was not seen in other browsers nor in pdfjs v.4.2. It was seen in Firefox regardless of OS.

Additional note: Relatively certain this issue is SVG IRI references in display_utils.js.

Hypercubed commented 2 weeks ago

Possible related issue: https://github.com/mozilla/pdf.js/issues/16287

Hypercubed commented 2 weeks ago

Not confident this is a good fix but adding the current url to the filters seams to work in my env:

+  #location;
+
   #id = 0;

   constructor({ docId, ownerDocument = globalThis.document } = {}) {
     super();
     this.#docId = docId;
     this.#document = ownerDocument;
+    this.#location = this.#document.location;
   }

   get #cache() {
@@ -146,7 +149,7 @@ class DOMFilterFactory extends BaseFilterFactory {
     //  https://www.w3.org/TR/SVG11/filters.html#feComponentTransferElement

     const id = `g_${this.#docId}_transfer_map_${this.#id++}`;
-    const url = `url(#${id})`;
+    const url = `url(${this.#location}#${id})`;
     this.#cache.set(maps, url);
     this.#cache.set(key, url);
Snuffleupagus commented 1 week ago

We tested accross multiple browsers, OSs, sites, and PDFs. The issue was eventually narrowed down to pdfjs v4.3-v4.5 in Firefox when the site defines a <base href>.

Would this perhaps suggest that it's a general browser bug in Firefox, which should be reported (in Bugzilla) and fixed there instead of us having to work-around it in the PDF.js project?

Not confident this is a good fix but adding the current url to the filters seams to work in my env:

Unfortunately that diff has a couple of issues:

stephanrauh commented 1 week ago

As far as I understand, that's not a browser bug, but a peculiarity of SVG filters. @Hypercubed can tell you more about it.

Thanks for detecting the issues you've mentioned!

Snuffleupagus commented 1 week ago

As far as I understand, that's not a browser bug, but a peculiarity of SVG filters.

In that case all browsers ought to be similarly affected, which according https://github.com/mozilla/pdf.js/issues/18406#issue-2394259395 doesn't appear to be the case!?

calixteman commented 1 week ago

FYI, in Firefox we removed the base element: https://github.com/mozilla/pdf.js/pull/16153

Hypercubed commented 1 week ago

@Snuffleupagus I can't really speak to why this impacts Firefox only. Seams like all browsers should be impacted the same; from what I understand of the SVG it sounds like Firefox is "doing the right thing". There are several google hits that specifically mention different behavior in Firefox... FWIW.

As for the patch... yes that was just the first few lines. I tested replacing all url(#${id}) with url(${this.#location}#${id}) and it seamed to work fine. Not extensively tested. The reason I decleared #location in the constructor is I figured there should be a better way to find the base href. I could do a PR but wanted to discuss this more.

Another idea I had was if the SVG rendering could be done in another document (using something like createHTMLDocument)... but I don't really have a grasp of how that works.

Snuffleupagus commented 1 week ago

The smallest patch I could imagine, assuming that we actually want to attempt to fix this, would be something along these lines (with the functionality limited to GENERIC builds): https://github.com/mozilla/pdf.js/compare/master...Snuffleupagus:pdf.js:DOMFilterFactory-baseUrl

Another idea I had was if the SVG rendering could be done in another document (using something like createHTMLDocument)... but I don't really have a grasp of how that works.

That sounds essentially like the following getDocument parameter: https://github.com/mozilla/pdf.js/blob/1bdd6920ff0b443ab7893ee103d8185bcf2bb7f2/src/display/api.js#L194-L196

Hypercubed commented 1 week ago

What if we look for the presense of the base href directly; and add that to the urls?

this.#baseHref = this.#document.querySelector('head base')?.attributes.href.value || '';
Hypercubed commented 1 week ago

@Snuffleupagus I just saw your patch (missed it the first tiome reading your comment).

What about docBaseUrl defined in api.js (see below). Can we pass that into the DOMFilterFactory constuctor? Is src.docBaseUrl the same as what you've defined in the createUrl method?

  const docBaseUrl =
    typeof src.docBaseUrl === "string" && !isDataScheme(src.docBaseUrl)
      ? src.docBaseUrl
      : null;
Snuffleupagus commented 1 week ago

What about docBaseUrl defined in api.js (see below).

Please see: https://github.com/mozilla/pdf.js/blob/1bdd6920ff0b443ab7893ee103d8185bcf2bb7f2/src/display/api.js#L139-L141

Can we pass that into the DOMFilterFactory constuctor?

No, that's not appropriate since (as documented) it has a completely different purpose.

Is src.docBaseUrl the same as what you've defined in the createUrl method?

No, that parameter is user defined and can be any valid URL.

Hypercubed commented 1 week ago

@Snuffleupagus I'm curious about the typeof PDFJSDev !== "undefined" && PDFJSDev.test("GENERIC && !TESTING") is clause. When is PDFJSDev defined? I thought it was only defined when testing. If so this fix would never be applied in when used.

Hypercubed commented 1 week ago

@Snuffleupagus FYI.. I tested your patch locally... works as expected once I remove the conditional mentioned in my last comment.

Snuffleupagus commented 1 week ago

(with the functionality limited to GENERIC builds):

Given the above quote it's not expected to run anywhere else, so only npx gulp generic (or npx gulp generic-legacy) would include the code.


The initial instance of SVG filter use came in PR #16062, which is well over a year ago, and this is the first time that we've seen the filter URLs cause issues in the general PDF.js library. Hence using a <base>-element like this cannot be a particularly common occurrence, which means that it's not immediately clear (to me) what solution we want here:

/cc @calixteman, @timvandermeij Any thoughts/opinions on how to proceed here?

Hypercubed commented 1 week ago

@Snuffleupagus

Option 3:

Hypercubed commented 1 week ago

Hence using a <base>-element like this cannot be a particularly common occurrence

It showed up in stephanrauh/ngx-extended-pdf-viewer only after upgrading pdfjs past v4.2 (and only Firefox). It's a question of how many instances of PDFjs >4.2 are running in SPAs. Maybe not many so far...

calixteman commented 1 week ago

@Snuffleupagus

Option 3:

* create absolute filter URLs in GENERIC builds, only if <base>-element is used.

Could it be detected in using document.baseURI !== document.URL ?

Snuffleupagus commented 1 week ago

Could it be detected in using document.baseURI !== document.URL ?

Yes, that should work as far as I understand.

Hypercubed commented 1 week ago

@calixteman I was thinking !!this.#document.querySelector('head base')?.attributes.href.value or something like that.