mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Embed a .wasm file for the jpeg2000 decoder in order to reduce the bundle size #17950

Open calixteman opened 5 months ago

calixteman commented 5 months ago

For reference: https://github.com/mozilla/pdf.js/pull/17946#discussion_r1566004430

Snuffleupagus commented 5 months ago

I might be misunderstanding the intent here; however, having a separate file seems quite unfortunate given all the ways that the general library can be used:

Also, requiring third-party users to bundle an additional file will likely lead to a support "challenge"...


I'm assuming that the current file-size issue is related to the fact that Base64 is a very inefficient data format? Hence the following, possibly stupid, idea that unfortunately depends on a new web-platform feature becoming available in Firefox (see this and that):

calixteman commented 5 months ago

The current openjpeg.js contains wasmBinaryFile;wasmBinaryFile="data:application/octet-stream;base64,AGFzbQEAAAAB1wEbYAN/f38Bf2AEf3... If we instead generate two files (js + wasm), the wasm file is just a binary file, while the js one contains wasmBinaryFile=new URL("openjpeg.wasm",import.meta.url).href. When building for m-c, I then get this code in the worker file:

/******/    __webpack_require__.b = document.baseURI || self.location.href;
...

Firefox is then unhappy with document, which is not defined in a worker. If I manually remove document.baseURI || then it's OK.
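The manual removal described above could be automated as a post-processing step on the generated worker source. The following is only a minimal sketch: the function name stripDocumentBaseURI is hypothetical, and it assumes the generated code contains the exact text shown in the snippet above. It is not how the pdf.js build handles this.

```javascript
// Hypothetical helper: drops the `document.baseURI || ` fallback from
// bundler output so that only `self.location.href` remains, which is
// available in a worker scope.
function stripDocumentBaseURI(source) {
  return source.replace(/document\.baseURI \|\| /g, "");
}

// Example input, copied from the generated line quoted in this thread.
const generated =
  "__webpack_require__.b = document.baseURI || self.location.href;";
const patched = stripDocumentBaseURI(generated);

console.log(patched);
```

A plain text replacement like this is fragile if the bundler changes its output format, so it would need to be re-checked whenever the build tooling is updated.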

@Snuffleupagus, maybe I misunderstood what you proposed, but I have the impression that it's almost what is currently done (I mean embedding the wasm array as a base64 string in openjpeg.js).

At least in the Firefox case it'd help to save 70 KB. That said, I can update the script to build openjpeg in two ways, as a single file and as two files, then use the single-file version for non-Firefox builds and the two-file one for Firefox (provided we find a way to remove the document.baseURI).

Snuffleupagus commented 5 months ago

maybe I misunderstood what you proposed, but I have the impression that it's almost what is currently done (I mean embedding the wasm array as a base64 string in openjpeg.js).

My idea was to embed a Uint8Array in the file, which is more efficient, and then at runtime, when calling OpenJPEG(), convert it into the necessary Base64 string.
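As a rough sketch of that runtime conversion, the following turns a Uint8Array into the kind of data: URL currently embedded in openjpeg.js. The helper names bytesToBase64 and bytesToDataURL are hypothetical, and the sketch assumes btoa is available (browsers, workers, and recent Node.js versions). It illustrates the idea and is not code from pdf.js.

```javascript
// Hypothetical helper: encodes bytes as Base64 via btoa.
// Bytes are converted in chunks so that String.fromCharCode.apply
// is never called with an excessively long argument list.
function bytesToBase64(bytes) {
  const CHUNK = 0x8000;
  let binary = "";
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}

// Hypothetical helper: builds the data: URL form used for wasmBinaryFile.
function bytesToDataURL(bytes) {
  return "data:application/octet-stream;base64," + bytesToBase64(bytes);
}

// The 8-byte WebAssembly header: "\0asm" magic followed by version 1.
const wasmHeader = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
]);
const dataURL = bytesToDataURL(wasmHeader);

console.log(dataURL); // data:application/octet-stream;base64,AGFzbQEAAAA=
```

The output starts with the same AGFzbQEAAAA prefix as the string quoted earlier in this thread, which is what the WebAssembly header encodes to.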

calixteman commented 5 months ago

But having a string Uint8Array([1,2,3, ...]) should take more space than a base64 one, no?
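A quick way to check this is to compare the source-text length of both encodings for the same bytes. The sketch below uses a synthetic byte pattern as a stand-in for the real wasm binary, so the exact ratio is only indicative. The function name embeddedSizes is hypothetical, and btoa is assumed to be available.

```javascript
// Hypothetical helper: reports how many characters each embedding needs.
function embeddedSizes(bytes) {
  // Source text of an array literal, e.g. "new Uint8Array([1,2,3])".
  const literal = `new Uint8Array([${Array.from(bytes).join(",")}])`;

  // Base64 text of the same bytes.
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  const base64 = btoa(binary);

  return { raw: bytes.length, literal: literal.length, base64: base64.length };
}

// Synthetic sample data (an assumption, not the actual openjpeg binary):
// a fixed pattern that cycles through all byte values.
const sample = new Uint8Array(3000);
for (let i = 0; i < sample.length; i++) {
  sample[i] = (i * 37 + 11) & 0xff;
}

const sizes = embeddedSizes(sample);
console.log(sizes);
```

Base64 always needs 4 characters per 3 bytes, whereas a decimal array literal needs between 2 and 4 characters per byte (1 to 3 digits plus a comma), so the literal form is larger before compression.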

Snuffleupagus commented 5 months ago

But having a string Uint8Array([1,2,3, ...]) should take more space than a base64 one, no?

Quite possibly yes, I've not really thought a lot about this. Hence me saying that it probably was a stupid idea :-)


That said, I can update the script to build openjpeg in two ways, as a single file and as two files, then use the single-file version for non-Firefox builds and the two-file one for Firefox (provided we find a way to remove the document.baseURI).

Yes, if we use this multi-file approach, it "must" be limited to the Firefox PDF Viewer to avoid a barrage of issues elsewhere.