webrecorder / wombat

Wombat.js client-side rewriting library
GNU Affero General Public License v3.0
81 stars 31 forks source link

Wombat does not support Javascript importmap #141

Closed benoit74 closed 2 months ago

benoit74 commented 2 months ago

Javascript importmap (see e.g. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules#importing_modules_using_import_maps) are not properly supported by wombat.

Looks like URLs are processed while they should be considered as simple identifier.

Sample website using an importmap : https://website.test.openzim.org/javascript.html (see test cases 8 and 9)

Sample WARC : https://tmp.kiwix.org/ci/test-warc/javascripttest.warc

Few test cases of the sample website are working only due to "luck".

ikreymer commented 2 months ago

Sample website using an importmap : https://website.test.openzim.org/javascript.html (see test cases 8 and 9)

That is a neat test site for replay - thanks for making that, just tried it with wabac.js and pywb as well - we should make sure that all of these pass eventually.

Sample WARC : https://tmp.kiwix.org/ci/test-warc/javascripttest.warc

Few test cases of the sample website are working only due to "luck".

The import map are part of the page, so actually wouldn't be handled wombat for this example, but with server-side rewriting. This should not be hard to do, but you probably want to do it in the warc2zim2 rewriter.

It would be need to be handled wombat if the script tag was dynamically created during runtime, eg. with something like:

const elem = document.createElement("script")
elem.type = "importmap"
elem.innerText = "..."

However, that won't help the specific examples, and i'm not sure how common this is? We'll implement this as well, will leave this issue here for now to track.

benoit74 commented 2 months ago

That is a neat test site for replay - thanks for making that, just tried it with wabac.js and pywb as well - we should make sure that all of these pass eventually.

Thank you. It will continue to get enriched with more test cases for warc2zim, so feel free to benefit from it as well, you deserve it ^^

The import map are part of the page, so actually wouldn't be handled wombat for this example, but with server-side rewriting. This should not be hard to do, but you probably want to do it in the warc2zim2 rewriter.

AFAIK it is more complex than that. It is indeed part of the HTML code. But what is says is that "whenever you encounter an import of resources, javascript/resources, ../javascript/resources, https://example.com/resources, https://website.test.openzim.org/resources, https://xn--wbsite-bva.test.openzim.org/resources, https://wébsite.test.openzim.org/resources in a Javascript module, please import instead the resource at ./javascript/resources.js. And this instruction is for the Javascript engine.

This means that:

What I've seen is that when running inside the ZIM, the importmap is not applied at all. I suspected (since it does not work either in wabac, and the URL does not need to be rewritten in fact) that there is a problem with wombat interfering with the importmap feature. But I'm clearly not sure of that.

However, that won't help the specific examples, and i'm not sure how common this is?

I never saw it "in the wild" but it is a rather new feature, so I'm quite sure it will come on day or another. This won't be tackled as an urgent topic on our side, we will just document this as "not supported for now" and wait to see when we encounter it in the wild.

ikreymer commented 2 months ago

This means that:

  • we must rewrite "server-side" (wabac / warc2zim2) the ./javascript/resources.js URL (no rewrite needed here indeed)
  • wombat must not interfere with the importmap resolution

Yes, that is correct. This is something that will be implemented in wabac.js and warc2zim2 must do that as well on its own.

As a client side library, wombat does not interfere with preexisting scripts that are already on the page.

To test this, I have made a branch that adds importmap rewriting to wabac.js: https://github.com/webrecorder/wabac.js/tree/importmap-rw

With this branch, I can report that the tests on https://website.test.openzim.org/javascript.html all pass!

You can test this out by checking out that branch, then running: yarn start-live-proxy and loading http://localhost:10001/#https://website.test.openzim.org/javascript.html

The issue you have is likely in warc2zim2. I don't have time to help with warc2zim2, but can say that rewriting is fairly straightforward, just rewriting the keys in the importamp. We have a special esm_ modifier that is used to indicate esm module scripts, and rewrite the keys with that modifier. Also, added a change to ensure esm_ modifiers are always treated as UTF-8 even if there is no content-type. (https://github.com/denoland/deno_core/issues/128 makes the argument that is the correct behavior per the standard and that seems to be what the browser does as well)

benoit74 commented 2 months ago

I still don't get why we have to rewrite the keys of the importmap. These are simply keys in a dictionary, so I don't get why you had to rewrite them. These keys are even placed on domains which do not exists, so we shouldn't really care about their values, they must just be aligned between the importmap and the JS modules AFAIK.

But I probably didn't dig enough yet.

I agree that your code works, but to me for now it looks like a patch on a bug, i.e. you fix in wabac an unsolicited behavior induced by something. But again, I didn't dig enough in the topic.

As I said, it is not an urgent topic for now, so we will look into it a later date.

Thank you anyway for your help, much appreciated.

ikreymer commented 2 months ago

I still don't get why we have to rewrite the keys of the importmap. These are simply keys in a dictionary, so I don't get why you had to rewrite them. These keys are even placed on domains which do not exists, so we shouldn't really care about their values, they must just be aligned between the importmap and the JS modules AFAIK.

They need to be rewritten because they are URLs, and we don't know if they are keys or actual URLs that will be loaded, especially if they look like URLs:

Given:

import X from "https://fake-url.example.com/path/script.js"

We need to rewrite this to <prefix>/https://fake-url.example.com/path/script.js because when processing this script, we must assume the browser would load this URL directly!

Now, if the importmap overrides this default behavior from a particular HTML page (the point of importmaps) and so we must match the URL that would otherwise be loaded, so we must also rewrite the URL in the importmap to override the loading behavior:

{
  imports: {
    "<prefix>/https://fake-url.example.com/path/": "./some/other/path/"
  }
}

For keys that are not absolute URL, they are not rewritten, eg: resources here: https://github.com/webrecorder/wabac.js/blob/importmap-rw/test/rewriteHTML.js#L447

Hope this helps clear it up! (Closing as there isn't anything to do in wombat for now).