webrecorder / wombat

Wombat.js client-side rewriting library
GNU Affero General Public License v3.0
81 stars 31 forks source link

Wombat breaks native JavaScript modules #116

Closed Rich-Harris closed 1 year ago

Rich-Harris commented 1 year ago

I was looking at archived copies of https://kit.svelte.dev on the Wayback Machine (example), and discovered that the JavaScript errors when it loads:

Uncaught (in promise) SyntaxError: import declarations may only appear at top level of a module

This is happening because the file in question is a JavaScript module (i.e. it uses import/export syntax):

var _____WB$wombat$assign$function_____ = function(name) {return (self._wb_wombat && self._wb_wombat.local_init && self._wb_wombat.local_init(name)) || self[name]; };
if (!self.__WB_pmw) { self.__WB_pmw = function(obj) { this.__WB_source = obj; return this; } }
{
  let window = _____WB$wombat$assign$function_____("window");
  let self = _____WB$wombat$assign$function_____("self");
  let document = _____WB$wombat$assign$function_____("document");
  let location = _____WB$wombat$assign$function_____("location");
  let top = _____WB$wombat$assign$function_____("top");
  let parent = _____WB$wombat$assign$function_____("parent");
  let frames = _____WB$wombat$assign$function_____("frames");
  let opener = _____WB$wombat$assign$function_____("opener");

import{o as Ce,t as se}from"../chunks/index.1cc2e0ff.js"; // rest of the code omitted

}
/*
     FILE ARCHIVED ON 16:57:32 Mar 27, 2023 AND RETRIEVED FROM THE
     INTERNET ARCHIVE ON 00:11:11 Mar 28, 2023.
     JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.

     ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
     SECTION 108(a)(3)).
*/
/*
playback timings (ms):
  captures_list: 89.103
  exclusion.robots: 0.064
  exclusion.robots.policy: 0.055
  RedisCDXSource: 5.547
  esindex: 0.007
  LoadShardBlock: 67.337 (3)
  PetaboxLoader3.datanode: 85.078 (4)
  load_resource: 84.5
  PetaboxLoader3.resolve: 47.163
*/

As the error message says, import and export can only appear at the top level of a module, not inside a function body. Unfortunately this means that Wombat's transformation breaks every site that uses native modules. Today, these sites are in the minority, but since modules are supported in all modern browsers and many tools emit them automatically, the number is growing rapidly.

Afraid I don't have a great suggestion for a solution handy! Perhaps something like this, if import/export is detected in the code...

import { window, self, etc... } from 'data:application/javascript;base64,xyz123';

...where xyz123 is a base64-encoded module containing the relevant overrides?

ikreymer commented 1 year ago

@Rich-Harris Thanks for making a note of this. We (Webrecorder project) have actually implemented module detection + different rewriting path in some of our tools (see: webrecorder/wabac.js#101, webrecorder/pywb#810) which does something similar to what you suggest.

For example, oldweb.today use our system for the full stack https://oldweb.today/?browser=ruffle#20230327165731/https://kit.svelte.dev/ and https://express.archiveweb.page/#https://kit.svelte.dev/ would attempt to determine if the JS in a module and apply the correct rewriting. In testing these, I did notice another error (from incorrect assumption that dynamic import implies the code is in a module, marking the inline script as a module), but that's something that should be fixable.

The IA wayback machine only uses wombat.js from Webrecorder, the rest of their stack is their own proprietary implementation, so they would need to implement the module detection on their end, but it is possible.

As long as Svelte doesn't use implicit cross-file globals (see #82), things should generally work. In general, don't think we've seen any replaying Svelte based sites :)

Rich-Harris commented 1 year ago

Good to know, thank you! I contacted IA to let them know about this issue, hopefully they'll be able to implement the module detection.

ikreymer commented 1 year ago

In testing these, I did notice another error (from incorrect assumption that dynamic import implies the code is in a module, marking the inline script as a module), but that's something that should be fixable.

This is now fixed on our end, see this archive and oldweb.today live version. The video can replay if it is archived and the module rewriting is applied