Using npm modules in web workers in our code - a retrospective

TylerLeonhardt commented 3 years ago

This is kind of a meta issue around my experience adding a web worker to our code for automatic language detection. If you don't know what this feature is all about here's a gif of it in action: thing It uses a ML model to guess what language your code is and sets the mode accordingly automatically. Because running an ML model can be intensive, we decided to have it run in a web worker. It's the perfect place for a feature like this.

The model is brought in using a npm module found here (and published on npm): https://github.com/Microsoft/vscode-languagedetection and before moving to a worker, we were able to simply use this in the service's code and everything worked as expected:

await import('@vscode/vscode-languagedetection');

code pointer in history

However, moving this code over to the web worker did not have the same success... with a very simple error of:

TypeError: Failed to fetch

The network tab gives us a hint: and if I hover over the file, the path is:

vscode-file://vscode-app/Users/tyleonha/Code/Microsoft/vscode/out/@vscode/vscode-languagedetection.js

This path is actually wrong. It should have come from a node_modules folder... but it does not. What's also interesting is that this issue only shows up in Electron... in the web it works just fine:

Now let's talk about the workaround... in order to get this working I used the same tactic that I use for loading the non-JS files of the model (look at the additional context below). By using the FileAccess API and a relative path, you can always get the Uri to any file in our package, and that's what I do to get the JS file that we import:

Do this on the UI side:

const moduleLocation = '../../../../../../node_modules/@vscode/vscode-languagedetection';
const moduleLocationAsar = '../../../../../../node_modules.asar/@vscode/vscode-languagedetection';
this._environmentService.isBuilt && !isWeb
    ? FileAccess.asBrowserUri(`${moduleLocationAsar}/dist/lib/index.js`, require).toString(true)
    : FileAccess.asBrowserUri(`${moduleLocation}/dist/lib/index.js`, require).toString(true),

And then use a fmr (basically a request to the UI side) on the worker side to get the Uri and import it:

const uri: string = await this._host.fhr('getIndexJsUri', []);
const { ModelOperations } = await import(uri) as typeof import('@vscode/vscode-languagedetection');

This........ works...... but it's really quite ugly and feel like there's gotta be a better way to do this. I think the folks that can help the most is @alexdima and @deepak1556 ... but maybe @bpasero and @rebornix are also interested in this knowledge.

The main PR that moved the language detection to a worker is found here (there are few additional misc commits that have gone in since then): #130006

Additional context

The logic around firing the model is also a bit odd...

the code can be found here: https://github.com/Microsoft/vscode-languagedetection this repo actually contains the model
Since the model is not purely written in JS (it consists of 2 files: a JSON file & a binary file) it's very hard to bundle these files properly such that it works in Node.js, Electron, & web. Because of this, we have to load them using VS Code's FileAccess API (Similar tactics are used to load in vscode-oniguruma's WASM file)

deepak1556 commented 3 years ago

The issue is not impacted by the protocol used for the resource its just a limitation from today's loader, vscode-loader supports different loaders depending on the environment in which a script resource is included, the current WorkerScriptLoader does not respect node modules as it cannot use the NodeScriptLoader inside a web worker, so it ends up resolving similar to other workbench resources and passes it down to the fetch api.

One alternative workaround for your scenario that will work today is to specify a baseUrl inside languageDetectionSimpleWorker.ts that will be used by the loader in resolving the resource.

// On the UI side
const moduleUri = fileUriFromPath(environmentService.appRoot, { scheme: 'vscode-file', fallbackAuthority: 'vscode-app' })/node_modules;

// On the worker
const uri: string = await this._host.fhr('getModuleUri', []);
let loaderConfiguration: any = (<any>self).require.getConfig();
loaderConfiguration.baseUrl = uri;
(<any>self).require.config(loaderConfiguration);
const { ModelOperations } = await import('@vscode/vscode-languagedetection/dist/lib/index');

I am not sure what is the path forward for resources loaded from node_modules.asar in the workbench as we move close to the browser model in which case a solution would be needed there as well. @alexdima @bpasero thoughts ?

alexdima commented 3 years ago

I agree this is ugly, but it stems from the way AMD works. In AMD, when someone wants to load a module (e.g. require(['vs/base/browser/dom'], ...)), the AMD loader will first need to convert this module identifier into a path.

The transformation of a module identifier to a path can be tweaked via loader configurations. The loader uses two main configurations, baseUrl and paths for this purpose, but there are others. Given the loader config:

{ 
  baseUrl: 'vscode-file://vscode-app/Users/tyleonha/Code/Microsoft/vscode/out/', 
  paths: {} 
}

The loader will transform the module id vs/base/browser/dom to:

vscode-file://vscode-app/Users/tyleonha/Code/Microsoft/vscode/out/vs/base/browser/dom.js

But you can see how this falls apart for node modules. As you have documented above, the module id @vscode/vscode-languagedetection will similarly be transformed to the path:

vscode-file://vscode-app/Users/tyleonha/Code/Microsoft/vscode/out/@vscode/vscode-languagedetection.js

which does not exist.

So the error you are seeing makes perfect sense in a strict AMD compliant loader, which is how vscode-loader operates inside a web worker.

There are two distinct reasons why require(['@vscode/vscode-languagedetection']) works on the desktop in the renderer process and works on the web in the main page, but not on the worker.

On the desktop, in the renderer process, vscode-loader is not working as an AMD spec-compliant loader. When running with nodejs integration (which is how we normally run it), the loader is configured with the setting amdModulesPattern: /^vs\//. This setting tells vscode-loader that only module ids starting with vs/ are AMD modules and the rest of the module ids refer to node modules. So in the desktop, in the non-sandboxed case, the loader will delegate all other require calls to the underlying nodejs commonjs based require.

You can see this configuration here. Since maybe you are there, you can see how things work when there is no nodejs integration (e.g. for the sandbox case or for the web case).

The web is configured here. The global self.webPackagePaths comes from out/vs/webPackagePaths.js, which gets generated by running yarn watch. Here is how it looks like:

self.webPackagePaths = {
  "@microsoft/applicationinsights-web": "dist/applicationinsights-web.js",
  "@vscode/vscode-languagedetection": "dist/lib/index.js",
  "iconv-lite-umd": "lib/iconv-lite-umd.js",
  "jschardet": "dist/jschardet.min.js",
  "tas-client-umd": "lib/tas-client-umd.js",
  "vscode-oniguruma": "release/main.js",
  "vscode-textmate": "release/main.js",
  "xterm": "lib/xterm.js",
  "xterm-addon-search": "lib/xterm-addon-search.js",
  "xterm-addon-unicode11": "lib/xterm-addon-unicode11.js",
  "xterm-addon-webgl": "lib/xterm-addon-webgl.js"
};

Basically, both on the desktop with sandbox case and in the web case, the loader, since it doesn't have access to nodejs is configured using paths to how to transform a module id to a path.

Note: now that I see the sandbox config paths, it looks like they are out of sync. cc @lramos15 we could also investigate using the generated webPackagePaths in bootstrap-window for the sandboxed case.

So long story short, if you want to load a node module in a web worker (you're the first one that wants to do that, sorry!), you need to configure the loader and teach it what is the "main" file of that node module. Perhaps something like:

require.config({
  paths: {
    '@vscode/vscode-languagedetection': '../node_modules/@vscode/vscode-languagedetection/dist/lib/index.js'
  }
});

lramos15 commented 3 years ago

@alexdima Does bootstrap-window use the same packages as web?

alexdima commented 3 years ago

Does bootstrap-window use the same packages as web?

Short answer is I don't know.

In a sandboxed renderer, we load the services implemented in /electron-sandbox/ or in /browser/, so there might be differences in what node modules those different implementations end up loading. I think there is a way to launch (maybe --sanbox if I'm not mistaken) to test this. @deepak1556 @bpasero what is the flag to launch with the sandbox enabled?

bpasero commented 3 years ago

Yes, we load the same node modules in web as we do in desktop, there is no difference anymore. Especially since we moved to vscode-file there is also no node.js involved anymore in loading modules, just script tags.

bpasero commented 3 years ago

Hm, that said I wonder if this code can actually go away given we use script tags now:

https://github.com/microsoft/vscode/blob/b6ca3e52ae62b64f7e30009bed4402a0235a4e92/src/bootstrap-window.js#L132-L141

alexdima commented 3 years ago

Hm, that said I wonder if this code can actually go away given we use script tags now:

@bpasero I think the short answer is no. But maybe it can be replaced by using the generated out/vs/webPackagePaths.js, but that might be difficult to bundle. If a loader is not configured with paths for node modules, it will go looking for them using the AMD lookup rules. The long answer is here.

bpasero commented 3 years ago

@alexdima oh wow, so this is a bit of a missing adoption then. We indeed load all node modules in the renderer still via node.js require and thus we do not benefit from the V8 caching support here that we have for the other code.

I just gave it a quick test and changed the code to:

loaderConfig.paths = {
    'vscode-textmate': `../node_modules/vscode-textmate/release/main`,
    'vscode-oniguruma': `../node_modules/vscode-oniguruma/release/main`,
    'xterm': `../node_modules/xterm/lib/xterm.js`,
    'xterm-addon-search': `../node_modules/xterm-addon-search/lib/xterm-addon-search.js`,
    'xterm-addon-unicode11': `../node_modules/xterm-addon-unicode11/lib/xterm-addon-unicode11.js`,
    'xterm-addon-webgl': `../node_modules/xterm-addon-webgl/lib/xterm-addon-webgl.js`,
    'iconv-lite-umd': `../node_modules/iconv-lite-umd/lib/iconv-lite-umd.js`,
    'jschardet': `../node_modules/jschardet/dist/jschardet.min.js`,
};

if (!safeProcess.sandboxed) {
    loaderConfig.amdModulesPattern = /^vs\//;
}

But that would still result in node modules getting loaded via require, probably because the amdModulesPattern rule is winning over the configured paths. I cannot remove the amdModulesPattern rule though because we still have a few places that require node.js standard modules (like fs).

Is it worth doing something now or do this after we have gotten rid of the few node.js modules we still load (mainly file service which I wanted to look at this iteration).

alexdima commented 3 years ago

:+1: If we want that node modules get loaded as AMD, then we need to either remove amdModulesPattern or adjust it to a regex that contains all AMD node modules. If we remove amdModulesPattern, there will be a slight performance loss or a 404, as requiring fs would first look for it in out/fs.js. We can remove amdModulesPattern and use a config called nodeModules to list one by one the modules we want to be delegated to nodejs. Complete example:

loaderConfig.paths = {
    'vscode-textmate': `../node_modules/vscode-textmate/release/main`,
    'vscode-oniguruma': `../node_modules/vscode-oniguruma/release/main`,
    'xterm': `../node_modules/xterm/lib/xterm.js`,
    'xterm-addon-search': `../node_modules/xterm-addon-search/lib/xterm-addon-search.js`,
    'xterm-addon-unicode11': `../node_modules/xterm-addon-unicode11/lib/xterm-addon-unicode11.js`,
    'xterm-addon-webgl': `../node_modules/xterm-addon-webgl/lib/xterm-addon-webgl.js`,
    'iconv-lite-umd': `../node_modules/iconv-lite-umd/lib/iconv-lite-umd.js`,
    'jschardet': `../node_modules/jschardet/dist/jschardet.min.js`,
};

loaderConfig.nodeModule = ['fs', 'os', 'child_process', 'net', ...];

or

loaderConfig.paths = {
    'vscode-textmate': `../node_modules/vscode-textmate/release/main`,
    'vscode-oniguruma': `../node_modules/vscode-oniguruma/release/main`,
    'xterm': `../node_modules/xterm/lib/xterm.js`,
    'xterm-addon-search': `../node_modules/xterm-addon-search/lib/xterm-addon-search.js`,
    'xterm-addon-unicode11': `../node_modules/xterm-addon-unicode11/lib/xterm-addon-unicode11.js`,
    'xterm-addon-webgl': `../node_modules/xterm-addon-webgl/lib/xterm-addon-webgl.js`,
    'iconv-lite-umd': `../node_modules/iconv-lite-umd/lib/iconv-lite-umd.js`,
    'jschardet': `../node_modules/jschardet/dist/jschardet.min.js`,
};

loaderConfig.amdModulesPattern = /(^vs\/)|(^vscode-textmate$)|(^vscode-oniguruma$)..../;

bpasero commented 3 years ago

@alexdima thanks, I went with the second approach:

https://github.com/microsoft/vscode/blob/0b84f39fb0b2052338ac185f24ee9251c96e654d/src/bootstrap-window.js#L127-L147

TylerLeonhardt commented 3 years ago

@bpasero with that change should I switch over to requiring vscode-languagedetection using its name now?

bpasero commented 3 years ago

@TylerLeonhardt not sure I can follow, I tried to copy exact the name of our import: @vscode/vscode-languagedetection, e.g.

https://github.com/microsoft/vscode/blob/430e98bf678e9b0e8f69365c87688dfd3ceb9778/src/vs/workbench/services/languageDetection/browser/languageDetectionSimpleWorker.ts#L6-L6

or

https://github.com/microsoft/vscode/blob/430e98bf678e9b0e8f69365c87688dfd3ceb9778/src/vs/workbench/services/languageDetection/browser/languageDetectionSimpleWorker.ts#L52-L52

TylerLeonhardt commented 3 years ago

The second one is the workaround for this issue. The first is just an import type so that the types in the file aren't a mess.

The second one I'd replace uri with @vscode/vscode-languagedetection

bpasero commented 3 years ago

Ok I am not sure I follow, but if you want to change the name of your module, wouldn't it be better to rename it in NPM?

TylerLeonhardt commented 3 years ago

In the original description of the issue, I talk about how I can't simply await import('@vscode/vscode-languagedetection') because the webworker thinks the location of that package is somewhere else.

So the workaround was to pass the uri to the index.js to the webworker so the webworker could load the module correctly.

I want to get rid of that hacky "passing the path" logic and replace it with:

await import('@vscode/vscode-languagedetection')

And was wondering if your commit fixed that

bpasero commented 3 years ago

Short: Oh I see, sorry my change is totally unrelated. I just hijacked this issue to clarify something I had forgotten to follow up with.

Long: we are in the process of building a node.js free renderer and part of it is to load all code via script tags and not node.js require calls. Because I had forgotten to tell the loader where the node modules are on disk, the loader was falling back to using require from node.js to load them. With my change we should now see script tags for node modules as well and the loader will no longer use node.js to load these modules.

alexdima commented 3 years ago

I want to get rid of that hacky "passing the path" logic and replace it with:

@TylerLeonhardt The loader configuration is forwarded from the renderer process to the web worker.

So if @bpasero pushes some code which changes the loader config to contain a path configuration for @vscode/vscode-languagedetection, then that configuration will be transported to the web worker.

From https://github.com/microsoft/vscode/issues/130302#issuecomment-910383180:

require.config({
  paths: {
    '@vscode/vscode-languagedetection': '../node_modules/@vscode/vscode-languagedetection/dist/lib/index.js'
  }
});

bpasero commented 3 years ago

@alexdima in today's insider I notice a regression from my change (even though things still work so we probably have a safety net in the loader to fallback to require?):

All modules fail to fetch, e.g. in more detail:

I wonder if this is a ASAR problem or a general problem with using the script tags for anything that is inside asar?

//cc @deepak1556 on ASAR support for custom protocols

deepak1556 commented 3 years ago

Files from asar will be respected by the browser network code path, the issue above is the path being incorrect. It should be node_modules.asar instead of node_modules, then only the network code will use the asar resource loader in the runtime.

alexdima commented 3 years ago

oh no :).

Yes, the loader running in a nodejs enabled Electron renderer will first try to load a module using the AMD rules (in this case respecting the paths setting you have defined), and if that fails, then it also tries loading the module using nodejs. So we are super lucky.

I also assume the problem is that the files are in an ASAR.

deepak1556 commented 3 years ago

For ex:

Should be

vscode-file://vscode-app/Applications/Visual%20Studio%20Code%20-%20Insiders.app/Contents/Resources/app/node_modules.asar/tas-client-umd/lib/tas-client-umd.js

instead of

vscode-file://vscode-app/Applications/Visual%20Studio%20Code%20-%20Insiders.app/Contents/Resources/app/node_modules/tas-client-umd/lib/tas-client-umd.js

alexdima commented 3 years ago

To further explain, the nodejs require would also not be able to find them and would throw an error. That is why we patch Module._resolveLookupPaths here. We actually inject node_modules.asar as a lookup path.

So to get script loading working, we would ned to massage the configured paths to be node_modules.asar, as @deepak1556 mentions.

bpasero commented 3 years ago

Thanks! I changed it to this now:

https://github.com/microsoft/vscode/blob/f8eb737b8286b8062036606ab760f4a59e58b333/src/bootstrap-window.js#L130-L141

lramos15 commented 3 years ago

@bpasero Would be interesting if you could use this function from the build script

https://github.com/microsoft/vscode/blob/d2a8f91fb5107d4bf677083a84546ffed1957f4d/build/lib/util.ts#L344-L367 since that's what we use to populate all the workbench.html's and the loader. You would then just be doing something like

        Object.keys(webPackagePaths).map(function (key, index) {
            webPackagePaths[key] = `../node_modules.asar/${key}/${webPackagePaths[key]}`;
        });

and benefit from the auto detection of new modules

bpasero commented 3 years ago

@lramos15 thats cool but we cannot use something like fs.readFileSync in the node.js free renderer going forward.

lramos15 commented 3 years ago

@lramos15 thats cool but we cannot use something like fs.readFileSync in the node.js free renderer going forward.

It is built and bundled with vscode as out/vs/webPackagePaths.js, but I'm not sure the process needed to access those file contents without being dependent on node.

TylerLeonhardt commented 1 year ago

It's been a while since I looked at this, and there's a lot going on here so I'm just going to close it. Something that I think might be useful to understand in the future is how could we share node modules tensorflow.js being one of them. Right now @vscode/vscode-languagedetection bundles it in but it's big... and maybe some other ML experience that also uses tensorflow will get added to VS Code and ideally we'd want to share to make the bundle smaller.

microsoft / vscode

Using npm modules in web workers in our code - a retrospective #130302

Additional context