wooorm / parse-entities

Parse HTML character references
MIT License
49 stars 12 forks source link

Using document to parse entities makes the lib (and deps) unusable in web workers #19

Closed yohcop closed 2 years ago

yohcop commented 2 years ago

The top-level document object is unefined in a webworker context.

diff

I am using remark and others from within a webworker, and upgrading dependencies upgraed parse-entities to 3.1.0, breaking things.

No huge deal for now, I'll upgrade later, but just letting you know.

wooorm commented 2 years ago

Hey! Please see the code for when that is used:

https://github.com/wooorm/parse-entities/blob/6d7cef95baa04c3430f929cb017b541f54b9e863/package.json#L28-L33

Your bundler explicitly requests browser fields. Web Workers are not browsers. Configure your bundler if you’re bundling or web workers.

Duplicate of #16

yohcop commented 2 years ago

Thanks for the blazing fast reply :)

That's interesting. If I change the bunler behavior to not pick browser files, it ends up trying to bundle node libraries (for my example, the first issue is with ObjectHash, for which, I do want to use the browser version, as it otherwise tries to use node packages) :/

It seems to me that the default would be "node" for a package, so that makes sense. "browser" is closer to web workers than node, but agreed that web workers aren't browsers, and in fact the nodejs doc says "browser" is for code that may use things like window, which webworkers don't have either. But it also never mentions web workers.

I know this is now likely outside this pakage's responsibility, since it seems to me that you are doing it right, but just wondering if you know the right fix here? I'm using esbuild as bundler but that seems like a detail anyway.

Best,

wooorm commented 2 years ago

It sounds like there’s a lot of other code that deals with crypto and such that makes it all rather complex. 🤔. Perhaps your bundler can polyfill that or swap in browser-specific code? Or, you could create two bundles maybe?

wooorm commented 2 years ago

I don’t know of a way to set a webworker field or so that does point to the non-DOM version, so I don’t think I can solve your problem in this project?

yohcop commented 2 years ago

So I found a solution, I tell esbuild to build the webworker bundle with mainFields: ['react-native', 'browser', 'module','main'].

This way, it picks the react-native code from this lib over browser, and browser over node for object-hash (and possibly others). But, I feel like it's a time-bomb that at one point I'll use a lib with react-native specific code tha I don't want... :D But anyway, it works.

I'm pretty sure esbuild doesn't really know about 'react-native'. Therefore, that made me wonder.... Can that field ("react-native", "browser") be... anything? Esbuild still works if I add a random string to the mainFields array.... :thinking: But then, from a short search, I couldn't find evidence that "webworker" would be a common target in package.json for example.

However, I found this: https://webpack.js.org/configuration/resolve/#resolvemainfields Webpack prefers 'browser' for webworkers.

So yeah, no need to fix anything I suppose, but maybe taking a look at that webpack default, as it's a common bundler.

wooorm commented 2 years ago

Asked on twitter: https://twitter.com/wooorm/status/1454481651942232067

yohcop commented 2 years ago

Well, I must have been dreaming. I went for a run, and now the 'react-native' mainField doesn't seem to have any effect. Usually going running fixes problems, not create them, haha!

I temporarily forced this lib to 3.0.0 until I find a better solution.

Thanks so much for your help!

lacolaco commented 2 years ago

I'm also using remark in a webworker environment and had the same problem, and finally found here. Using v3.0.0 seems to be the only way to fix it, but I hope I can find a better way too.

wooorm commented 2 years ago

I don’t understand why v3 is different in this that v3.1? https://github.com/wooorm/parse-entities/compare/3.0.0...3.1.0

stevemk14ebr commented 2 years ago

Any solution? I am already using the recommended webpack worker-loader to load my web worker, which makes a seperate bundle as documented here:

https://www.npmjs.com/package/worker-loader

I do not know how to resolve this.

stevemk14ebr commented 2 years ago

Using yarn I can at least override all the transitive includes to pin this to the old version:

in package.json:

  "resolutions": {
    "parse-entities": "3.0.0"
  },
wooorm commented 2 years ago

I don’t know webpack. How to configure webpack for web workers is really a question for them.

I do know that I’m setting a browser field here (and in many different projects), for browser specific code. I don’t think it’s weird that browser specific code expects document to be there.

So rather that doing resolutions or so — which doesn’t fix the actual problem, it silences an immediate error, but now your code will crash when a user enters a character reference — I’d recommend going to webpack folks for how to configure webpack.

stevemk14ebr commented 2 years ago

The only alternative bundler is react-native which also does not seem correct in this environment, perhaps it is falling back to browser as react-native is also inappropriate.

wooorm commented 2 years ago

What’s incorrect about it? https://github.com/wooorm/parse-entities/blob/6d7cef95baa04c3430f929cb017b541f54b9e863/package.json#L32 This does the inverse of what browser does: namely use the default, non-DOM algorithm.

stevemk14ebr commented 2 years ago

I think it may be incomplete. I would never expect the bundler to choose react-native because I am in a web environment. A web worker yes, but a web environment still.

https://www.jonathancreamer.com/how-webpack-decides-what-entry-to-load-from-a-package-json/

Could you try adding a 'module' or 'main'? This is a guess of course, I am low on ideas

wooorm commented 2 years ago

What’s incomplete about it? There is a main in package.json. main is what every tool uses. module is a webpack/rollup specific feature to support faux-ESM. This package never published faux-ESM, so it shouldn’t use it. There is nothing wrong about react-native. Some bundlers pick browser by default in react-native (which is wrong), hence that the react-native field reverts that: https://github.com/wooorm/parse-entities/pull/15. And it seems highly unlikely that it has anything to do with what you‘re doing with your setup: web workers in webpack. If it is though, that’s a bug there? Your bundler is explicitly choosing browser code, not react-native code or normal code. Browser code can expect that there are browser APIs available. I’m fine adding a web-worker field but that isn‘t supported.

yohcop commented 2 years ago

I think I understand both sides here :)

Maybe the second point is the issue, and Webpack & al. should not select browser. However, it also seems to be a bit of a de-facto standard (although, that may be pushing the definition a bit). Maybe somebody should suggest adding a webworker field. But by the time that happens... I also don't have such powers ;)

And something is strange indeed. Like @wooorm mentioned in this comment: what's the difference with 3.0.0 ? That's actually puzzling to me. Because if I inspect the bundle, it seems that a bundle with 3.0.0 still uses document.createElement. However, it is not called in the docs I tried. Could it be that 3.1.0 calls the decode function more? I'm not really sure. I'll keep digging, but just suggesting ideas since you have more knowledge here :)

Finally, would bundlers optimize correctly for webworker, if the code had something like

if (document) {
  // use document.createElement, etc
} else {
  // use unoptimized, but OK for webworker code
}
wooorm commented 2 years ago

Yeah that’s a good description of what’s going on!

But: for the diff between 3.1.0 and 3.0.0, I believe it’s that document.createElement now always runs when the code is loaded: https://github.com/wooorm/parse-entities/blob/6d7cef95baa04c3430f929cb017b541f54b9e863/lib/decode-entity.browser.js#L5, and it used to only run when the function was called. I’m quite sure that in 3.0.0, you would still have errors when a character reference was there, and now it immediately errors.

Finally, would bundlers optimize correctly for webworker, if the code had something like

The problem is that the non-browser code requires a lot of data to do this. Hence the browser field to use browser APIs which requires much much less data.


I believe Rollup lets you specify fields to choose, such as browser, react-native, etc. If webpack also supports that, y’all can at least set it to web-worker and we can add a web-worker field here similar to react-native.

stevemk14ebr commented 2 years ago

I'm sorry I do not believe the suggestion of implementing a custom 'web-worker' field will work. As @yohcop points out webpack tries 'browser', then 'module', then 'main' when the 'target' is set to webworker. I suggest one of the following options:

  1. Add a fallback code path as @yohcop suggests
  2. Make this feature optional for other remark libraries, I personally have no use for html entities like this, but I must run remark within a web worker to solve input jank issues since the parser blocks the single JS thread in a browser. This library is a transitive dependency for me.
  3. Allow the 'implementation' to be injected as an argument, rather than automatically choosing based on the bundler config. Then users would be free to choose the implementation they need without relying on bundlers which don't seem to support this scenario well.

As an added consideration, code size is not a big deal to me as a user in this particular case. I am able to modifying my webpack configuration because I have ejected from the create-react-app I used to start my project but it should be noted that not all users are able to modify their webpack configs!

P.S could this be opened? It's closed atm and still very much an issue

ChristianMurphy commented 2 years ago

webpack tries 'browser', then 'module', then 'main' when the 'target' is set to webworker

And it is also noted that this is configurable with https://webpack.js.org/configuration/resolve/#resolvemainfields

if (document) {
  // use document.createElement, etc
} else {
  // use unoptimized, but OK for webworker code
}

This would mean that browsers would still need to load the full unoptimized code path, right? This seems like a poor trade off for something which is fairly easily configurable in a bundler.

Allow the 'implementation' to be injected as an argument, rather than automatically choosing based on the bundler config. Then users would be free to choose the implementation they need without relying on bundlers which don't seem to support this scenario well.

I don't think this would be very feasible because:

  1. For it to be useful there couldn't be a default (default to react-native would cause bloat on browsers, default to document would continue to break webworkers), it would need to be a required option passed by everyone
  2. This package is primarily used as a transitive dependency, it goes through mdast utils, up to remark. At every layer we would need to make this a required parameter, that seems like a really poor experience.
stevemk14ebr commented 2 years ago

First, let me say that I understand the constraints involved and the work on everything in the remark ecosystem is very very much appreciated and you have my love <3.

I do understand the apprehension to changing how it works now. But the current situation is quite poor too. As I mentioned earlier, not all users can configure their bundler, create-react-app being a very popular example of this.

The current way this library is designed makes sense! Unfortunately though, it seems the bundlers do not handle this well today. It would be nice if this was better supported in the future to do it this way but by default this results in a broken experience with how it is now, and waiting for that situation to improve is well...webpack is big...that would take a while.

At every layer we would need to make this a required parameter, that seems like a really poor experience.

With the current state you have to manually edit your bundler configuration if you use any remark dependency that transitively includes this, so it's not even obvious when you have to edit your webpack config because of this transitive-ness. I argue being surprised by this is worse than a required argument.

So, I ask that it can at least be considered to change how this works at the moment. There very well may be more options than I suggested, just anything not relying on the bundler configutation would be very nice

wooorm commented 2 years ago

I'm sorry I do not believe the suggestion of implementing a custom 'web-worker' field will work.

As Christian points out, you can configure this. It seems to work in webpack, if it doesn’t, it’s a bug there. Rollup has mainFields (and conditions): https://github.com/rollup/plugins/tree/master/packages/node-resolve#mainfields, esbuild has mainFields (and conditions) https://esbuild.github.io/api/#main-fields.

Add a fallback code path as @yohcop suggests

I’m not interested in that, it adds a ton of code for every user who uses this package in browsers, which is many users, and the solution — configuring webpack when bundling for web workers — is not bad.

Make this feature optional for other remark libraries, I personally have no use for html entities like this [...]

Character references are required by CommonMark, the goals of micromark and remark and the rest of the ecosystem is to match that.

Allow the 'implementation' to be injected as an argument, rather than automatically choosing based on the bundler config. Then users would be free to choose the implementation they need without relying on bundlers which don't seem to support this scenario well.

This is exactly what this browser field and webpack/rollup/esbuild configuration are allowing.

P.S could this be opened? It's closed atm and still very much an issue

It’s an issue with webpack. It’s choosing to pick browser for web workers.


not all users can configure their bundler, create-react-app being a very popular example of this.

What does CRA do for web workers out of the box? 🤔

it seems the bundlers do not handle this well today

Specifically, webpack — rest seems fine? Note that workers are a somewhat niche thing that no bundler does out of the box?

waiting for that situation to improve is well

It’s solved, as Christian points out! You can configure webpack!

With the current state you have to manually edit your bundler configuration if you use any remark dependency that transitively includes this [...]

Only in a web worker

wooorm commented 2 years ago

The "browser" field spec explicitly calls out dom manipulation and such btw: https://github.com/defunctzombie/package-browser-field-spec#overview. Mostly pointing this out to verify my thought process; I really don't see a problem with packages, when they expect the browser field, that those have dom APIs available.

stevemk14ebr commented 2 years ago

Here is the webpack fix...this took a while to find:

webpack.config.js before the plugins array:

aliasFields: [
        "react-native"
      ],
      plugins: [

when used with worker-loader (https://github.com/webpack-contrib/worker-loader):

module: {
      strictExportPresence: true,
      rules: [
        {
          test: /\.worker\.js$/,
          use: { loader: "worker-loader" },
        },

It’s an issue with webpack. It’s choosing to pick browser for web workers.

true. But fighting webpack, or waiting for their opinion to change on this is not fun for users. That may never happen, or take a long time, and until then users must always remember to edit their config manually. And as I mentioned earlier, since this dependency is transitive it's not clear which top-level remark libraries will need this.

What does CRA do for web workers out of the box? 🤔

webworkers don't work well out of the box with CRA, it sucks. There's a plugin you can use that will edit the webpack config transparently and works without eject though (supposedly). So this is a scenario some users could be in.

Only in a web worker

Please see: https://github.com/remarkjs/react-markdown/issues/289#issuecomment-894575846 for why I think the webworker use case/story is important. It is essentially necessary to use remark in a worker when parsing large documents, assuming you want to re-parse a 'preview' view after every keystroke.

I understand your POV. I politely disagree with this being a good way to handle this. If this way is going to stay, it could at least be slightly better by renaming 'react-native' to 'webworker' or adding a second entry as you originally suggested, since it is confusing using 'react-native' bundler for a web application.

ChristianMurphy commented 2 years ago

But fighting webpack, or waiting for their opinion to change on this

Looking through the webpack issue tracker I haven't seen the issue of browser field with webworker having been raised https://github.com/webpack/webpack/issues?q=is%3Aissue+webworker+is%3Aopen :thinking: You should at least give the maintainers webpack a chance to discuss and respond before saying you are "waiting for their opinion to change" much less it being a "fight" :sweat_smile:

until then users must always remember to edit their config manually

Right, I could see this being a recipe at https://unifiedjs.com/learn to help point folks in the right direction

There's a plugin you can use that will edit the webpack config transparently and works without eject though (supposedly). So this is a scenario some users could be in.

https://github.com/gsoft-inc/craco and https://github.com/jhnns/rewire are two example of this, both can be used to modify the config without ejecting

why I think the webworker use case/story is important

This is a strawman, there's no disagreement that webworkers have value, and there is interest in supporting them. The point being made is that we can support web workers today through configuration, without breaking the broader community (which is what most of the proposed alternatives would do). In addition webpack has the option to make the experience more seemless going forward.

stevemk14ebr commented 2 years ago

I have opened a webpack isssue. For now my situation is solved, excited to see how this use case improves over time. I understand it is a little niche in comparison.

stevemk14ebr commented 2 years ago

edit: vite / rollup solution:

  resolve: {
      conditions: ['worker']
  }

Which selects to use the worker condition of https://github.com/wooorm/decode-named-character-reference/blob/54b47da2ced6d5c8b8d6b8b7e60baf4d494c23d8/package.json#L36

dan-lee commented 2 years ago

@stevemk14ebr thanks, this works for me on Cloudflare Workers with esbuild as bundler 👍