inikulin / parse5

HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.
MIT License
3.66k stars 233 forks source link

Hi i would love to refactor this to ESM #307

Closed frank-dspeed closed 3 years ago

frank-dspeed commented 4 years ago

Hello i would love to refactor this to ES2020 and build a CJS version for older NodeJS versions pre v13 if your ok with that simply send me here a msg and i issue a PR Else i will fork Complet as i need this to be ESM and in the browser

https://github.com/inikulin/parse5/pull/308

43081j commented 3 years ago

@inikulin this has to happen IMO, it would be great if you could let us know your preference on how to move forward with it so it can be done and out of the way

even better would be converting to typescript.

this'd also lead to removing those not-so-great export assignments from the old school commonjs too, which would clean up the typescript types a lot (whether they live here or not).

@frank-dspeed do you actively have a branch anywhere which you've started/completed this on?

happy to help if needed

frank-dspeed commented 3 years ago

@43081j i have started it but did not finish it but i am able to do it again in one step as i am specialised on upgrading CJS => ESM i would prefer to upgradae it to ESM only but in a way with JSDOC Anotations so that the types get correctly shown inside every IDE that uses the typescript language server..

I Will start that task now you can simply post a comment here if you want to go directly to typescript as i am not so open to do that.

inikulin commented 3 years ago

Hi folks, frankly speaking haven't followed the JS world for a while. I assume https://github.com/inikulin/parse5/issues/329 is related? Would be happy to accept PRs

43081j commented 3 years ago

Yes exactly.

Worth noting too that the ideal here is to write actual esm or get the build process to output such code. Such that it works in both browsers and node.

It's tempting to fake it by wrapping some cjs in esm, but please don't as it won't work in browsers.

We could either use a bundler to generate an esm bundle or rewrite the modules as proper es modules (export/import instead of require).

wooorm commented 3 years ago

@inikulin Reading through https://twitter.com/sindresorhus/status/1349294527350149121 might be of interest.

FWIW, supporting both ESM and CJS is also complex: I recently published a low-level package, which was used through a couple of CJS packages, but when users bundled higher package with webpack 4, it all broke.

Supporting only ESM is also currently complex, e.g., webpack loaders cannot import yet.

The push that Sindre, me, and others will be doing soon to support only ESM will be a challenge for the ecosystem, but then the dust will settle. So it’s definitely something to think about soon.

43081j commented 3 years ago

while im all for the push to esm-only, it is true it'd be a huge change and pretty abrupt.

i'd recommend rewriting it as ESM but using esbuild or something to output a CJS bundle equivalent (so the packaged individual files are pure ESM, but there is a bundle alongside which is CJS).

that isn't so complex and is what I and many others in the web side of things currently do.

even better would be typescript so we can stop maintaining the types at DT. but thats a bigger job especially from the current state (lots of stuff typescript wouldn't be happy about).

frank-dspeed commented 3 years ago

@43081j @wooorm i also regonized all that and my final Way to go will be a ESM only version this can then get transpiled via babel to cjs or with anything else.

Then the ESM version gets released as new Major version and can get used with current NodeJS via import() inside CJS

the README will show instructions to create a CJS version if needed. And it will point out that this new Major Version is Only ESM and to use it you will need NodeJS Version > 9.7.x or the npm esm package.

What do you think about it?

NodeJS ESM Support

43081j commented 3 years ago

that makes sense to me, and i would argue it is not complex at all to support CJS and ESM:

esbuild lib/index.js --platform=node --bundle --outfile=parse5.bundle.js

as long as the sources are ESM, all is fine. parse5 has no dependencies IIRC, so its pretty straight forward.

frank-dspeed commented 3 years ago

@43081j its not so easy for the engine to figure out the right version and it would add bloat thats not needed as explained before its a edge case to transpile it to CJS

43081j commented 3 years ago

package exports:

  "exports": {
    "import": "./lib/index.js",
    "require": "./parse.cjs"
  }

the bloat of having to duplicate the code into a bundle is inevitable.

then one day when people finally progress to the standard (ESM, which will be a slow job since the node team don't see it as a "progression"), parse.cjs can be dropped.

frank-dspeed commented 3 years ago

@43081j i do not agree i think we should force the users to upgrade anyway as latest nodejs version is 15 and lts is 14 so we are at the point where that is desire able to force them to use this as esm package even inside CJS via import()

43081j commented 3 years ago

i would love that to be the solution, the sooner people leave CJS behind, the better.

guess its @inikulin call to make, whether you make an ESM-only new major version or you support both (also in a major version tbf). people can always install the old one if they must, its not like the feature set will be changing anytime soon.

elgs commented 3 years ago

I hope we could import parse5 directly without webpack. This also will enable parse5 in the Deno world. That will be huge.

RReverser commented 3 years ago

I hope we could import parse5 directly without webpack. This also will enable parse5 in the Deno world. That will be huge.

Ah @inikulin said, no need to hope - feel free to make a PR :)

43081j commented 3 years ago

if we go ESM-only, no bundler will be needed. and even if we choose to support older node via cjs, we should use a more focused tool like esbuild.

@frank-dspeed are you still going to tackle this? ESM-only, with type: "module" in the package manifest too (to go all in).

if inikulin later decides he wants to support cjs, we can quite easily esbuild what you end up with into a cjs bundle.

elgs commented 3 years ago

I hope we could import parse5 directly without webpack. This also will enable parse5 in the Deno world. That will be huge.

Ah @inikulin said, no need to hope - feel free to make a PR :)

In my opinion, this needs to be a new project. The require in Nodejs and the import elsewhere are naturally incompatible. I've seen several projects releasing a separate package for the ES, like lodash.

If @inikulin is not going to engage in this transition, I will probably spend some time learning the source code from this repo and make a new one.

I am the author of https://github.com/elgs/leanweb, and my project is heavily depending on Parse5 (Thank you @inikulin). My incentive is to get rid of webpack the monster. With ES, our code should run in browsers directly without webpack.

RReverser commented 3 years ago

In my opinion, this needs to be a new project.

Why would it possibly need to be a new project?

require in Nodejs and the import elsewhere are naturally incompatible.

Node.js supports ESM nowadays, you don't need require anymore. It will just require a major version bump, but that's it - once moved to imports, they can be used both on latest Node and in browsers.

In terms of changes, it's a fairly simple 1:1 translation too.

elgs commented 3 years ago

@RReverser ok, working on parse5-es under the packages directory now. Hopefully a PR will be made in the next few hours.

RReverser commented 3 years ago

lodash-es was an artifact of the time as it appeared when ESM wasn't available in most environments. Nowadays such split between packages is not really necessary - you can upgrade to imports in existing packages.

43081j commented 3 years ago

Agreed. Keep it in this project, bump the major version, go all in on ESM. People can always install the old one if they really need CJS.

@elgs presumably you mean opening a PR here? And not "parse5-es". Replace the existing package, versions exist for this.

Keep in mind Frank above wanted to tackle it too. Worth seeing what he's already done, if anything

elgs commented 3 years ago

@43081j yes, initially I made a parse5-es, but I found the lerna was not happy. So I made everything in its original place. Yesterday I've changed all requires to imports, and all exportses to exports or export defaults. What I haven't done are the tests. I will get the tests done in the weekend. I am getting a little excited now.

43081j commented 3 years ago

awesome work in that case :D soon as you can get a draft PR up , would be happy to help review

elgs commented 3 years ago

https://github.com/elgs/parse5

I apologize I found it not practical to make a PR. But I ported the code and made a separate project. My main goal is to run parse5 in browsers and Deno, therefore I need to get rid of all Node API related code.

Reasons I found impractical to to make a PR are:

  1. the test tool chains are heavily couple with Node API;
  2. all other modules than parse5 and parse5-htmlparser2-tree-adapter are Node API related;

Therefore I ditched the modules that use Node's stream API and kept only parse5 and parse5-htmlparser2-tree-adapter which are written in pure Javascript.

This project doesn't change any logic from inikulin/parse5 v6.0.1. All the port work is done is changing requires, exportses to imports and exports.

Any review will be appreciated.

43081j commented 3 years ago

Inevitably it needs to end up in a pr here either way. It's fine for the node-dependant packages to continue depending on node, as you presumably wouldn't be using them anyway.

The testing can also be reworked if needed to use a combination like mocha and uvu so we don't need any cjs dependencies.

Can you let us know the problems/specifics you ran into that are blocking you from doing a pr? And maybe we can help figure it out

I'll have a read through the fork when I can though too

elgs commented 3 years ago

Can you let us know the problems/specifics you ran into that are blocking you from doing a pr? And maybe we can help figure it out

@43081j when I added "type": "module" in the package.json, npm run test wouldn't work. I feel node and its ecosystem are just not ready for es yet. Fortunately @inikulin made the parse5 package cleanly independent from any node api, so I was able to easily port it over. Not compatible with the es standard is the sin of node and I don't see any remedy. But it's not node's fault as when node was born, everything was not clearly yet. What I could foresee is the slowly dying of node in the next decade, just like Java.

The real evil thing is webpack allowing fake import syntax for cjs modules. Effectively that is forcing nowadays every web project to go through a build process by webpack. This is painfully evil. But to be fair, we could say the sin is caused by cjs not becoming the standard for the browsers.

I believe the pain is temporary, as more and more node modules are available in their es form, who will be willing to go through the cjs to browser hassle. That's why I am happy to see any packages to be node api free. So web developers will be able to run their own code in browsers again, without being kidnapped by webpack.

elgs commented 3 years ago

Inevitably it needs to end up in a pr here either way. It's fine for the node-dependant packages to continue depending on node, as you presumably wouldn't be using them anyway.

@43081j now I realized probably not. parse5 has several sub packages, and I found the main package with the same name parse5 is sufficient for the purposes what this project serves. The other sub packages like sax, stream are more like the icing on the cake in my opinion.

So I don't see any proper thing we can do with the node apis like stream and all testing toolchains used in this project, all due to the incompatibility between the cjs and es.

elgs commented 3 years ago

@43081j please feel free to port the code back to this repo. All I changed is the import/export, and nothing else is changed compared to v6.0.1. I found I am not capable enough to mess with the test tool chains in node.

RReverser commented 3 years ago

I feel node and its ecosystem are just not ready for es yet.

It is ready and stabilized. It just requires work to port existing code over like you did with one of the packages. If you don't want to make a PR, that's fine - I'm sure someone else on the thread will pick it up.

RReverser commented 3 years ago

the test tool chains are heavily couple with Node API;

FWIW this is also not an issue - you just need to change tests to use import as well. You can use import with Node APIs, so I don't fully understand the argument / why Node APIs are an issue. Do you just mean for running tests in Deno? You can still leave tests running in Node, but actually use the package from either runner.

elgs commented 3 years ago

@RReverser I am fine with leaving the tests in node. What I am after is only the actual code to strictly follow the standards and be future proof. I attempted to change to imports in the tests, but I got all kinds of errors from the testing toolchains. Now I am instinctively against node because I don't see a future for node, so unless it's an easy change, I don't want to spend too much time on node.

RReverser commented 3 years ago

Now I am instinctively against node because I don't see a future for node

Yeah that was quite noticeable from your comments tbh.

43081j commented 3 years ago

I'll have a look this week.

Will see if I can open a draft pr and build on where you got to 👍

I do also have an unfinished branch where I got a fraction through converting it to typescript. Though that's an even bigger job but will probably submit it some day because why not.

FYI the tests can be updated to use import, yes, but will only work in node. Better to move to an assertion library which is platform agnostic so they can be run in node and in the browser. But we will see

RReverser commented 3 years ago

Better to move to an assertion library which is platform agnostic so they can be run in node and in the browser. But we will see

There is no browser-specific code in parse5, so there's little point in running tests in browser - it's not like it's a UI library or doing anything with any other browser APIs. It's pure JS (not counting packages that are Node-specific - for those running tests in browser obviously makes even less sense).

43081j commented 3 years ago

Arguable. Just because it isn't a browser specific library doesn't give us an excuse to not test it in one. It's not a node specific library either technically.

The point is to test it in the platform it is expected to work on, which going forward could be browsers as well as node if we are smart about it. Easy enough to run the tests in node and in a browser if we simply don't tie ourselves into the node API. Easy win

It's really not a difficult thing to achieve so let's just see how the pr ends up and feed back in there

RReverser commented 3 years ago

Just because it isn't a browser specific library doesn't give us an excuse to not test it in one. It's not a node specific library either technically.

Well, exactly because it's not specific to any runner, it's sufficient to run with any spec-compliant JS engine. So far it's been Node because at least some of the tests are Node-specific, and there's no practical reason to change it or spend CI time on more platforms. If some runner can't execute JS correctly, it's runner's bug, not parse5 anyway :)

43081j commented 3 years ago

Node's implementation of esm is not "pure". The interop means there's plenty of edge cases and special behaviours which don't exist in browsers. Testing code in node isn't necessarily a sign it'll work fine in browsers and vice versa.

In this case we probably don't use any web APIs or node APIs so it isn't so important. But like I said, it's such a tiny change to let them be run in both places I don't think it needs debate. If I get around to doing the pr we will see how it goes

frank-dspeed commented 3 years ago

I am at present working on a new version and it was needed to refactor the whole project to make it static analyze able.

I Agree that it is near a Complet Rewrite at present i only Keeped the API Alive to be as Compatible as Possible

I Needed to remove a lot of stuff and still not finished that process

List of stuff that got refactored and removed

was not able to run original tests i am not sure why that happens and i do not care maybe i recode them also but i will issue a PR here and release it as a Extra Package as chacnes are high that @inikulin does not want all that changes but i think they are needed.