taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License
1.11k stars 107 forks source link

Ensure ESM support. Fixes #149 #150

Closed AgainPsychoX closed 2 years ago

AgainPsychoX commented 3 years ago

Fixes #149.

Tests included are passed okay (after fixing paths and adding experimental-specifier-resolution=node). I noticed the benchmarks on the README are "a bit" outdated, but it isn't my concern today.

Why currently included tests were passing despite they were using ES modules?

There seems to be files in test folder using ES modules, among them compare-node-html-parser.mjs.

Despite first impression, it wasn't using ES modules version from ./dist/esm/*, but it was using ./dist/index.js; that's why the benchmark was running fine before.

Additional testing

I tested proper the ESM support myself using simple project like:

package.json

{
    "type": "module",
    "scripts": {
        "start": "node --experimental-specifier-resolution=node main.js"
    }
}

(changing type to commonjs (when not added at all, it's also commonjs))

main.js

import { parse } from "node-html-parser";
// const { parse } = require("node-html-parser");
console.log(parse); // [Function: parse]

(I had the node-html-parser linked by npm link for the test)

Results:

Notes

AgainPsychoX commented 3 years ago

I removed the experimental-specifier-resolution=node in place for adding .js to imports in source files (.js in Typescript, sic!) in last commit ( https://github.com/taoqf/node-html-parser/pull/150/commits/18ca5848c25f051a12acab4a69b6b3aca63c1a0c ).

It causes multiple warnings about not linter not being able to resolve the imports, like

warning  Unable to resolve path to module './html.js'  import/no-unresolved

but it also compiles and works well.

I added it as alternative to using the flag, IMO should be reverted (the last commit), but wanted to provide possible outcome.

I don't think there is any good way of making it work without any of those solutions. I was reading online whole morning about using ES modules, and it seems TS devs/JS community still don't know (or still argue) which approach would be the best.

Some of the stuff I found interesting:

Have to say, it's weird stuff Typescript is doing around ES modules. It feels like Typescript is stuck few years ago because holding to compatibly, or some part of community/devs are stubborn.

nonara commented 3 years ago

Interesting stuff! Just a thought, you can use typescript-transform-paths with the @transform-path tag to transform the imports to .js in the output.

Steps:

  1. Install ts-patch as dev dep + add prepare script
  2. Create tsconfig.esm.json
    • Make it extend from the main config
    • Add the plugin config
  3. Add @transform-path comments to the source, specifying the path with a .js extension
  4. Make build process for esm use the new config

I think that would do what you want. I'm still not as knowledgeable about esm, but I'm getting there in having to deal with various support tickets for it in the repos I maintain. If this is a solution, maybe it would make sense to add a config option to typescript-transform-paths to set a common output extension for imports.

AgainPsychoX commented 3 years ago

I knew I could use transformers for that, problem is using ts-patch is like another layer of configuration. Also, using the plugin you suggest require to use @transform-path comments everywhere in source, which isn't much better than adding .js extensions to imports in the source anyway (maybe the warning would be gone, but if it all it does it's just easier to disable the warning then).

I think I will try to explore other options, like maybe custom transformer just to add the .js in the output (i found this https://github.com/Zoltu/typescript-transformer-append-js-extension , but requires updates and testing). All

There is also another issue apart from the .js extension in imports, I forgot to mention. If you import CJS modules from ESM you can't just write:

import { decode, encode } from 'he';

because some exports can't be deduced (there are tools for it, but not perfect; even in this project there is he used with does not work). That's why i modified sources around other modules imports (like in src/nodes/node.ts)


Anyway, ESM with Typescript is messy, no official solutions provided as the standards aren't yet fully created. Yet, having .js in compiled files (targeting ESM) seems to be best strategy for now.

nonara commented 3 years ago

It's probably best to leave the syntax unchanged and use a transformer to convert it on output. Although verbose, even the comment method is less likely to produce any side-effects later. As you mentioned, ESM in TS is very messy. That said, a transformer which added the .js would definitely be nicer!

The logic in the transformer you posted looks sound, however it does seem to be a bit outdated. Does not incorporate factory changes in the compiler API, so the methods it's using are deprecated. It will likely need an update in a few TS versions.

apprat commented 3 years ago

The problem discussed a few years ago, until now has not been resolved, really fuck

taoqf commented 3 years ago

Thanks for all your work, It is a big change. I didn't know much about this. But I think we should take of this carefully. I used some module and did have a good experiences. expacillay add .js to import. I could not navigate to right d.ts file but js file. So I think I will wait and here more voice about this. There is a way to avoid the error using default exports. see https://github.com/taoqf/node-html-parser/issues/149

nonara commented 2 years ago

This should be addressed in the following PR:

Would appreciate feedback to ensure it's working for your use cases @AgainPsychoX @apprat

nonara commented 2 years ago

Corrected in v5