taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License
1.11k stars 107 forks source link

[Feature] Add / improve ESM support #139

Closed hornta closed 2 years ago

hornta commented 3 years ago

I believe the README is wrong. It says to import the named export parse but it can't work because the file is CommonJS.

import { parse } from "node-html-parser";
         ^^^^^
SyntaxError: Named export 'parse' not found. The requested module 'node-html-parser' is a CommonJS module, which may not support all module.exports as named exports.
CommonJS modules can always be imported via the default export, for example using:

import pkg from 'node-html-parser';
const { parse } = pkg;
nowaythatworked commented 3 years ago

I just had the same error. After a lot of trial & error I found following import which seems to work. import { parse, default as HTMLElement } from "node-html-parser/dist/nodes/html.js";

nonara commented 3 years ago

Can you please share your tsconfig.json and package.json files?

DanKaplanSES commented 3 years ago

I have a typescript project and I am able to do both of these.

@nowaythatworked By the way, default does not equal HTMLElement. I think you probably meant to do this:

import { parse, HTMLElement } ...

nonara commented 2 years ago

Added in v5

import { parse, default as HTMLElement } from "node-html-parser/dist/nodes/html.js";

If you're using this workaround, this will no longer work when you upgrade to the latest major version. You should now be able to directly import from the root, however.

Example: import { parse, HTMLElement } from 'node-html-parser'

Let me know if you have any trouble!

Pomax commented 2 years ago

Hm, so I ran npm i node-html-parser, and package.json reports "node-html-parser": "^5.0.0", but if I run my test code that starts with import { parse } from "node-html-parser"; jest throws an import error:

    SyntaxError: The requested module 'node-html-parser' does not provide an export named 'parse'

      at Runtime.linkAndEvaluateModule (node_modules/jest-runtime/build/index.js:789:5)
      at TestScheduler.scheduleTests (node_modules/@jest/core/build/TestScheduler.js:333:13)
      at runJest (node_modules/@jest/core/build/runJest.js:387:19)
      at _run10000 (node_modules/@jest/core/build/cli/index.js:408:7)
      at runCLI (node_modules/@jest/core/build/cli/index.js:261:3)

Test Suites: 1 failed, 1 total
Tests:       0 total
Snapshots:   0 total
Time:        2.415 s, estimated 8 s
Ran all test suites matching /.\\test\\forms\\forms.test.js/i

Simplifying to the simplest possible Node test, I made a single test.js that only contains that import, and ran it. Node reports the following problem:

$ node test.js

[...]/node_modules/node-html-parser/esm/index.js:1
import nhp from '../dist/index.js'
       ^^^
SyntaxError: The requested module '../dist/index.js' does not provide an export named 'default'
    at ModuleJob._instantiate (node:internal/modules/esm/module_job:124:21)
    at async ModuleJob.run (node:internal/modules/esm/module_job:179:5)
    at async Loader.import (node:internal/modules/esm/loader:178:24)
    at async Object.loadESM (node:internal/process/esm_loader:68:5)
    at async handleMainPromise (node:internal/modules/run_main:63:12)

Which makes sense: if I look at index.js I don't know what your transpiler did, but it sure did something wrong. Instead of clean ESM code with .js extensions added to the imports, it's generated this:

"use strict";
var __importDefault = (this && this.__importDefault) || function (mod) {
    return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
exports.NodeType = exports.TextNode = exports.Node = exports.valid = exports.default = exports.parse = exports.HTMLElement = exports.CommentNode = void 0;
var comment_1 = require("./nodes/comment");
Object.defineProperty(exports, "CommentNode", { enumerable: true, get: function () { return __importDefault(comment_1).default; } });
var html_1 = require("./nodes/html");
Object.defineProperty(exports, "HTMLElement", { enumerable: true, get: function () { return __importDefault(html_1).default; } });
var parse_1 = require("./parse");
Object.defineProperty(exports, "parse", { enumerable: true, get: function () { return __importDefault(parse_1).default; } });
Object.defineProperty(exports, "default", { enumerable: true, get: function () { return __importDefault(parse_1).default; } });
var valid_1 = require("./valid");
Object.defineProperty(exports, "valid", { enumerable: true, get: function () { return __importDefault(valid_1).default; } });
var node_1 = require("./nodes/node");
Object.defineProperty(exports, "Node", { enumerable: true, get: function () { return __importDefault(node_1).default; } });
var text_1 = require("./nodes/text");
Object.defineProperty(exports, "TextNode", { enumerable: true, get: function () { return __importDefault(text_1).default; } });
var type_1 = require("./nodes/type");
Object.defineProperty(exports, "NodeType", { enumerable: true, get: function () { return __importDefault(type_1).default; } });

That is very much not ESM code =D

nonara commented 2 years ago

We are not compiling a separate build for esm, as that is reportedly not the ideal way to handle it. Doing so can cause issues . Instead, we used a wrapper, which is the best route for making a cjs module work with esm.

See: https://redfin.engineering/node-modules-at-war-why-commonjs-and-es-modules-cant-get-along-9617135eeca1

It works on this end.

Check your package.json file to make sure it has type: module set.

For an example project, see ./test/assets/projects/esm

Pomax commented 2 years ago

It very much does, I've been working on this library for the last 2 months and I haven't worked in CJS for much longer than that =)

Pomax commented 2 years ago

Very interesting. A bare test shows this working (new dir, file with the single import instruction, new package.json with type:module, then executing the file with node), so I'll do a bit of digging to see what the failure MCVE is here.

Pomax commented 2 years ago

Of course. NPM caching. Classic.

Force-cleared npm's cache, and it's working just fine now =D

nonara commented 2 years ago

Of course. NPM caching. Classic: force-cleared npm's cache, and it's working just fine now =D

Tale as old as time 😆 Glad it's working!

swyxio commented 2 years ago

still getting this error for what its worth - with version pinned to 5.2.0, and using SvelteKit (which uses Vite under the hood.. in case that bundling matters)

import {parse as nodehtmlparse} from 'node-html-parser';

//...
nodehtmlparse(content.trim().split('\n')[0])

Error: parse is not a function

mattlehrer commented 2 years ago

@sw-yx I somehow have the same issue today and fixed it with import parse from... (no brackets) as described in this issue

pilcrowOnPaper commented 2 years ago

In SvelteKit, it works fine with npm run dev but I'm getting "parse is not a function" after npm run build.

braebo commented 2 years ago

@pilcrowOnPaper I'm getting the same issue. Named import works in dev but not in build, and default import works in build but not in dev. Never seen this before. Did you ever get it working in SvelteKit?

pilcrowOnPaper commented 2 years ago

@FractalHQ

I gave up and used parse5, though I still prefer using node-html-parser

nonara commented 2 years ago

If you'd like to put together a repro I can clone, I will have a look. I have a gap of free time coming up for the first time in awhile, so I'll be able to look closer.

Pomax commented 2 years ago

@mattlehrer @FractalHQ if you're getting a similar issue, it's not this one: mine was fixed after cleaning cache and reinstalling, so it sounds like you want to file a new issue with more specific STR than I had, so it can be tracked independently of this closed issue =)

(also, so I stop getting notifications about an issue that got closed last October ;)