syntax-tree / hastscript

utility to create hast trees
https://unifiedjs.com
MIT License
164 stars 13 forks source link

Return narrower Element types #23

Open Caellian opened 5 days ago

Caellian commented 5 days ago

Initial checklist

Problem

Assuming there exists some type definition for:

/**
 * @typedef {object} HASTImgElement
 * @property {"element"} type
 * @property {"img"} tagName
 * ... etc ...
 */

The h function currently returns Element which doesn't satisfy tagName restriction. This requires manual casting and/or ts-ignore in TS or JS code with (TS backed) type checking. It would be nice if the function returned {tagName: selector} & Element, even though that still doesn't guarantee properties and other requirements to match.

Current solutions

While toying around with it locally, I saw that this is the ~best that can be done given that selectors can be composed tag expressions with classes and what not. So currently, the type is always Element because of that.

Proposed solutions

Soon after, I realized that in (my) case where tag name is a simple lowercase HTML tag, there could exist a type HTMLName which covers all of them and an additional @override for returned h function which specializes return type based on stricter HTMLName selector type to satisfy:

/**
 * @template T
 * @param {T & HTMLName} selector
 * @param {Properties} properties
 * @param {...Child} children
 * @returns {{tagName: T} & Element}
 */

Now, I checked into used hast-util-parse-selector and it seems to return proper types, so it's hastscript that erases this information.

ChristianMurphy commented 4 days ago

@Caellian could you zoom out a bit. It doesn't meet the restrictions where? And trying to do what?

Caellian commented 4 days ago

I have a rehype plugin that turns <script> tags and contents into a custom <dynamic-script> element. It does some modifications, parsing, etc. and this new element has structure and metadata that allows articles in my blog to have scripts that are dynamically evaluated. (example, things with checkmark).

I enabled typescript on the preprocessor to iron out quirks, and wanted to specify type that describes structure of this new element:

```js /** * @typedef {object} DynamicCodeData * @property {boolean} [noCodeblock] * @property {{[marker: string]: boolean}} [markers] */ /** * @typedef {object} DynamicCodeElement * @property {"element"} type * @property {"code"} tagName * @property {DynamicCodeData & ElementData} [data] * @property {{className: ["language-js"]}} properties * @property {[Text]} children */ /** * @typedef {object} CollapsedCodeElement * @property {"element"} type * @property {"details"} tagName * @property {[ * {tagName: "summary"}, * {tagName: "pre", children: [ DynamicCodeElement ]} * ]} children */ /** * @typedef {{ * "data-exports"?: string[] | undefined, * "data-deferred"?: boolean | undefined, * "data-module"?: boolean | undefined, * }} DynamicScriptProperties */ /** * @typedef {object} DynamicScriptElement * @property {"element"} type * @property {"dynamic-script"} tagName * @property {DynamicScriptProperties & Properties} properties * @property {[DynamicCodeElement | CollapsedCodeElement] & ElementContent[]} children */ ```

However, I can't assign results of h into these as TypeScript complains that, for instance:

h("dynamic-script", {
  "data-module": true
}, dynamicCodeElement)

can't be made into DynamicScriptElement - because:

This forces me to cast as unknown as DynamicScriptElement, but as the file is JS, that means I can only annotate the statement with @ts-ignore. Both are effectively the same.

This is the result of type widening in TypeScript generics. Generally, some types are erased into their wider representations because generally that's the expected behavior.

However, as hast deals with ast, the opposite is the case and narrowest type definitions should be preserved to allow users to specify document schema in type signatures.

I ended up writing my own version without support for selectors:

```ts // https://github.com/DefinitelyTyped/DefinitelyTyped/discussions/71296 type Data = HASTElementData | Record; type Element = HASTElement & { data?: Data | undefined }; type ElementContent = HASTElementContent | Element; /** * Single text child node. */ type SingleText = [{ value: C } & Text] & ElementContent[]; /** * Type of children that avoids widening by making child type a generic. */ type Children = readonly [...C] & ElementContent[]; interface HElementFn { (tagName: T): { tagName: T } & Element; ( tagName: T, properties: P ): { tagName: T; properties: P } & Element; , TC extends ElementContent[]>( tagName: T, children: C ): { tagName: T; children: C } & Element; ( tagName: T, innerText: C ): { tagName: T; children: SingleText; } & Element; < T extends string, const P extends Properties, const C extends Children, TC extends ElementContent[], >( tagName: T, properties: P, children: C ): { tagName: T; properties: P; children: C } & Element; ( tagName: T, properties: P, innerText: C ): { tagName: T; properties: P; children: SingleText } & Element; ( tagName: T, properties: P, data: D ): { tagName: T; properties: P; data: D } & Element; < T extends string, const C extends Children, const D extends Data, TC extends ElementContent[], >( tagName: T, children: C, data: D ): { tagName: T; children: C; data: D } & Element; ( tagName: T, innerText: C, data: D ): { tagName: T; children: SingleText; data: D } & Element; < T extends string, const P extends Properties, const C extends Children, const D extends Data, TC extends ElementContent[], >( tagName: T, properties: P, children: C, data: D ): { tagName: T; properties: P; children: C; data: D } & Element; < T extends string, const P extends Properties, const C extends string, const D extends Data, >( tagName: T, properties: P, innerText: C, data: D ): { tagName: T; properties: P; children: SingleText; data: D; } & Element; } /** * A type safe version of `hastscript` `h` function. Doesn't work with arbitrary * selectors. */ export const hElement: HElementFn = ((tagName: string, ...args: any) => { const result = { type: "element", tagName, }; let order = [ { key: "properties", check: {}, default: {}, }, { key: "children", check: [], default: [], }, { key: "data", check: {}, default: undefined, }, ]; for (let i = 0; i < args.length; i++) { const arg = args[i]; let current = order.shift(); if (current == null) { throw new TypeError(`invalid argument #${i + 1} provided: ${arg}`); } if ( typeof arg === typeof current.check && Array.isArray(arg) === Array.isArray(current.check) ) { result[current.key] = arg; } else if (current.key === "children" && typeof arg === "string") { result[current.key] = [hText(arg)]; } else { result[current.key] = current.default; i--; } } return result; }) as undefined as HElementFn; export function hText(value: string): Text { return { type: "text", value }; } ```

I didn't exactly follow h argument order (it expects data before children). But this code could be more-or-less used in index.d.ts instead of currently generated implementation to produce expected results. As I previously mentioned hast-util-parse-selector has ExtractTagName definition already, In similar fashion, adapter projections that extract className and child elements could be written.

I'm not very proficient in TS, but it seems doable and would allow end-users to describe AST structure in types as I initially wanted.

wooorm commented 4 days ago

This requires manual casting in TypeScript and ts-ignore in JS code

You can use an if-statement: if (node.tagName === 'img') { /* Do things with images */ }.

It would be nice if the function returned {tagName: selector} & Element

More complex types have some benefits. They also have some downsides. It’s a trade off. To choose between trade offs, we’d need arguments. So: what are your arguments?

even though that still doesn't guarantee properties and other requirements to match.

Indeed, it’s not that useful? So, why do you want this?

Now, I checked into used hast-util-parse-selector and it seems to return proper types, so it's hastscript that erases this information.

Right, things could be possible, but the types here are rather complex already. So, to choose whether more complexity is nice, I’d worry about the arguments


However, I can't assign results of h into these as TypeScript complains that, for instance:

Don’t. Use the types we provide. Use @types/hast. The code you show of all your type definitions, you don’t need it, you can use the types we provide!

Our types do not match your types. That will not change with this one change.

as the file is JS, that means I can only annotate the statement with @ts-ignore. Both are effectively the same.

You can do that in JS too. You can do type casts in JS!

Caellian commented 4 days ago

even though that still doesn't guarantee properties and other requirements to match.

Indeed, it’s not that useful? So, why do you want this?

I wasn't clear, it tells TS "it's this, or any other Element". This completely removes the need for all Node property related checks if the information is known at compile time.

You can use an if-statement: if (node.tagName === 'img') { / Do things with images / }.

While this is an option - why? Wouldn't a design that moves type checking into compile time be better? It doesn't impose any additional requirements for users using Element type.

However, I can't assign results of h into these as TypeScript complains that, for instance:

Don’t. Use the types we provide. Use @types/hast. The code you show of all your type definitions, you don’t need it, you can use the types we provide!

Hmmm... I did inline casting incorrectly so it didn't work. Anyway, now that I've figured it out, to illustrate, here's a diff:

```diff --- rehype-dynamicScripts-suggestion.js +++ rehype-dynamicScripts-hastscript.js @@ -197,17 +163,17 @@ })() ); - let href = rebasePath(source.properties?.src, options.targetLocation || "/"); + let href = rebasePath(/** @type {string} */ (source.properties?.src), options.targetLocation || "/"); let note = "remote JS"; if (options.isModule) { note = "remote ESM"; } # these are mostly the same, ignore differences here, there would be none if implemented - target.children.push( - hEl("span", { className: "status" }, note), - hEl("a", { className: "path", href }, href) - ); + target.children.push( + h("span.status", note), + h("a.path", { href }, href) + ); return null; } @@ -250,40 +216,26 @@ options.isModule ); - /** @type {DynamicCodeElement} */ - const exec = hEl("code", { - className: ["language-js"], - }, code, { - noCodeblock: true, - } - ); - /** @type {CollapsedCodeElement} */ - let detailsEl = hEl("details", [ - hEl("summary", "source"), - hEl("pre", [exec]), - ]); - - if (source.properties?.className?.includes("show")) { - /** @type {DynamicCodeElement} */ - let codeEl = hEl("code", { className: ["language-js"] }, code); + const exec = h("code.language-js"); + exec.data = { + // @ts-ignore NOT IN ElementData + noCodeblock: true, + }; + let detailsEl = h("details", [ + h("summary", "source"), + h("pre", [exec]) + ]); + + let className = /** @type {string[]?} */ (source.properties?.className); + if (className?.includes("show")) { + let codeEl = h("code.language-js", code); if (options.deferred) { codeEl.data = { + // @ts-ignore NOT IN ElementData markers: { deferred: true, }, }; } @@ -331,11 +283,7 @@ let deferred = el.properties.defer == true; let isModule = el.properties.type === "module"; # the only part that's worse off, but that's bc of my design decisions - /** @type {DynamicScriptElement} */ - // @ts-ignore the type will be valid once handler is called - const target = hEl("dynamic-script", { + const target = h("dynamic-script", { "data-deferred": deferred ? true : undefined, "data-module": isModule ? true : undefined, }); @@ -361,8 +309,14 @@ } let child = target.children[0]; # example of consumption differences really drives the point home. # often the case when elements are generated in multiple passes over ast - if (child.tagName == "code" && child.data.markers["data-deferred"]) { - // Special handling + if ( + child.type == "element" && + child.tagName == "code" + ) { + let data = /** @type {Record} */ (child.data); + if (data.markers["data-deferred"]) { + // Special handling + } } parent.children.splice(i, 1, target); ```

Ignore that variable declaration types take up additional lines (in TS they'd be inline), and ignore that my hEl is way less ergonomic (more verbose) than hastscript h. The difference is that end-user doesn't need to specify @type all over the place and @ts-ignore in data blocks.

Yes, declaring schema in JSDoc adds verbosity, but that's part of intentional design. What's important to note is that all those type casts circumvent TS and aren't type checked/safe. The only way to guarantee correctness is having large if statements that pin down requirements, but that adds runtime overhead.

Right, things could be possible, but the types here are rather complex already. So, to choose whether more complexity is nice, I’d worry about the arguments

I don't like this either, but IMO I'd prefer hastscript to deal with that than rolling my own :smile: . The types aren't really that complex, just repetitive because of all the different invocations (and I added Data). It's pretty concise without Data and my implementation:

```ts // index.d.ts type Properties = import('hast').Properties; // https://github.com/DefinitelyTyped/DefinitelyTyped/discussions/71296 type Data = import("hast").ElementData | Record; type HASTElement = import("hast").Element & { data?: Data | undefined }; type ElementContent = import("hast").ElementContent | Element; /** * Single text child node. */ type SingleText = [{ value: C } & Text] & ElementContent[]; /** * Type of children that avoids widening by making child type a generic. */ type Children = readonly [...C] & ElementContent[]; interface HElementFn { (tagName: T): { tagName: T } & Element; ( tagName: T, properties: P ): { tagName: T; properties: P } & HASTElement; , TC extends ElementContent[]>( tagName: T, children: C ): { tagName: T; children: C } & HASTElement; ( tagName: T, innerText: C ): { tagName: T; children: SingleText; } & HASTElement; < T extends string, const P extends Properties, const C extends Children, TC extends ElementContent[], >( tagName: T, properties: P, children: C ): { tagName: T; properties: P; children: C } & HASTElement; ( tagName: T, properties: P, innerText: C ): { tagName: T; properties: P; children: SingleText } & HASTElement; } declare const h: HElementFn; ```

In short, narrower types are better IMO because they will cause errors or require checks when users try to access undefined properties, but should also be completely compatible with all existing code.

wooorm commented 2 days ago

We can have a discussion about fancy types, and their trade offs. But before we can have that: you have some rather weird types which can be improved. As a user you should barely need to write @type. There are no errors and there are few checks needed. After that, I doubt that you need this. I use these types a lot and I don’t need what you need.

I’d appreciate it if you ask questions about the problems you run into, instead of asking questions about what you think the solution is (https://xyproblem.info).

Caellian commented 2 days ago

Right, but it's not xyproblem because the diff from my previous comment shows that I solved both X and Y problems.

I'm making this suggestion because IMO this looks nicer:

- let href = rebasePath(/** @type {string} */ (source.properties?.src), options.targetLocation || "/");
+ let href = rebasePath(source.properties?.src, options.targetLocation || "/");

and provides much better experience when one expects certain node types:

-        if (
-          child.type == "element" &&
-          child.tagName == "code"
-        ) {
-          let data = /** @type {Record<string, boolean>} */ (child.data);
-          if (data.markers["data-deferred"]) {
-            // Special handling
-          }
+        if (child.tagName == "code" && child.data.markers["data-deferred"]) {
+          // Special handling
         }

I mean, it's not 100% necessary because in most cases people don't create very complicated ASTs dynamically so Element suffices, but I don't see (m)any reasons why not.

ChristianMurphy commented 2 days ago

While this is an option - why? Wouldn't a design that moves type checking into compile time be better? It doesn't impose any additional requirements for users using Element type.

That is question better posed to you, why? This module is intended to support dynamic node creation, for example JSX. In these circumstances the exact type cannot be know, that is the nature of dynamic content.

If your content is fully static why use this library and add overhead? You could write the static hast, or even better, serve plain HTML with no pre-processing.

but I don't see (m)any reasons why not.

Adding a bunch of type overhead and complexity for a use case that doesn't make sense, makes the experience worse, not better.

Perhaps you have a good reason, but you haven't shared it. Which is why @wooorm is highlighting the XY Problem.

In short, narrower types are better IMO because they will cause errors or require checks when users try to access undefined properties

The existing code can already do this, the if statement @wooorm proposed in https://github.com/syntax-tree/hastscript/issues/23#issuecomment-2508909385 does the exact same thing in terms of narrowing the type. https://unifiedjs.com/learn/recipe/narrow-node-typescript/

Caellian commented 1 day ago

That is question better posed to you, why?

I've already answered, provided diffs, pros and cons. But to summarize:

Pros:

This module is intended to support dynamic node creation, for example JSX. In these circumstances the exact type cannot be know, that is the nature of dynamic content.

You're mistaken, types are rarely any. Sure in most cases data values will be string, or number (which aren't really handled either), but in TS they can also be literal values - ignoring that HTML spec allows arbitrarily named elements, a tagName would be a good example of that.

I'm suggesting that if user provides, for instance, children to hast, then it should return [...children, ...ElementContent[]] instead of simply ElementContent[] because the type of initial children is in fact known and not dynamic. Even using hast types, it could be something like [Element, Text, ...ElementContent[]] and not ElementContent[]. So the current signature drops type information which can be used by TS for validation.

If your content is fully static why use this library and add overhead? You could write the static hast, or even better, serve plain HTML with no pre-processing.

It's not. It just has a schema that's inherits from hast. I am processing markdown, but dealing with components that have specific requirements with invariants that cause runtime crashes in frontend if they're not upheld. I also, explained I'm doing processing on content, but that has nothing to do with the initial suggestion and I believe we're getting sidetracked.

Adding a bunch of type overhead and complexity for a use case that doesn't make sense, makes the experience worse, not better.

What is type overhead? That's why I'm asking you - what exactly would make the experience worse? I have tried using both versions of code and see this as an improvement. That's why I made this suggestion.

Perhaps you have a good reason, but you haven't shared it.

No, I have shared all the necessary details if the initial comment, refer to the Problem section for use case and reasoning.

https://unifiedjs.com/learn/recipe/narrow-node-typescript/

That's runtime checking and doesn't completely utilize TS. Yes, it can infer type if enough runtime checks pin it down, but that overhead can be completely avoided if you simply forward input types back into output.