syntax-tree / hast-util-raw

utility to reparse a hast tree
https://unifiedjs.com
MIT License

BUG: newlines are added before tables #26

Closed: aarondill closed this issue 3 months ago

aarondill commented 3 months ago


Affected packages and versions

9.0.4

Link to runnable example

No response

Steps to reproduce

> mkdir repro && cd repro
> npm init -y
> npm i "hast-util-from-html" "hast-util-raw" "hast-util-to-html"
> editor index.mjs # paste the contents below
> node index.mjs
// index.mjs
import { fromHtml } from "hast-util-from-html";
import { raw } from "hast-util-raw";
import { toHtml } from "hast-util-to-html";

const contents = `<table>
      <thead>
        <tr><th>Column</th></tr>
      </thead>
      <tbody>
        <tr><td>foo</td></tr>
        <tr><td>bar</td></tr>
      </tbody>
    </table>`;
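// Parse as a fragment, so no <html>, <head>, or <body> wrappers are added
const hast = fromHtml(contents, { fragment: true });
// Reparse the tree with hast-util-raw
const reformatted = raw(hast);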

console.log("Without util-raw:");
console.log(toHtml(hast));
console.log("With util-raw:");
console.log(toHtml(reformatted));

Expected behavior

hast-util-raw should not add a whitespace-only text node before the table.

This was discussed in rehypejs/rehype-raw#22, where it was decided that it is not a bug because browsers ordinarily render the result the same way. However, the extra whitespace becomes visible when the CSS white-space property has a non-default value such as pre or pre-wrap.
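The point about white-space can be illustrated with a deliberately simplified model of CSS whitespace processing. The renderedText function below is hypothetical and ignores most of the CSS Text rules; it only assumes that under normal a whitespace-only run sitting between block boxes collapses away entirely, while pre and pre-wrap keep every newline.

```javascript
// Simplified model (an assumption, not the full CSS Text algorithm) of how a
// whitespace-only text node renders under different `white-space` values.
function renderedText(value, whiteSpace) {
  // `pre` and `pre-wrap` preserve every space and newline.
  if (whiteSpace === 'pre' || whiteSpace === 'pre-wrap') {
    return value
  }
  // `normal`: collapse runs of HTML whitespace (space, tab, LF, FF, CR) into
  // one space, then drop a single leading/trailing space at the edges.
  return value.replace(/[ \t\n\f\r]+/g, ' ').replace(/^ | $/g, '')
}

const inserted = '\n      \n        \n      \n    '
console.log(JSON.stringify(renderedText(inserted, 'normal'))) // ""
console.log(renderedText(inserted, 'pre').split('\n').length - 1) // 4
```

Under this model the node is invisible with the default value and shows up as four blank lines with pre, which matches the behavior described in the report.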

Actual behavior

A whitespace-only text node is added before the table:

{
  // ...
  {
    type: 'text',
    value: '\n      \n        \n      \n      \n        \n        \n      \n    '
  },
  // ...
}
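A possible stopgap, assuming (per the issue title) that the inserted node is the table's previous sibling and that discarding it is acceptable, is to strip such nodes after reparsing. stripWhitespaceBeforeTables is a hypothetical helper, not an option of hast-util-raw; it works on plain hast objects without any dependency.

```javascript
// Hypothetical post-processing step (not part of hast-util-raw): remove
// whitespace-only text nodes that directly precede a <table> element.
// Assumes plain hast objects: { type, tagName?, value?, children? }.
// This discards whitespace; it does not recover what the author typed.
function stripWhitespaceBeforeTables(node) {
  if (!Array.isArray(node.children)) return node
  node.children = node.children.filter((child, index, siblings) => {
    const next = siblings[index + 1]
    const isBlank =
      child.type === 'text' && /^[ \t\n\f\r]+$/.test(child.value)
    const beforeTable =
      next !== undefined && next.type === 'element' && next.tagName === 'table'
    // Keep everything except blank text immediately followed by a table.
    return !(isBlank && beforeTable)
  })
  // Recurse into the remaining children.
  for (const child of node.children) stripWhitespaceBeforeTables(child)
  return node
}

const tree = {
  type: 'root',
  children: [
    { type: 'text', value: '\n      \n    ' },
    { type: 'element', tagName: 'table', properties: {}, children: [] }
  ]
}
stripWhitespaceBeforeTables(tree)
console.log(tree.children.length) // 1
```

In the reproduction, this would be applied to reformatted before calling toHtml.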

Affected runtime and version

node@22.5.1

Affected package manager and version

pnpm@9.5.0

Affected OS and version

Arch Linux (Rolling)

Build and bundle tools

No response

wooorm commented 3 months ago

rehypejs/rehype-raw#22 stands; this project does what HTML does.

> however, when using non-default values of the CSS property white-space (such as pre or pre-wrap) this issue appears.

Don’t.

Or, explain what you are actually doing. https://xyproblem.info

aarondill commented 3 months ago

I'd like to preserve the newlines a user types while also rendering user-supplied HTML (yes, I know the security risks). For example:

Input:

Hello World

Output:

Hello World

Input:

Hello

<table>
      <thead>
        <tr><th>Column</th></tr>
      </thead>
      <tbody>
        <tr><td>foo</td></tr>
        <tr><td>bar</td></tr>
      </tbody>
    </table>
World

Output:

Hello

Column
foo
bar

World
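One way to keep a user's line breaks without relying on white-space: pre is to turn newlines into br elements while leaving tables (and elements such as pre) alone. This is a sketch under assumptions, not something the maintainers proposed: newlinesToBreaks and its skip list are hypothetical, and it operates on plain hast objects.

```javascript
// Hypothetical alternative to `white-space: pre`: split text nodes on "\n"
// and insert <br> elements between the pieces. Whitespace inside tables is
// treated as formatting, so a skipped element's whole subtree is untouched.
// Assumes plain hast objects: { type, tagName?, value?, children? }.
const SKIP_TAGS = new Set(['table', 'pre', 'textarea', 'script', 'style'])

function newlinesToBreaks(node) {
  if (!Array.isArray(node.children)) return node
  if (node.type === 'element' && SKIP_TAGS.has(node.tagName)) return node
  const result = []
  for (const child of node.children) {
    if (child.type === 'text' && child.value.includes('\n')) {
      child.value.split('\n').forEach((part, i) => {
        // Every newline becomes one <br>; empty pieces are dropped.
        if (i > 0) {
          result.push({ type: 'element', tagName: 'br', properties: {}, children: [] })
        }
        if (part !== '') result.push({ type: 'text', value: part })
      })
    } else {
      result.push(newlinesToBreaks(child))
    }
  }
  node.children = result
  return node
}

const tree = { type: 'root', children: [{ type: 'text', value: 'Hello\nWorld' }] }
newlinesToBreaks(tree)
console.log(tree.children.map((n) => n.tagName || n.value).join(' | ')) // Hello | br | World
```

It would run on the fromHtml result before raw, keeping the default white-space: normal; run after raw, it would turn the inserted whitespace node into br elements.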

wooorm commented 3 months ago

It’s not possible. The information is lost.

If you want to remove unneeded whitespace, use https://github.com/rehypejs/rehype-minify/tree/main/packages/rehype-minify-whitespace. If you want pretty whitespace, use https://github.com/rehypejs/rehype-format.
