rehypejs / rehype-minify

plugins to minify HTML
https://unifiedjs.com
MIT License
90 stars 16 forks source link

rehype-minify-whitespace: inline HTML content is trimmed #19

Closed erquhart closed 4 years ago

erquhart commented 7 years ago

The contents of inline tags are being trimmed. When parsing html to markdown:

Input

a b<span> </span>c d</span><strong> e</strong>

Expected output

a b c d **e**

Actual output

a bc d**e**

Runkit repro: https://runkit.com/erquhart/595d36c4e60f6b0012c00405

wooorm commented 7 years ago

If someone wants to work on it, it’s in rehype-minify-whitespace.

Moved from https://github.com/wooorm/remark/issues/275.

erquhart commented 7 years ago

@wooorm I'm not using rehype-minify-whitespace in that repro - are you certain that's the problem?

wooorm commented 7 years ago

Yup

erquhart commented 7 years ago

@wooorm seems like the best approach may be to simply allow whitespace minification to be toggled off in the options for rehype-remark/hast-util-to-mdast. Would you accept a PR for that?

wooorm commented 7 years ago

Just dropping this here ’cause it probably has to be something like this: https://github.com/wooorm/remark-lint/blob/master/packages/remark-lint-no-paragraph-content-indent/index.js

erezrokah commented 4 years ago

Bumping this as we're still experiencing this issue: https://github.com/netlify/netlify-cms/issues/3727

I added a test that is failing https://github.com/erezrokah/rehype-minify/commit/d7cc409c35e9d91c4f0d569f107e1122d3edb082#diff-c7e2175747a03e137c1615aecfaa1636R103 but not sure how to go about fixing it.

Can you suggest a place to get started? Tried looking at the history of https://github.com/wooorm/remark-lint/blob/master/packages/remark-lint-no-paragraph-content-indent/index.js but couldn't figure out the suggested direction.

kptdobe commented 4 years ago

I am facing the same issue. It is coming from those lines:

https://github.com/rehypejs/rehype-minify/blob/master/packages/rehype-minify-whitespace/index.js#L62-L68

I think the problem is that in the case of a <span>, <em>, <strong>, the siblings you want to test are not the text node siblings (almost always non existing) but the siblings of the parent (the span, em, strong siblings).

kptdobe commented 4 years ago

Re-thinking about it, I would even argue you should never trim an inline element: there is no way you can know if the leading / trailing space has been introduced on purpose or not:

wooorm commented 4 years ago

Released!

erezrokah commented 4 years ago

This is amazing @wooorm, do you suggest I open PRs for https://github.com/syntax-tree/hast-util-to-mdast/blob/7c4667743dd056b33f1c76a7c319674aef951a4f/package.json#L40 and https://github.com/rehypejs/rehype-remark/blob/10c28f586be34b1baf3d221d8116a7e70aed2365/package.json#L28 to update dependencies? Would a major version bump will be required for those?

wooorm commented 4 years ago

It's a patch, so reinstalling dependencies should do the trick!

erezrokah commented 4 years ago

Sorry, let me clarify my question. We're using rehype-remark. Even if I update it to the latest version is still uses "hast-util-to-mdast": "^7.0.0" which uses "rehype-minify-whitespace": "^3.0.0", so it won't get the fix.

I'm using (with yarn) the following workaround:

"resolutions": {
  "rehype-minify-whitespace": "^4.0.2"
}

But that is not ideal

Update

My workaround doesn't work - investigating

kptdobe commented 4 years ago

Awesome! Thanks a lot @wooorm.

I tried to test the changes (I manually patched rehype-remark) and I get an error:

TypeError: Cannot read property 'type' of undefined
    at one .../hast-util-to-mdast/lib/one.js:13:12)

I traced to https://github.com/syntax-tree/hast-util-to-mdast/blob/master/index.js#L31 : minify returns undefined for my tree (was working before). Investigating too :)

kptdobe commented 4 years ago

@wooorm You removed the return tree :)

https://github.com/rehypejs/rehype-minify/commit/64c6206f2ac5c25a4465966a6e29ac696ed63f8b#diff-a6b748a472d9e6a0024e05443221d898L44

Adding it here fix the issue: https://github.com/rehypejs/rehype-minify/blob/master/packages/rehype-minify-whitespace/index.js#L40

kptdobe commented 4 years ago

Created a PR to address this new issue: https://github.com/rehypejs/rehype-minify/pull/33 Feel free to use it or not ;)

I tested the new trimming behaviors, works pretty well, especially with "my" complex DOM. Thanks again!