syntax-tree / hast-util-raw

utility to reparse a hast tree
https://unifiedjs.com
MIT License
11 stars 4 forks source link

Field position is being removed #12

Closed elmasse closed 3 years ago

elmasse commented 3 years ago

Using hast-util-raw directly from this package or thru rehype-raw removes position fields when using a combination of proc.runSync(proc.parse(text))

Initial checklist

Affected packages and versions: v6+

Steps to reproduce

  1. Use unified with remark-parse, remark-rehype and hast-util-raw to create a processor.
  2. Use the processor to create a tree with parse and pass it to runSync.

Here is a failing test case using the same data from the tests suite in the repo:

Link to code example:

When using unified with remark-parse, remark-rehype and hast-util-raw (and rehyper-raw) the position is removed as noted in the next test case. I added a first check to ensure position is present after the transform to rehype.

1..3
# tests 3
# pass  1
# fail  2

I used the data in the current test suite to create a failing case:

test('runSync position removed', function (t) {
  const text = [
    '<i>Some title</i>',
    '',
    '<p>Hello, world!',
    '',
    '*   This',
    '*   Is',
    '*   A',
    '*   List',
    '',
    'A mix of *markdown* and <em>HTML</em>.',
    '',
    '***',
    '',
    '![an image](https://example.com/favicon.ico "title")',
    '',
    '<svg><rect/></svg>',
    '',
    '<div id="foo" class="bar baz"><img src="a" alt=bar>alfred</div>',
    '',
    '<p>Hello, world!',
    ''
  ].join('\n')

  const expected = {
    type: 'root',
    children: [
      {
        type: 'element',
        tagName: 'p',
        properties: {},
        children: [
          {
            type: 'element',
            tagName: 'i',
            properties: {},
            children: [
              {
                type: 'text',
                value: 'Some title',
                position: {
                  start: {line: 1, column: 4, offset: 3},
                  end: {line: 1, column: 14, offset: 13}
                }
              }
            ],
            position: {
              start: {line: 1, column: 1, offset: 0},
              end: {line: 1, column: 18, offset: 17}
            }
          }
        ],
        position: {
          start: {line: 1, column: 1, offset: 0},
          end: {line: 1, column: 18, offset: 17}
        }
      },
      {type: 'text', value: '\n'},
      {
        type: 'element',
        tagName: 'p',
        properties: {},
        children: [
          {
            type: 'text',
            value: 'Hello, world!\n',
            position: {
              start: {line: 3, column: 4, offset: 22},
              end: null
            }
          }
        ],
        position: {
          start: {line: 3, column: 1, offset: 19},
          end: {line: 5, column: 1, offset: 37}
        }
      },
      {
        type: 'element',
        tagName: 'ul',
        properties: {},
        children: [
          {type: 'text', value: '\n'},
          {
            type: 'element',
            tagName: 'li',
            properties: {},
            children: [
              {
                type: 'text',
                value: 'This',
                position: {
                  start: {line: 5, column: 5, offset: 41},
                  end: {line: 5, column: 9, offset: 45}
                }
              }
            ],
            position: {
              start: {line: 5, column: 1, offset: 37},
              end: {line: 5, column: 9, offset: 45}
            }
          },
          {type: 'text', value: '\n'},
          {
            type: 'element',
            tagName: 'li',
            properties: {},
            children: [
              {
                type: 'text',
                value: 'Is',
                position: {
                  start: {line: 6, column: 5, offset: 50},
                  end: {line: 6, column: 7, offset: 52}
                }
              }
            ],
            position: {
              start: {line: 6, column: 1, offset: 46},
              end: {line: 6, column: 7, offset: 52}
            }
          },
          {type: 'text', value: '\n'},
          {
            type: 'element',
            tagName: 'li',
            properties: {},
            children: [
              {
                type: 'text',
                value: 'A',
                position: {
                  start: {line: 7, column: 5, offset: 57},
                  end: {line: 7, column: 6, offset: 58}
                }
              }
            ],
            position: {
              start: {line: 7, column: 1, offset: 53},
              end: {line: 7, column: 6, offset: 58}
            }
          },
          {type: 'text', value: '\n'},
          {
            type: 'element',
            tagName: 'li',
            properties: {},
            children: [
              {
                type: 'text',
                value: 'List',
                position: {
                  start: {line: 8, column: 5, offset: 63},
                  end: {line: 8, column: 9, offset: 67}
                }
              }
            ],
            position: {
              start: {line: 8, column: 1, offset: 59},
              end: {line: 8, column: 9, offset: 67}
            }
          },
          {type: 'text', value: '\n'}
        ],
        position: {
          start: {line: 5, column: 1, offset: 37},
          end: {line: 8, column: 9, offset: 67}
        }
      },
      {type: 'text', value: '\n'},
      {
        type: 'element',
        tagName: 'p',
        properties: {},
        children: [
          {
            type: 'text',
            value: 'A mix of ',
            position: {
              start: {line: 10, column: 1, offset: 69},
              end: {line: 10, column: 10, offset: 78}
            }
          },
          {
            type: 'element',
            tagName: 'em',
            properties: {},
            children: [
              {
                type: 'text',
                value: 'markdown',
                position: {
                  start: {line: 10, column: 11, offset: 79},
                  end: {line: 10, column: 19, offset: 87}
                }
              }
            ],
            position: {
              start: {line: 10, column: 10, offset: 78},
              end: {line: 10, column: 20, offset: 88}
            }
          },
          {
            type: 'text',
            value: ' and ',
            position: {
              start: {line: 10, column: 20, offset: 88},
              end: {line: 10, column: 25, offset: 93}
            }
          },
          {
            type: 'element',
            tagName: 'em',
            properties: {},
            children: [
              {
                type: 'text',
                value: 'HTML',
                position: {
                  start: {line: 10, column: 29, offset: 97},
                  end: {line: 10, column: 33, offset: 101}
                }
              }
            ],
            position: {
              start: {line: 10, column: 25, offset: 93},
              end: {line: 10, column: 38, offset: 106}
            }
          },
          {
            type: 'text',
            value: '.',
            position: {
              start: {line: 10, column: 38, offset: 106},
              end: {line: 10, column: 39, offset: 107}
            }
          }
        ],
        position: {
          start: {line: 10, column: 1, offset: 69},
          end: {line: 10, column: 39, offset: 107}
        }
      },
      {type: 'text', value: '\n'},
      {
        type: 'element',
        tagName: 'hr',
        properties: {},
        children: [],
        position: {
          start: {line: 12, column: 1, offset: 109},
          end: {line: 12, column: 4, offset: 112}
        }
      },
      {type: 'text', value: '\n'},
      {
        type: 'element',
        tagName: 'p',
        properties: {},
        children: [
          {
            type: 'element',
            tagName: 'img',
            properties: {
              src: 'https://example.com/favicon.ico',
              alt: 'an image',
              title: 'title'
            },
            children: [],
            position: {
              start: {line: 14, column: 1, offset: 114},
              end: {line: 14, column: 53, offset: 166}
            }
          }
        ],
        position: {
          start: {line: 14, column: 1, offset: 114},
          end: {line: 14, column: 53, offset: 166}
        }
      },
      {type: 'text', value: '\n'},
      {
        type: 'element',
        tagName: 'p',
        properties: {},
        children: [
          {
            type: 'element',
            tagName: 'svg',
            properties: {},
            children: [
              {
                type: 'element',
                tagName: 'rect',
                properties: {},
                children: [],
                position: {
                  start: {line: 16, column: 6, offset: 173},
                  end: {line: 16, column: 13, offset: 180}
                }
              }
            ],
            position: {
              start: {line: 16, column: 1, offset: 168},
              end: {line: 16, column: 19, offset: 186}
            }
          }
        ],
        position: {
          start: {line: 16, column: 1, offset: 168},
          end: {line: 16, column: 19, offset: 186}
        }
      },
      {type: 'text', value: '\n'},
      {
        type: 'element',
        tagName: 'div',
        properties: {id: 'foo', className: ['bar', 'baz']},
        children: [
          {
            type: 'element',
            tagName: 'img',
            properties: {src: 'a', alt: 'bar'},
            children: [],
            position: {
              start: {line: 18, column: 31, offset: 218},
              end: {line: 18, column: 52, offset: 239}
            }
          },
          {
            type: 'text',
            value: 'alfred',
            position: {
              start: {line: 18, column: 52, offset: 239},
              end: {line: 18, column: 58, offset: 245}
            }
          }
        ],
        position: {
          start: {line: 18, column: 1, offset: 188},
          end: {line: 18, column: 64, offset: 251}
        }
      },
      {type: 'text', value: '\n'},
      {
        type: 'element',
        tagName: 'p',
        properties: {},
        children: [
          {
            type: 'text',
            value: 'Hello, world!',
            position: {
              start: {line: 20, column: 4, offset: 256},
              end: {line: 20, column: 17, offset: 270}
            }
          }
        ],
        position: {
          start: {line: 20, column: 1, offset: 253},
          end: {line: 20, column: 17, offset: 270}
        }
      }
    ],
    data: {quirksMode: false},
    position: {
      start: {line: 1, column: 1, offset: 0},
      end: {line: 21, column: 1, offset: 270}
    }
  }

  const proc = unified()
    .use(remarkParse)
    .use(remarkRehype, {allowDangerousHtml: true})
    .use(function () {
      return transformer
      function transformer(/** @type {import('unist').Node} */ tree) {
        t.ok(tree.position, 'should have position')
        return tree
      }
    })
    .use(function () {
      // @ts-ignore hast is more specific than unist.
      return (tree, file) => raw(tree, file)
    })
    .use(function () {
      return transformer
      function transformer(/** @type {import('unist').Node} */ tree) {
        t.deepEqual(tree, expected, 'should equal the fixture tree')
      }
    })

  proc.runSync(proc.parse(text))

  t.end()
})

Expected behavior

The position fields should not be removed.

Actual behavior

The position fields are being removed

github-actions[bot] commented 3 years ago

Hi! It seems you removed the template which we require. Here are our templates (pick the one you want to use and click *raw* to see its source):

I won’t send you any further notifications about this, but I’ll keep on updating this comment, and hide it when done!

Thanks, — bb

wooorm commented 3 years ago

I am guessing this is because this utility expects a modern VFile (which uses .value), whereas remark/rehype/unified are still using the previous vfile (.contents). All of vfile/syntax-tree/micromark/wooorm has been updated, but unified/remark/rehype have not.

Here is a minimal example that shows this utility working:

import parse5 from 'parse5'
import {VFile} from 'vfile'
import {fromParse5} from 'hast-util-from-parse5'
import {inspect} from 'unist-util-inspect'
import {raw} from './index.js'

const file = new VFile('<h1>whatever</h1>')
const p5ast = parse5.parse(String(file), {sourceCodeLocationInfo: true})
const tree = fromParse5(p5ast, file)

console.log('1', inspect(tree))

raw(tree, file)

console.log('2', inspect(tree))

Yields:

1 root[1] (1:1-1:18, 0-17)
│ data: {"quirksMode":true}
└─0 element<html>[2]
    │ properties: {}
    ├─0 element<head>[0]
    │     properties: {}
    └─1 element<body>[1]
        │ properties: {}
        └─0 element<h1>[1] (1:1-1:18, 0-17)
            │ properties: {}
            └─0 text "whatever" (1:5-1:13, 4-12)
root[1] (1:1-1:18, 0-17)
│ data: {"quirksMode":true}
└─0 element<html>[2]
    │ properties: {}
    ├─0 element<head>[0]
    │     properties: {}
    └─1 element<body>[1]
        │ properties: {}
        └─0 element<h1>[1] (1:1-1:18, 0-17)
            │ properties: {}
            └─0 text "whatever" (1:5-1:13, 4-12)
github-actions[bot] commented 3 years ago

Hi! Thanks for taking the time to contribute! This has been marked by a maintainer as needing a reproduction: It’s not yet clear whether this is a problem. Here are a couple tips:

Thanks, — bb

elmasse commented 3 years ago

@wooorm thanks for your promptly reply.

However, my intention with this issue might be unrelated to having a vFile. In the integration test case there is no vFile involved, that's why I reproduced the exact same scenario failing.

In my case, I have tested this in https://github.com/elmasse/nextein project using just remark/rehype which uses not the v7 but a previous v6 of hast-util-raw causing the same issue.

On the other hand, I have inspected the vFile passed by my unified processor and makes no difference since is just an empty holder.

My problem arises when I tried to keep up to date with my dependencies and updating to rehype-raw@5.1.0 breaks and, as far as I can tell, it seems to be related to this module.

Let me know if I can provide more information. I could fork and create a test case in any of the repos.

Thanks

Max

wooorm commented 3 years ago

So in rehype-raw@5 it worked, and in v5.1 it stopped?

On the other hand, I have inspected the vFile passed by my unified processor and makes no difference since is just an empty holder.

Do you mean the vfile you pass is empty? As in, you don’t pass the source contents/value of the file in at all?

elmasse commented 3 years ago

So in rehype-raw@5 it worked, and in v5.1 it stopped?

It stopped in v5, I had to downgrade to v4 (which is what I was using before). Honestly I haven't tested v5 vs v5.1 since I did an update by npm ranges.

Do you mean the vfile you pass is empty? As in, you don’t pass the source contents/value of the file in at all?

As I stated in the test case, I use a plain string that I pass to processor.parse. I guess I could try to use to-vfile but in any case, since the parser generates the tree with position information, I don't quite understand why the raw will remove them.

wooorm commented 3 years ago

since the parser generates the tree with position information, I don't quite understand why the raw will remove them.

The problem is that, yes, the parser has the source file so it can add positional info, but this plugin also needs the source file. The solution thus is to also pass the source to run: https://github.com/unifiedjs/unified#processorrunsyncnode-file.

I don’t see how this ever worked on v4 though? The difference between 4 and 5 was added types 🤔

wooorm commented 3 years ago

could you confirm whether:

const doc = 'whatever'
process.runSync(process.parse(doc), doc)

Does the trick?

elmasse commented 3 years ago

Yes, that works!

Awesome, thanks!

Closing this issue.

github-actions[bot] commented 3 years ago

Hi! This was closed. Team: If this was fixed, please add phase/solved. Otherwise, please add one of the no/* labels.

github-actions[bot] commented 3 years ago

Hi team! I don’t know what’s up as there’s no phase label. Please add one so I know where it’s at.

Thanks, — bb

wooorm commented 3 years ago

Great to hear! ✨