justinwilaby / sax-wasm

The first streamable, fixed memory XML, HTML, and JSX parser for WebAssembly.
MIT License

First character of comments is removed #68

Closed samuelcolvin closed 1 year ago

samuelcolvin commented 1 year ago

Describe the bug

When parsing XML/HTML comments, the first character of the comment is dropped.

To Reproduce

Run the following in the browser, with the lib directory of master copied or symlinked into the same directory.

<h1>sax-wasm demo</h1>
<pre id="output"></pre>

<script type="module">
  import { SaxEventType, SAXParser } from './lib/module/index.js';
  window.SaxEventType = SaxEventType;

  async function loadAndPrepareWasm() {
    const saxWasmResponse = await fetch('./lib/sax-wasm.wasm');
    const saxWasmbuffer = await saxWasmResponse.arrayBuffer();
    const parser = new SAXParser(SaxEventType.Attribute | SaxEventType.OpenTag | SaxEventType.CloseTag | SaxEventType.Comment);

    // Instantiate and prepare the wasm for parsing
    const ready = await parser.prepareWasm(new Uint8Array(saxWasmbuffer));
    if (ready) {
      return parser;
    }
  }

  loadAndPrepareWasm().then(main);
  const el = document.getElementById('output');

  function main(parser) {
    console.log('Wasm is ready to parse', parser);
    parser.eventHandler = (event, data) => {
      console.log('event data JSON:', data.toJSON());
      el.innerHTML += `${data.constructor.name}: ${JSON.stringify(data.toJSON())}\n`;
    }

    // const xml = '<div class="foobar" z={1}>hello<!--comment--></div><br/>'
    const xml = '<!--comment-->'
    const enc = new TextEncoder();

    parser.write(enc.encode(xml));
    parser.end();
  }
</script>

The output is "omment" instead of "comment".

Full output:

Text: {"start":{"line":0,"character":1},"end":{"line":0,"character":12},"value":"omment"}

Expected behavior

AFAIK comments should start from the first character after <!--, just as comments currently include the last character before -->.
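
To make the expectation concrete, here is a minimal sketch of what I'd expect the comment value to be. commentBody is a hypothetical standalone helper written only for illustration; it is not part of sax-wasm and says nothing about how the parser is implemented.

```javascript
// Hypothetical helper, NOT part of sax-wasm: returns the text between the
// first "<!--" and the next "-->", or null if there is no complete comment.
function commentBody(source) {
  const open = '<!--';
  const close = '-->';
  const start = source.indexOf(open);
  if (start === -1) return null;
  // The body begins immediately after the opening delimiter.
  const bodyStart = start + open.length;
  const end = source.indexOf(close, bodyStart);
  if (end === -1) return null;
  return source.slice(bodyStart, end);
}

console.log(commentBody('<!--comment-->')); // prints: comment
```

So for the input above I'd expect the value to be "comment", with neither the first nor the last character removed.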

Desktop

The repro above runs in the browser, but I also got the same result running the Rust code directly with rustc 1.65.0 (897e37553 2022-11-02).

justinwilaby commented 1 year ago

Confirmed.

Please see #69

Thank you for helping make this library better!

samuelcolvin commented 1 year ago

Awesome, thanks so much.

justinwilaby commented 1 year ago

Published v2.2.2 with this fix just now.