justinwilaby / sax-wasm

The first streamable, fixed memory XML, HTML, and JSX parser for WebAssembly.
MIT License

html containing `<?>` stops further parsing #52

Closed daKmoR closed 2 years ago

daKmoR commented 2 years ago

Describe the bug

Any HTML that contains <?> stops the parsing. <?> is a "trick" to shorten <!--?--> and save bytes; the behavior is specified here: https://html.spec.whatwg.org/#parse-error-unexpected-question-mark-instead-of-tag-name

<?> is in the output of lit ssr... here is the original issue
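
For context, the linked spec section says that a `?` directly after `<` is a parse error and that the tokenizer should treat what follows as a bogus comment ending at the next `>`. The sketch below illustrates that rule only; `closeTagNames` is a hypothetical helper written for this example, it is not part of sax-wasm, and it is not how the parser is actually implemented.

```javascript
// Hypothetical helper (not part of sax-wasm): collects the names of close tags
// while skipping comments, including "<?...>" bogus comments.
function closeTagNames(html) {
  const names = [];
  let i = 0;
  while (i < html.length) {
    const lt = html.indexOf('<', i);
    if (lt === -1) break;
    if (html.startsWith('<!--', lt)) {
      // Regular comment: skip to the closing "-->".
      const end = html.indexOf('-->', lt + 4);
      if (end === -1) break;
      i = end + 3;
    } else if (html[lt + 1] === '?') {
      // Bogus comment: everything up to the next ">" is comment data,
      // so "<?>" is consumed as a comment and scanning continues after it.
      const end = html.indexOf('>', lt + 2);
      if (end === -1) break;
      i = end + 1;
    } else if (html[lt + 1] === '/') {
      // Close tag: record its name.
      const end = html.indexOf('>', lt + 2);
      if (end === -1) break;
      names.push(html.slice(lt + 2, end).trim());
      i = end + 1;
    } else {
      // Open tag or stray "<": not relevant for this sketch.
      i = lt + 1;
    }
  }
  return names;
}

const sample = '<p>hello</p><!--lit-part BRUAAAUVAAA=--><?><!--/lit-part--><p>more</p>';
console.log(closeTagNames(sample)); // [ 'p', 'p' ]
```

With this rule, `<?>` is consumed like `<!--?-->` and the second `p` is still reported.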

To Reproduce

Parse the following HTML:

<!--lit-part cI7PGs8mxHY=-->
  <p><!--lit-part-->hello<!--/lit-part--></p>
  <!--lit-part BRUAAAUVAAA=--><?><!--/lit-part-->
  <!--lit-part--><!--/lit-part-->
  <p>more</p>
<!--/lit-part-->

Expected behavior

p
p

Actual behavior

p

Additional context

Full code to reproduce:

import saxWasm from 'sax-wasm';
import { createRequire } from 'module';
import { readFile } from 'fs/promises';

export const { SaxEventType, SAXParser } = saxWasm;

const require = createRequire(import.meta.url);

export const streamOptions = { highWaterMark: 128 * 1024 };
const saxPath = require.resolve('sax-wasm/lib/sax-wasm.wasm');
const saxWasmBuffer = await readFile(saxPath);
export const parser = new SAXParser(SaxEventType.CloseTag, streamOptions);

await parser.prepareWasm(saxWasmBuffer);

parser.eventHandler = (ev, data) => {
  if (ev === SaxEventType.CloseTag) {
    console.log(data.name);
  }
};
parser.write(Buffer.from(`
<!--lit-part cI7PGs8mxHY=-->
  <p><!--lit-part-->hello<!--/lit-part--></p>
  <!--lit-part BRUAAAUVAAA=--><?><!--/lit-part-->
  <!--lit-part--><!--/lit-part-->
  <p>more</p>
<!--/lit-part-->
`));
parser.end();

justinwilaby commented 2 years ago

Great catch! Luckily this was an easy fix. Please have a look at #53. If it looks good, I'll merge and publish.

Cheers!

daKmoR commented 2 years ago

sweet - parsing continues is imho all that is needed 🤗

so yes, that looks great 🤗