inikulin / parse5

HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.
MIT License
3.67k stars 236 forks source link

Rewriter with Promises #273

Open vptcnt opened 6 years ago

vptcnt commented 6 years ago

Hello

I can't use the HTML rewriting with a Promise I do have an error:

Error [ERR_STREAM_PUSH_AFTER_EOF]: stream.push() after EOF

This is my code :

const RewritingStream = require('parse5-html-rewriting-stream');
const rewriter = new RewritingStream();
const parse5 = require('parse5');

// Parses HTML text 
const document = parse5.parseFragment(textHtml);

// Rewrites all text nodes
rewriter.on('text', (_, raw) => {
  myfunctionPromise(raw).then(res => {
    rewriter.emitRaw(res);
  }).catch(err => {
  console.error(err);
 });
});

with the following code, it's working :

rewriter.on('text', (_, raw) => {
  rewriter.emitRaw(res);
});

Could you please help me

Vincent

RReverser commented 6 years ago

I actually had a very similar issue recently and worked around it in custom subclass of RewritingStream. If @inikulin doesn't mind, I can upstream my changes.

RReverser commented 6 years ago

For what it's worth, what I did to work around this is subscribed to all the events, but added rewriters to a Promise-based task queue so that ordering of the output would be still preserved. Pseudo-code (without error handling etc.):

let queue = Promise.resolve();

function defer(fn) {
  // adds function to the queue making sure that it runs only after all the previous have completed
  queue = queue.then(fn);
}

rewriter.on('text', (_, raw) => defer(async () => {
  let res = await myfunctionPromise(raw);
  rewriter.emitRaw(res);
}));

for (let eventName of ['startTag', 'endTag', 'doctype', 'comment']) {
  // each event has to be subscribed to and deferred individually or it
  // will fall back to emitting as-is synchronously
  rewriter.on(eventName, (_, raw) => defer(() => {
    rewriter.emitRaw(raw);
  }));
}

rewriter.write(html, async () => {
  // don't close the stream until all the tasks have finished
  await queue;

  rewriter.end();
});
inikulin commented 6 years ago

@RReverser I wonder how idiomatic it is to have async handlers for stream events? If that's a common a practice then I'm all for including support for such a functionality into package.

RReverser commented 6 years ago

It's not common at all (unfortunately), although there have been some discussions.

inikulin commented 6 years ago

@RReverser Maybe it's worth adding your code as receipt in the docs then?

RReverser commented 6 years ago

It's kinda ugly due to need to iterate all events not subscribed yet manually. I wonder if it would be better to add unhandledToken event as a fallback instead of assuming that user wants emitRaw for easier control.

inikulin commented 5 years ago

As it's not idiomatic I would rather avoid adding support for that to the API. However, I'll be happy to add @RReverser's recipe to the docs.