Closed: fabiospampinato closed this issue 3 years ago.
You can use `dangerouslySetInnerHTML` for this case, as that won't be 'wasteful' and will just insert the HTML.
I can't quite do that without paying for parsing twice, as I need to parse the HTML on my own in any case in order to sanitize it first.
Personally I ended up inserting and removing those DOM nodes manually; it's fairly easy to do efficiently, and it essentially bypasses Preact entirely.
I think having native support for returning DOM nodes might be an interesting capability though, and it would make this easier/quicker to do for other people.
I'm keeping the issue open to keep the discussion going, but feel free to just close it if it seems unlikely that this is going to happen 👍
Well, managing DOM nodes somewhere outside (P)react breaks all sorts of features, like server-side rendering or hydration. Also, there are plenty of libraries that support rendering Markdown into components at various levels, e.g. mdxjs at build time or react-markdown at runtime. I've used both of them successfully with Preact in the past.
But if you need to manage nodes yourself, preact gives you all the tools to do so using refs and disabling component updates.
A short example would be: https://codesandbox.io/s/friendly-dream-38jhd?file=/src/index.js
> Well, managing DOM nodes somewhere outside (P)react breaks all sorts of features, like server-side rendering or hydration.
I don't need those; if the internet imploded, my app would work just the same, basically. I suppose nodes could be serialized somewhat, though, and hydrated back on the client.
> Also, there are plenty of libraries that support rendering Markdown into components at various levels, e.g. mdxjs at build time or react-markdown at runtime. I've used both of them successfully with Preact in the past.
The problem isn't doing it, it's doing it with optimal performance. Performance-wise, bypassing Preact/React is the least of the problems those ready-made runtime-rendering solutions have.
> But if you need to manage nodes yourself, preact gives you all the tools to do so using refs and disabling component updates.
Yes, it's quite simple in the end. Perhaps it should be made simpler though, I'm not sure.
> I don't need those, if the internet imploded my app would work just the same basically.
Good for you! The problem I see is supporting it on a library level, while breaking features other library users depend on. But I'm not a maintainer of preact, just a user. So the decision is up to them ;)
> Good for you! The problem I see is supporting it on a library level, while breaking features other library users depend on. But I'm not a maintainer of preact, just a user. So the decision is up to them ;)
Sure, sure. If I had to guess I'd say this is largely backwards compatible, as no components that currently work return a DOM node. Plus, one would really have to put in some effort to break their app with this feature where servers are concerned: there's no DOM available server-side already, so it would just throw immediately.
@fabiospampinato you can use this component to render raw DOM nodes:
```jsx
function DomNode({ children }) {
  this.shouldComponentUpdate = () => false;
  return Object.defineProperty(h(children.localName), '__e', { get: () => children, set: Object });
}

// usage:
function Markdown({ md }) {
  const dom = markdown_to_dom(md);
  return <DomNode>{dom}</DomNode>;
}
```
That being said, you might be able to construct a pipeline for which Preact is actually beneficial as the renderer, rather than being overhead. If you parse the markdown-generated HTML using DOMParser rather than innerHTML it could be sanitized separately from rendering, which should be much faster. Then converting that to VDOM means Preact can diff when updating instead of destroying and recreating the whole DOM tree on each keystroke. To take things a step further, if you're able to assume the output of markdown-to-html conversion is well-formed, you could use a simple JavaScript HTML parser to construct VNodes directly:
```js
// Parse a well-formed subset of HTML directly to VDOM
import { h } from 'preact';

const voidElements = { area: true, base: true, br: true, col: true, embed: true, hr: true, img: true, input: true, link: true, meta: true, param: true, source: true, track: true, wbr: true };
const tokenizer = /(?:<([a-z0-9]+)( \w+=(['"])[^>'"\n]*\3)*\s*(\/?)\s*>|<\/([a-z0-9]+)>|([^&<>]+))/gi;
const attrTokenizer = / (\w+)=(['"])([^>'"\n]*)\2/g;

function parseHtml(html) {
  let root = h('div', { children: [] });
  let stack = [];
  let parent = root;
  let token, t, node;
  tokenizer.lastIndex = 0;
  while ((token = tokenizer.exec(html))) {
    if (token[1]) {
      let props = { children: [] };
      attrTokenizer.lastIndex = 0;
      // scan the whole tag match (token[0]) for attributes: the repeated
      // capture group in `tokenizer` only retains the last attribute.
      while ((t = attrTokenizer.exec(token[0]))) props[t[1]] = t[3];
      node = h(token[1], props);
      parent.props.children.push(node);
      if (!voidElements[token[1]]) {
        stack.push(parent);
        parent = node;
      }
    } else if (token[5]) parent = stack.pop() || root;
    else parent.props.children.push(token[6]);
  }
  return root;
}
```
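As a rough illustration of what the tokenizer regex above actually produces, here is a small helper (runnable outside the browser) that labels each match the same way the branches in `parseHtml` do; the `{ kind, tag, text }` token shape is made up purely for inspection:

```javascript
// Same tokenizer regex as in the parser above; this helper just classifies
// each match as an open tag, a close tag, or text, so the token stream can
// be inspected without Preact.
const tokenizer = /(?:<([a-z0-9]+)( \w+=(['"])[^>'"\n]*\3)*\s*(\/?)\s*>|<\/([a-z0-9]+)>|([^&<>]+))/gi;

function tokenize(html) {
  tokenizer.lastIndex = 0;
  const tokens = [];
  let token;
  while ((token = tokenizer.exec(html))) {
    if (token[1]) tokens.push({ kind: 'open', tag: token[1], selfClosing: token[4] === '/' });
    else if (token[5]) tokens.push({ kind: 'close', tag: token[5] });
    else tokens.push({ kind: 'text', text: token[6] });
  }
  return tokens;
}
```

For `<p class="x">hi <br/></p>` this yields an open `p`, the text `hi `, a self-closing `br`, and a closing `p`, which is exactly the stream the stack in `parseHtml` consumes.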
Here's a full demo of the above: https://jsfiddle.net/developit/1djq4z7r/
@developit Lots of interesting things to comment on!
First of all for some added context I ended up doing the following:
> @fabiospampinato you can use this component to render raw DOM nodes:
That component looks pretty interesting to me, because compared to what I'm currently using it doesn't require having a ref to the parent node, which is nice.
How is it diffed against the previous render though, are the previous nodes mutated at all? Do all the old nodes get detached and all the new ones attached? Are only the old/new nodes that changed detached/attached?
> If you parse the markdown-generated HTML using DOMParser rather than innerHTML it could be sanitized separately from rendering, which should be much faster.
I'm not sure what you mean by this, I'm using DOMPurify for sanitization, which is the only tool that I mostly trust to do the job, and parsing+sanitizing before rendering is the only thing you can do safely, right? I'd really like to move sanitization to a worker, but DOMPurify doesn't support that.
> Then converting that to VDOM means Preact can diff when updating instead of destroying and recreating the whole DOM tree on each keystroke.
True, but I'm already using DOMParser only on the portion of the HTML that changed and caching all the other nodes, and I'm already doing some rough diffing with the function I posted above. The diffing is a bit crude, but: 1) the DOM tree corresponding to each Markdown string is almost always fairly shallow; most of the time I only need to detach something like a paragraph and construct and attach a new one, which should be very fast already; 2) I guess Preact would do a recursive diff, but that way the old nodes would get mutated, so I could no longer keep around a cache mapping HTML strings to DOM nodes, as those may get edited by Preact.
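The per-block caching described here can be sketched roughly as follows (a minimal sketch, assuming blocks are keyed by their HTML string; `renderBlock` is a hypothetical placeholder for whatever turns one block's HTML into a node):

```javascript
// Hypothetical sketch of caching one node per HTML block string: unchanged
// blocks are reused as-is across renders, and only new blocks get built.
// A real DOM version would also need to handle duplicate blocks, since a
// DOM node can only be attached in one place at a time.
function createBlockCache(renderBlock) {
  let cache = new Map();
  return function renderAll(htmlBlocks) {
    const next = new Map();
    const nodes = htmlBlocks.map((html) => {
      const node = cache.get(html) ?? renderBlock(html);
      next.set(html, node);
      return node;
    });
    cache = next; // blocks that disappeared are dropped from the cache
    return nodes;
  };
}
```

Editing one paragraph then only rebuilds that paragraph's node; every other block is handed back from the cache untouched.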
> To take things a step further, if you're able to assume the output of markdown-to-html conversion is well-formed, you could use a simple JavaScript HTML parser to construct VNodes directly:
Unfortunately I very much can't assume that the produced HTML is well-formed. I think for a Markdown compiler to be CommonMark-compliant it can't produce well-formed HTML in every scenario; some malformed HTML should just pass through unchanged, for example. Plus I have plugins that can edit the HTML the compiler outputs however they like. I'm also not sure how resilient that parser would be against XSS: I think it would just pass through stuff like the `onerror` attribute in `<img onerror="alert(123)">`. Or maybe not; I don't know, but there are so many XSS shenanigans that I wouldn't feel confident doing that with arbitrary user input. In any case, importantly, there's a browser-provided sanitization API coming soon (https://wicg.github.io/sanitizer-api/) that I'd want to make sure to use, as it should be more reliable and hopefully faster.
More generally performance-wise, in case you're interested in reading this, for my use case I'm more or less doing the following, considering both the first render and subsequent renders when the Markdown gets edited:
Then if the output is relatively small:
If instead the output is pretty huge:
To improve this further I think I could potentially explore the following options:
Anyway I'm always open to ideas on how to speed things up 😁
@fabiospampinato Here's an implementation of Markdown processing and HTML sanitization in a Worker. It uses DOMPurify in the Worker by instantiating it against Linkedom: https://jsfiddle.net/developit/fjdh23pc/
As for the windowing/diffing bit, this could probably be done more simply in Preact because it could use referential equality rather than a bidirectional search algorithm.
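That referential-equality matching could be sketched like this (illustrative only; it assumes unchanged blocks come out of a cache as the same object across renders, so identity stands in for any deeper comparison):

```javascript
// Sketch of matching old vs. new block lists by referential equality:
// since unchanged blocks are reused as the *same* object, a Set membership
// check replaces any bidirectional search algorithm.
function diffByIdentity(oldBlocks, newBlocks) {
  const unmatched = new Set(oldBlocks);
  const kept = [];
  const added = [];
  for (const block of newBlocks) {
    if (unmatched.delete(block)) kept.push(block); // seen before: keep as-is
    else added.push(block);                        // brand new: must be attached
  }
  return { kept, added, removed: [...unmatched] }; // leftovers get detached
}
```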
Iframes won't help performance sadly, since they're only out-of-process when the iframe is loading content from another origin.
> @fabiospampinato Here's an implementation of Markdown processing and HTML sanitization in a Worker. It uses DOMPurify in the Worker by instantiating it against Linkedom:
Oh wow, that's amazing!
On one hand I'm allergic to adding ~200kb+ of dependencies, on the other hand this is awesome, with something like that I could potentially sanitize stuff in parallel too. Other drawbacks are that I most probably would still want to use the new Sanitizer API, which just isn't available in a worker, and while parsing and sanitization is offloaded to the worker some nodes would still need to be created on the main thread, but I guess in this scenario it'd be better to just output VDOM and let Preact do its magic. I wonder if LinkeDOM not being 100% spec compliant could cause issues in some edge cases, maybe not, I'm not sure which parts of the DOM DOMPurify actually relies on.
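Sanitizing in parallel could be sketched as round-robin dispatch over a small pool (purely illustrative; the `workers` here are stand-ins for Worker-backed sanitize functions, not a real DOMPurify/Worker setup):

```javascript
// Hypothetical round-robin pool: spreads sanitize jobs across N
// worker-backed functions so independent chunks can run in parallel.
// Each worker is assumed to be a function returning a Promise of
// sanitized HTML (e.g. wrapping postMessage to a real Worker).
function createPool(workers) {
  let next = 0;
  return function dispatch(html) {
    const worker = workers[next];
    next = (next + 1) % workers.length;
    return worker(html);
  };
}
```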
> Iframes won't help performance sadly, since they're only out-of-process when the iframe is loading content from another origin.
Oh really? 😢 Maybe that'll change in the future, maybe it's just a limitation of the browser that will eventually be lifted.
I don't think DOMPurify relies on anything linkedom lacks, but I haven't read the source in its entirety.
Here's a version of the original "using Preact to apply markdown updates via diff" demo that adds subtree caching. In the demo, subtrees Preact touches (even if unchanged) are indicated by a purple flash. When typing, you can see how little of the DOM is actually being traversed: https://jsfiddle.net/developit/8of0p9cn/
https://user-images.githubusercontent.com/105127/137214681-d10e7a48-32f2-4d79-a52a-e6133a76440d.mov
One advantage of this approach is that the "skipping over unchanged blocks" technique is applied to the entire document, rather than only its root elements. When editing a long paragraph or the contents of a table, this means only the modified text / table cell is traversed and diffed rather than the entire containing paragraph/table.
Pretty amazing how far very little code can get you.
To summarize:
`<script>document.body.innerHTML = 'pwned!';</script>`. It might still be worth exploring the iframe idea, even if just as an added layer of isolation.

I'm closing this as the component mentioned here seems to basically address the issue already, and mutating the DOM manually with a ref to the parent is quite doable too.
Little performance update.
I'm playing with editing the following document, by appending a character at line 3:
I'm getting the following with the latest Preact-based fiddle provided (with the flash plugin disabled, though):
Basically Preact takes ~60ms to work its magic, which seems fairly high to me if subtrees are cached properly, maybe there could be a bug in the caching logic. But also there's a huge chunk of style recalculation and layout happening, I'm not sure if that's caused by Preact itself or something else though.
This is what I'm getting with the rough diff function I was using before:
Basically the diffing itself in this specific use case doesn't take much time at all (most of those 2ms are actually spent doing other things), but there's still a huge style-recalculation chunk. I'm seeing the same result independently of using `removeChild`+`insertBefore` or `replaceChild` (which I didn't know existed and was hoping would be more optimized).
This is what I'm getting instead if top-level nodes aren't entirely swapped in and out but they are patched:
There's basically nothing happening really, the chunk on the left is about compiling the markdown, the diffing itself takes almost no time, and almost everything else is due to unrelated app-level things.
This is the updated diff function I'm using:
To summarize, DOM-based recursive diffing seems much, much faster, both considering the diffing itself and everything that happens later in the browser's rendering pipeline. I'm not sure if there are bugs in the fiddle that, once fixed, could make it just as fast, but it seems unlikely to me that it could be made faster, since almost nothing is happening with DOM-based recursive diffing already.
It might be worth noting that nanomorph in particular seems something between broken and super slow in the general case though. I should find something better.
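For reference, the kind of mutate-in-place recursive diff being discussed can be sketched on plain objects (illustrative only, not the actual function used above; a real implementation would patch DOM attributes and child nodes instead of object properties):

```javascript
// Illustrative recursive diff that patches an old tree in place to match a
// new one, reusing subtrees whenever element tags line up. Elements are
// plain objects { tag, attrs, children }; text nodes are plain strings.
function patchTree(oldNode, newNode) {
  // Sync attributes: drop stale keys, then copy the new values over.
  for (const key of Object.keys(oldNode.attrs)) {
    if (!(key in newNode.attrs)) delete oldNode.attrs[key];
  }
  Object.assign(oldNode.attrs, newNode.attrs);

  // Sync children positionally.
  const oldKids = oldNode.children;
  const newKids = newNode.children;
  for (let i = 0; i < newKids.length; i++) {
    const o = oldKids[i];
    const n = newKids[i];
    if (typeof o !== 'string' && typeof n !== 'string' && o !== undefined && o.tag === n.tag) {
      patchTree(o, n); // same tag: recurse, keeping the existing subtree
    } else if (o !== n) {
      oldKids[i] = n; // different kind or tag: swap the child wholesale
    }
  }
  oldKids.length = newKids.length; // drop trailing removed children
}
```

The point the profiles above make is that this kind of in-place patch leaves untouched subtrees literally untouched, which is also what keeps the browser's style/layout work small.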
@developit I think I've spotted some issues with the component for returning raw DOM nodes:
```js
function DomNode({ children }) {
  this.shouldComponentUpdate = () => false;
  return Object.defineProperty(h(children.localName), '__e', { get: () => children, set: Object });
}
```
I can work around it by using `null` or `span` as the node type, but then I get some quirks during development: multiple nodes get injected in the page with each HMR event, basically. Maybe I should detach the child node manually when the component is unmounted?

https://user-images.githubusercontent.com/1812093/139543595-e57532fc-7657-4b35-b6bb-5e5961900e35.mov
Update: using `null` + detaching the node manually when the component unmounts seems to work; I can't spot any issues with it.
@fabiospampinato that makes sense: Preact uses `null` internally as the vnode type for text nodes.
```js
function DomNode({ children }) {
  this.shouldComponentUpdate = () => false;
  return Object.defineProperty(h(children.localName), '__e', { get: () => children, set: Object });
}
```
Awesome!! 🎉 Just one question: is it safe to use this approach long-term? Can I trust this implementation, or will it change in minor releases? I know it's working so far, but it doesn't hurt to ask 😬 (I don't know what that `__e` means... it seems to me it's something internal.)
> or will this change in minor releases?
It's stable.
> I don't know what that `__e` means... It seems to me it's something internal
It's the `__dom` property, which is set to be consistently mangled to `__e`. This is done specifically to facilitate this sort of thing.
Thanks a lot!
Describe the feature you'd love to see
Some context: I work on a Markdown-based notes app, I'm trying to make it as fast as possible but I've found the inability to return raw DOM nodes from Preact components to be a major performance issue for me.
Ideally I'd like the app to take the following steps:
Essentially every step in the list is necessary, at least for parts of the input string.
The problem is that with Preact I can't quite do those steps easily, because I can't just return DOM nodes from components at step 4 and let Preact manage the swapping in and out for me.
So instead of a simple step 4, I need to either:
Maybe there are some fundamental issues with allowing this, but it seems like the cleanest solution to me. Or maybe managing this manually is actually quite easy and quick, and it's better to ask users to do that to squeeze out some extra performance when their use case allows for it. I'm not sure what the best path forward is on this, both for myself and for Preact.
Additional context (optional)
Kind of off-topic, but Solid supports returning raw DOM nodes from components.