Closed remcohaszing closed 1 year ago
Definitely open to ideas on improving keying for faster re-renders. 👍 Question, wouldn't keying off n-th object be equivalent to index keying? Which is a known anti-pattern https://robinpokorny.com/blog/index-as-a-key-is-an-anti-pattern/ which itself can cause performance issues and missed content updates. Or is there something I missed that avoids those pitfalls?
semi-related to https://github.com/remarkjs/react-markdown/issues/466 and some of the discussion there may be relevant
Yes, using the index as a key is a well-known anti-pattern. The key should uniquely identify the data it is rendering inside the list. It’s about providing a unique stable identity in the array, not about making it as unique as possible. This allows React to prevent unmounting and mounting components causing unnecessary rerenders. Note that the current solution, and both alternative proposals all use the index. My proposal doesn’t literally use the array index, but it also relies on the array position.
The current situation is very close to this pitfall
Keys must not change or that defeats their purpose! Don’t generate them while rendering.
Using an index is an well known anti-pattern, because it’s simplist way to silence React warnings and most of the time there’s a a better solution, i.e. using an id. However, unist nodes don’t have ids. Unist nodes have two things that identify them: their type (or tag name for hast nodes) and their index. The positional information doesn’t make them more unique, because the index already always makes them different.
Let’s say we make the following modification to the example above:
# Title
This is a paragraph
+ which now spans multiple lines
![image](./image.png)
This is another paragraph
With the current algorithm, this means every React element generated from the markdown below the added line receives a new key. This means React will remove all that content from the DOM, and generate new HTML elements to add to the DOM.
With pure index-based keys or my proposal to use the nth occurrence of the tag name, React would only update the changed paragraph content in the DOM.
Where my proposal to use the nth occurrence differs from pure index-based keys, is when elements are swapped, deleted, or inserted.
# Title
+
+ ![image](./image.png)
This is a paragraph
-
- ![image](./image.png)
This is another paragraph
In this situation pure index-based keys would notice that the React element type of the 2nd and 3rd elements have changed, so it would delete those DOM nodes and create new ones. With my proposal, the keys for the image and paragraph would stay the same, meaning React will swap these existing nodes in the DOM.
# Title
This is a paragraph
+
+ This paragraph is inserted
![image](./image.png)
This is another paragraph
When inserting an element, only elements of the same type are affected. In this example, pure index-based keys would trigger rerenders of all elements below the inserted content, so image element would be replaced with a new paragraph. The old last paragraph would be replaced with a new image element, and a new paragraph is created.
<h1 key="0">Title</h1>
<p key="1">This is a paragraph</p>
- <img key="2" src="./image.png" /> {/* key 2 updated its tag name to `p` */}
- <p key="3">This is another paragraph</p> {/* key 2 updated its tag name to `img` */}
+ <p key="2">This paragraph is inserted<p>
+ <img key="3" src="./image.png" />
+ <p key="4">This is another paragraph<p> {/* key 4 is new */}
With my proposal, React would see this as a swap of what was the last paragraph
<h1 key="h1-0">Title</h1>
<p key="p-0">This is a paragraph</p>
- <img key="img-0" src="./image.png" /> {/* key img-0 was relocated, but not altered */}
- <p key="p-1">This is another paragraph</p> {/* key p-1 text contents changed */}
+ <p key="p-1">This paragraph is inserted<p>
+ <img key="img-0" src="./image.png" />
+ <p key="p-2">This is another paragraph<p> {/* key p-1 is new */}
But how far should this be taken? Such as, we could also hash content, e.g., you could generate ids based on the first 3 and last 3 words or so?
As far as I understand it, keys make sense for when you are generating elements with code, but not really as much when compiling markdown and similar like we’re doing?
I’m very open to improve the situation, but because what the proper solution is vague, I’d like to see some performance results?
@ChristianMurphy @wooorm
Hi. Thank you for your hard work and providing such a awesome library.
We have developed weseek/growi : an app that can edit markdown and show HTML preview side-by-side, and now working on to migrate to remark from markdown-it.
https://user-images.githubusercontent.com/1638767/202691059-b4ed67a8-5e15-4ad5-9ed3-4412d5467ad6.mp4
drawio
rendering feature is implemented by remark/regype plugin and DrawioViewer React component.
When the user increments lines, all keys in blocks below the cursor are changed and re-rendering (strictly speaking, the rendering of the new component) is processed force regardless the contents of the block is not changed.
The key is fixed by https://github.com/remarkjs/react-markdown/blob/a80dfdee2703d84ac2120d28b0e4998a5b417c85/lib/ast-to-react.js#L217-L222
and there is no way to overwrite that.
Fortunately, the position
variable used to determine the key is initialized by null if node.position
does not exist.
So you can work around this by deleting node.position
so that the start/end lines are not included in the key.
import { Node } from 'unist';
export const remarkPlugin: Plugin = function() {
return (tree) => {
visit(tree, (node, index) => {
if (node.type === 'code') {
if (isDrawioBlock(node.lang)) {
rewriteNode(node, index ?? 0); // method for drawio feature
// omit position to fix the key regardless of its position
delete node.position;
}
}
});
};
};
This will change key="drawio-8-1-6"
to key="drawio---6"
.
I'm not good at English, so I apologize if I'm rude.
But how far should this be taken? Such as, we could also hash content, e.g., you could generate ids based on the first 3 and last 3 words or so?
Exactly. But I think react-markdown need not to care that. As with comments like https://github.com/remarkjs/react-markdown/issues/466#issuecomment-812553724, most developers would like to manage "when the component should be re-rendered" in the component own.
Therefore, I think it makes sense that ast-to-react.js
waive the responsibility to fix the key or provide to the way to determine the key manually. (the default value is fine with the current implementation)
The first idea is implementing a new option like following that are suggested in #466:
const key = typeof options.keyGenerator === 'function ' ? options.keyGenerator(node, pos, index) : [node.type, pos.line, pos.column, index].join('-')
or
const key = node.key ?? [node.type, pos.line, pos.column, index].join('-')
node.properties
The second idea is just moving the code to call addProperty
in for-in iteration after the code to set properties.key
.
https://github.com/remarkjs/react-markdown/blob/a80dfdee2703d84ac2120d28b0e4998a5b417c85/lib/ast-to-react.js#L178-L184
Developers can overwrite properties.key
by node.properties
.
What do you think?
@yuki-takei Hey, thanks for your patience. Can you explain what your algorithm would look like? That’s the thing #466 stalled on: people have a problem X, they think this solution Y solves it, but after more thinking we find out that solution Y doesn’t solve problem X.
@remcohaszing What’s a bit more complex in your examples is that you use paragraphs vs images. However, those have different content types in markdown/mdast. Images are always in a paragraph (or heading). So these examples you show are all paragraphs. Which points to a following thing: most of the edits in a document will be to paragraphs. Most of the edits will be to text nodes.
It’s a bit of a hard problem to think about IMO. Perhaps some actual checking of how the editor responds to different keys might be a good idea.
@wooorm
IMO, there is no XY problem.
@dang1412 said:
This option will allow user to be able to choose when a React element should be re-rendered effectively.
I agree with him. The most important point in these issues is "Can the key be determined by the component implementer?", not about algorithmically based performance.
My opinion is:
Therefore, I think it makes sense that ast-to-react.js waive the responsibility to fix the key or provide to the way to determine the key manually. (the default value is fine with the current implementation)
on https://github.com/remarkjs/react-markdown/issues/703#issuecomment-1319912219
He said:
this sounds like an XY problem.
potentially related https://github.com/remarkjs/react-markdown/issues/289 and https://github.com/remarkjs/react-markdown/issues/459
I don't think so.
The default ${name}-${line}-${column}-${index}
style keys will cause proper re-rendering of whatever the content is. But what is needed here is simply a way (option) to suppress the re-rendering by programmers, and is not related to parsing performance.
So only the simple "Y problem" exists from the beginning and there is no complex "X problem".
I'm cautious about being overly optimistic that stable identifiers would perform better than letting react rerender. The first approach that comes to mind that would work generally would be generating a merkle tree with a fast hash function (murmur, blake2, etc), which would bubble changes up. But it's questionable if hashing would be faster than just re-rendering.
I also don't think the custom key generator always provides better performance than the default ${name}-${line}-${column}-${index}
style key. However, with the current implementation, re-rendering always happens when a row changes, even if the programmer wants to ignore the re-rendering itself. That's the point.
He said the same thing with @dang1412 and me.
He explained his proposal as follows:
This avoids the issue of hashing everything unnecessarily, while providing a solution for actually slow components.
By allowing the keys of a component to be customized, you can give programmers a way to reduce re-rendering.
to render a custom node type, you need a custom renderer, correct? The custom renderer can set a key attribute hashing any combination of attributes you want.
This would be wrong. I can't make sense of what "custom renderer" means strictly, but there is no way to overwrite that by my investigation. Please see the code: https://github.com/remarkjs/react-markdown/blob/a80dfdee2703d84ac2120d28b0e4998a5b417c85/lib/ast-to-react.js#L217-L222
And proposal 2 on https://github.com/remarkjs/react-markdown/issues/703#issuecomment-1319912219 make us to be able to overwrite it.
@remcohaszing reported:
As can be seen here, React keys are generated using the name, line, column, and node index. This means that:
- If the type is changed, it will be presented to React as a new node.
- If a user enters a newline above the node, it will be presented to React as a new node.
- If a user changes the content above the node to change the node count (i.e. splitting one paragraph into two), it will be presented to React as a new node.
The above is why I treat #703 and #466 as identical problems. He shows the benefits of at least line/column agnostic keys.
My workaround written in https://github.com/remarkjs/react-markdown/issues/703#issuecomment-1319912219 stems from the same idea as his comment.
But how far should this be taken? Such as, we could also hash content, e.g., you could generate ids based on the first 3 and last 3 words or so?
It's hard to find a solution that generates hashes from content and performs well in all cases. Rather, I think the approach needed now is to provide a way for programmers to be able to set the key that they already know or claim to be the best to reduce re-rendering.
Thanks for all your responses.
Can you please explain what your algorithm would look like?
I think the approach needed now is to provide a way for programmers to be able to set the key that they already know or claim to be the best to reduce re-rendering.
What will you provide?
Index as key is a well known anti-pattern in React. Often datasets contain something that uniquely identifies a key, such as an id or username. Using such an identifier helps React to know what to rerender and how to track state.
However, this anti-pattern is so well-known people try to avoid key={index}
so badly, they come up with way worse alternatives. This also makes it hard to convince people that index as key can be ok. In reality, if no key is provided, React will use the index, because it’s their best guess given no context.
The pattern used by react-markdown
is an example of that. It doesn’t literally use key={index}
. Instead, it uses a value that is compound of a couple of values, including the index. This means it already uses index as key. Worse, it includes fields that are likely to change a lot, so they cause a lot of rerendering and lost state (if custom components are used).
A significant improvement would be in ast-to-react.js
:
- properties.key = [
- name,
- position.start.line,
- position.start.column,
- index
- ].join('-')
+ properties.key = index
If someone adds a line to a paragraph, that means the position.start.line
changes for all siblings after it. This means the entire React tree after that paragraph will be removed, then the new tree will be added. By using index as key, this update would only happen when inserting a new node.
We could stop here. This would already be a significant improvement. But we can go further.
Above uses index as a way to uniquely identify a unist child. However, there is another way to uniquely identify them. Given the following hast tree, we can identify the last element not just as “the fourth node”, but also as “The third element”, or “The second element whose tagName is p
”.
{
"type": "root",
"children": [
{ "type": "text" },
{ "type": "element", "tagName": "p" },
{ "type": "element", "tagName": "div" },
{ "type": "element", "tagName": "p" }
]
}
My idea is to use this last way of identification: “The second element whose tagName is p
”. Now, lets make a change to the tree by swapping the last two elements:
{
"type": "root",
"children": [
{ "type": "text" },
{ "type": "element", "tagName": "p" },
{ "type": "element", "tagName": "p" },
{ "type": "element", "tagName": "div" }
]
}
Now this identification still works. This means React would be able to identify this swapping of elements, and avoid deleting and creating DOM nodes and state of custom components.
In ast-to-react.js
, that would look a bit like this:
export function childrenToReact(context, node) {
/** @type {Array<ReactNode>} */
const children = []
let childIndex = -1
let tagCounter = Object.create(null)
while (++childIndex < node.children.length) {
const child = node.children[childIndex]
if (child.type === 'element') {
const tagName = child.tagName
if (tagName in tagCounter) {
tagCounter[tagName]++
} else {
tagCounter[tagName] = 0
}
const key = tagName + '-' + tagCounter[tagName]
children.push(toReact(context, child, key, node))
} else {
// …
}
}
return children
}
Re: hashing
The goal of a key in React is not to prevent React from reusing DOM nodes. It’s to aid React determine what changed. Using a hash based on content would be deliberately preventing React to update the DOM efficiently if one character changes. In addition the hashing itself on every render is resource intensive.
My idea is to use this last way of identification: “The second element whose tagName is p”
@remcohaszing Thanks for sharing the idea! I think your suggestion sounds really good.
In case we insert a p
(or some unique component) into the article, which likely usually happens, will not cause all other heavy custom React components to re-render. I was thinking about the hash but you are right, it is resource intensive and this implementation looks simple and very effective.
Edit: Wait! Sorry but what if we need it to re-render, adding text to the node will not cause the key to change, so react will think its content is unchanged and not update it.
Update: adding text to the node
case: The key isn't changed but the input changed so React still updates it. The difference is with different keys it drops and re-creates the whole component even with the same input. So I think this solution might work...Sorry for the mess up, let me take some time and check in detail later! ;)
@remcohaszing this seems like a reiteration of your earlier arguments. Still very good points. But I also still have unadressed questions on your earlier arguments: https://github.com/remarkjs/react-markdown/issues/703#issuecomment-1381917891
One of the points made there was that this project is about markdown, so there will be a lot of p
elements. Few other things.
Additionally, on your new p
/div
/p
-> p
/p
/div
diff example, it could happen, but a) I think is more likely to happen as two changes: p
/div
/p
-> p
/p
-> p
/p
/div
, and b) your proposal is theoretically a large improvement assuming the change is in nodes that don’t occur frequently (the div
). Your proposal is theoretically a small improvement when the change is in a super common node (p
).
I think it would be very good to check these ideas in a browser to investigate their actual performance
Using a hash based on content would be deliberately preventing React to update the DOM efficiently if one character changes
This is the case for hashes that are meant to, indeed, produce different results if one character changes.
There are ways to create hashes that are resistent to change while being more unique than the {tagName}-{counter}
proposal.
Some pseudocode:
function hash(node, counts) {
const words = toString(node).split(/\W+/)
const headAndTail = [...words.slice(0, 3), ...words.slice(-3)]
const hash = headAndTail.map(d => d.charAt(0).toLowerCase()).join('')
const count = counts.get(hash)
if (count === undefined) {
count = 0
} else {
count++
}
counts.set(hash, count)
hash += '-' + count
return hash
}
You could go much further with this, but we want to keep it fast. See also Near-duplicates and shingling
In addition the hashing itself on every render is resource intensive.
Correct. As so often in programming, I think we have to decide between trade-offs here.
Do we want removing/adding common nodes to be much faster?
Do we want all renders to be a bit slower?
Do we have a fast path, for p-0..p-5
, at which point we switch to a hash based on the content?
To decide on these trade-offs, I think it’s good to check what the performance results are in practice.
What will you provide?
Here you are. https://www.npmjs.com/package/react-markdown-customkeyprop
react-markdown
with react-markdown-customkeyprop
key
property to rehype node
Both of the plugins I wrote below rewrite keys of code
elements. Please use whichever you prefer.
import type { Element } from 'hast-util-select';
import type { Plugin } from 'unified';
import { visit } from 'unist-util-visit';
export const remarkPlugin: Plugin = () => {
return (tree) => {
visit(tree, (node, index) => {
const { type, position } = node;
// filtering
if (type !== 'code') {
return;
}
const data = node.data ?? (node.data = {});
data.hProperties = {
key: `${type}-${position?.start.line}-${position?.start.column}-${index}`,
};
});
};
};
export const rehypePlugin: Plugin = () => {
return (tree) => {
visit(tree, 'element', (node: Element, index) => {
const { tagName, position } = node;
// filtering
if (tagName !== 'code') {
return;
}
const properties = node.properties ?? {};
properties.key = `${tagName}-${position?.start.line}-${position?.start.column}-${index}`;
node.properties = properties;
});
};
};
import * as customKeygenerator from './custom-keygenerator';
...
<ReactMarkdown children={markdown} remarkPlugins={[customKeygenerator.remarkPlugin]} />
Can you please explain what your algorithm would look like?
Sure! It is the "Overwrite algorithm". Oh, have you come up with a smarter algorithm? Don't worry, you can use that too.
You show the existing algorithm to generate keys. Previously, a) you expressed you wanted to use your own keys, b) you had an algorithm in mind to calculate keys. I asked about b). Can you please post what your algorithm would look like.
My proposal consists of two steps of which I am certain they will decrease rerenders
Re paragraphs vs images: It’s true images are wrapped by paragraps. This example was poorly chosen. They could still be code blocks, block quotes, or transformed content though. While it’s true there are a lot of p
elements, they’re not necessarily all the same.
Re hashing algorithm: I now better understand the hashing algorithm provided. It’s not more unique than stages 1 or 2. However, it might be a better identifier. Perhaps it would be slightly simpler to just use the first 6 characters. I imagine a user is more likely to change the last characters thant the first, but that’s a detail and a hunch.
I honestly don’t know if this hashing would help. I imagine this could be optimal in specific situations, but not others. I.e. it seems more optimal when moving paragraphs around, but less so when editing the first or last character of a paragraph.
I am confident that all of the proposed formulas are better than the current approach, mostly because none of them contain the positional information.
I was hesitant about a custom key function, but maybe it’s actually not that bad. We could even export two functions that implement such a key generator function. Although I’m not sure what its API should look like.
While it’s true there are a lot of p elements, they’re not necessarily all the same.
That’s my point. Adding or removing a paragraph means all paragraphs after it get a new index. Because paragraphs are so common, that is likely significant.
I honestly don’t know if this hashing would help. I imagine this could be optimal in specific situations, but not others. I.e. it seems more optimal when moving paragraphs around, but less so when editing the first or last character of a paragraph.
It should be, in your last example, because all other nodes will remain the same. Only one node has a different key.
I am confident that all of the proposed formulas are better than the current approach, mostly because none of them contain the positional information.
Same, presumably they are. That’s why I am very interested in actual performance metrics.
I was hesitant about a custom key function, but maybe it’s actually not that bad. We could even export two functions that implement such a key generator function. Although I’m not sure what its API should look like.
Can you provide reasons for why you changed your mind? Why would humans be better off writing custom key generators? (And if they do, what would they use?) How would that be better than a good algorithm that is baked in?
TLDR: props.key = index
is great.
Alright, I tested this out.
There is no noticable improvement with any of these methods. It does become significantly slower when trying my crude hash idea. Of course, that could be implemented smarter (hashing every node of course is slow, the benefit would likely only be in top-level nodes). But one algorithm being slow, and the rest being fast, shows that it all doesn’t really matter.
In the demo (https://github.com/remarkjs/react-markdown/blob/6b25365b9079178de077268a42afcd16edf22f87/src/index.js#L156), I added:
@@ -141,6 +141,8 @@ const root = ReactDom.createRoot(main)
/** @type {Awaited<ReturnType<createStarryNight>>} */
let starryNight
+let before = performance.timeOrigin + performance.now()
+
// eslint-disable-next-line unicorn/prefer-top-level-await -- XO is wrong.
createStarryNight(grammars).then((x) => {
starryNight = x
@@ -154,6 +156,10 @@ createStarryNight(grammars).then((x) => {
})
function Playground() {
+ const after = performance.timeOrigin + performance.now()
+ console.log(after - before)
+ before = after
+
const [text, setText] = React.useState(sample)
const [gfm, setGfm] = React.useState(false)
const [raw, setRaw] = React.useState(false)
This is a crude way to check how fast renders actually are, but it works well to check how an actual editor would rerender
a
a bunch, sometimes pressing enter in between, which contineously rerenders as fast as possibleThe algorithms I checked:
[name, position.start.line, position.start.column, index].join('-')
)index
)child.tagName + '-' + count
)a), b), and c) all perform at about ±100ms between rerenders at the top, ±90ms at the bottom. a) is ±20ms slower at the top when adding enters, because everything after it gets a different key d) performs at ±200ms.
Then I tested b) and c) some more by keeping CMD+V pressed to add short new paragraphs as fast as possible at the top and the bottom. b) performs better, it’s at about 150ms at the top and 100ms at the bottom, whereas c) is 200ms and 150ms. This test makes the use of keys more visible compared to the previous. My assumption is that b) is a bit slower because it allocates objects for each parent node.
I guess the thing is:
So, I propose going with props.key = index
.
If someone wants to test b) and c) some more, be my guest, I would be happy to see more tests.
I attempted a bit of profiling using the React devtools, but didn’t find very noteworthy results. I expected more significant results. It’s not something I use a lot, so maybe I did it wrong, or maybe my test case was too simple.
I used the following code:
import { useEffect, useState } from "react"
import { createRoot } from "react-dom/client"
import { ReactMarkdown } from "react-markdown/lib/react-markdown.js"
const components = {
p({ children, ...props }) {
const [count, setCount] = useState(0)
useEffect(() => {
setInterval(() => {
setCount((old) => old + 1)
}, 500)
}, [])
return (
<p {...props}>
{children}
{count}
</p>
)
},
}
function App() {
const [markdown, setMarkdown] = useState("")
useEffect(() => {
let count = 0
setInterval(() => {
setMarkdown((old) => {
count += 1
return (count % 2 ? "# asd\n\n" : "zxc\n\n") + old
})
}, 1000)
}, [])
return <ReactMarkdown components={components}>{markdown}</ReactMarkdown>
}
createRoot(document.getElementById("app")).render(<App />)
What is noteworthy is that option c) leads to fewer DOM updates and reused state of custom components compared to b). I just don’t really see this show up in the React profiler.
Still, I’m 100% for just using props.key = index
, at least until someone comes with a real need or proof for a better identity detection.
In our case (https://github.com/remarkjs/react-markdown/issues/703#issuecomment-1319912219), props.key = index
is enough.
However, if it is fixed this time, I think props.key = props.key ?? index
is better.
It does not cause any disadvantages.
I really want to understand what you want to do with that ability. See https://github.com/remarkjs/react-markdown/issues/703#issuecomment-1502176879 for example. A feature without a use case is not better.
Hi! This was closed. Team: If this was fixed, please add phase/solved
. Otherwise, please add one of the no/*
labels.
I really want to understand what you want to do with that ability.
With custom keys, users could have opted for props.key = index
without waiting months.
Or maybe someone who wants to use a hash algorithm can implement it themselves.
Anyway, I'm glad that line/column agnostic keys were adopted. Good job.
That is not a use case or a problem. That is a solution to something that isn’t a problem. I want to hear your use case. Why do you want to do that
Hmm.. you really don't know?
What if some users say, "I want to make keys with hashes, that gives the best performance"? Are you going to discuss it again for a few months?
It has been 2 years. Perhaps you can answer the question instead of dodging it?
I think the idea here is that we shouldn't do premature optimisations. We shouldn't ship a feature because there may or may not be a use case in the future. Besides that, in my opinion there are certain things you don't want to expose to users, and key algorithms to improve re-renders definitely fits that category. We should leave that to knowledgeable people who are willing to proof it's better and let users have it for free as an internal change.
I think we can stop the discussion here for now. Thanks for all the detailed answers.
Perhaps you can answer the question instead of dodging it?
OK, I'll try.
That is not a use case or a problem. That is a solution to something that isn’t a problem. I want to hear your use case.
You need not to know. Because no key algorithm is perfect for all use cases and problems.
What I really wanted was line/column agnostic keys. So I'm satisfied with props.key = index
now. That was my use case. But it may not be enough for someone else, you know.
props.key = props.key ?? [anything]
should have saved you and your users (and past me) from wasting time. Since #466 was posted, people had been interested in two things: algorithmic exploration and first-aid options. I think there would have been no disadvantage even if implemented the options ASAP and then carefully examined the optimization methods.
Why do you want to do that
Because I would like ASAP.
Sorry, I went too far about "wasting time".
OK, let's finish. Thanks for the great changes.
Initial checklist
Problem
As can be seen here, React keys are generated using the name, line, column, and node index. This means that:
This means the key is explicitly unoptimized for React re-rendering.
Solution
The key should be calculated as the nth instance if that tag name.
I.e. the following markdown:
should be represented as:
This way if a paragraph changes, only that paragraph is changed. If another paragraph is inserted or removed, only all paragraphs that come after the edited paragraph are rerendered.
Alternatives