rehypejs / rehype-react

plugin to transform to preact, react, vue, etc
https://unifiedjs.com
MIT License
390 stars 27 forks source link

Memoize Components Prop #38

Closed stevemk14ebr closed 2 years ago

stevemk14ebr commented 2 years ago

Initial checklist

Problem

I am using rehype-react as a stringifier to create react components from remark nodes. I have a special node type that I look for, and using the components map, I replace this node type with a custom react component. I found that it was actually fairly hard to use this API correctly to prevent unnecessary re-renders. I'm opening this issue to discuss my solution and if the API could be easier somehow (I don't have a suggestion in mind).

function MainAsyncMarkdownContent(props) {
    const dispatch = useDispatch();
    const { inline, onClick, components, ...restProps } = props;
    const editorState = useSelector(selectEditorState);
    const [transformed, setTransformed] = React.useState(null);

    const stringifier = Remark().use([[RehypeReact, { createElement: React.createElement, Fragment: React.Fragment, components: components }]]).freeze();

    const htmlmarkdown = transformed ? stringifier.stringify(transformed) : null;
   return (<span style={{ margin: "0px", textDecoration: "none", whiteSpace: 'pre-wrap' }} className={'markdown'} >{htmlmarkdown}</span>);
}

const MemoMainAsyncMarkdownContent = React.memo((props) => {
    return (
        <MainAsyncMarkdownContent components={props.components} />
    )
});

To pass the 'components' prop properly in a way that doesn't cause constant re-renders I had to use some rather complex React memoization features. This took me a long time to get right.:

const MemoRenderedInstance = React.memo((props) => {
    return (
        <RenderedInstance {...props} /> // not important,, custom component for me
    )
});

function useComponentMap() {
    const renderMap = useSelector(selectComponentRenderMap);

    // only re-create this function if the RenderMap changes, i.e. if any content of any RenderedInstance changes.
    const asterFn = React.useCallback(
        // transform an aster node type into a react element
        ({ tag, value }) => {
            if (!value) {
                value = "";
            }

            if (!tag) {
                tag = "";
            }

            switch (tag) {
                case 'aster':  // I insert special nodes into the node tree. This is from a markdown directive.
                    const [type, id, renderId, ...restsplit] = value.split(".");
                    if (!type || !id || !renderId || restsplit.length > 0) {
                        break;
                    }

                    // boom a directive turns into a react element
                    return (
                        <MemoRenderedInstance type={type} id={id} renderId={renderId} renderMap={renderMap} />
                    );
            }
            return (<span>{value}</span>);
        }, [renderMap]);

    // only re-create this map if the function changes
    return React.useMemo(() => { return { aster: asterFn } }, [asterFn]);
}

Markdown Editor

function Editor(props) {
    const componentMap = useComponentMap();
   /// things
   return (<MemoMainAsyncMarkdownContent components={componentMap} />)
}

I found that I had to memoize both my map, my function in my map (to prevent the map from being re-made), and also the component that I returned from my render function in the map. This took lots of debugging and research. I'd like to discuss if it's possible the library would be able to internally memoize some of these things to make this interface easier? I am honestly not sure if that's possible or not. Perhaps at least, an example of such a complex scenario could be documented for users?

Without doing this memoization, what happens is that anytime the editorState inside of MemoMainAsyncMarkdownContent changes, the entire markdown document gets re-parsed. Each component that is returned from the function in the component map then gets destroyed and re-created on every single update press even if their content didn't change at all. This is because the library internally sees the custom aster node, calls the function which emits a new react components, and then the react diff sees the new component destroys the old and re-mounts the new one. Every single component therefore becomes a 'use once then throw away' components. Memoization like i've shown fixes this so that un-changed MemoRenderedInstance stay mounted, but it's tricky to get right.

Solution

I am not sure honestly, discuss?

Alternatives

Document the memoization as I've shown.

stevemk14ebr commented 2 years ago

During my research I came across this https://usehooks.com/ , see the useMemoCompare. I think this might be one possible solution. And it also seems to explain this problem generally, I think.

Murderlon commented 2 years ago

Any reason why you didn't went for https://mdxjs.com if you want React components inside markdown? Seems like it would solve the complexity in the first place.

wooorm commented 2 years ago

I don’t think this project can do anything about it.

This is a react thing. You can encapsulate your whole unified/remark/rehype pipeline into memo and hooks stuff instead?

stevemk14ebr commented 2 years ago

@Murderlon I chose to do my own things because JS inside markdown wasn't something I wanted. Security in my situation was a huge factor as well, the things that I am writing in markdown are from malware reports, so trusting input was a nonstarter. Instead of mdx I use markdown directives for placeholders for react-elements. I also supported complex nested objects in my design, which involves nesting the pipelines. For example :directive [18181] would be a react element holding a table, and that tables itself would have many other directives inside of it. Controlling my pipeline lets me do all this efficiently. I offload parsing and stuff to a web worker pool so I can handle thousands of nested elements efficiently. It's very niche, I also have custom stringifiers to convert to different syntaxs like Jira.

@wooorm I think the solution I've shown is the best way to do it as is. I was wondering if there's a better choice for perhaps the createElement prop. Right now react.createElement is used, I'm wondering if some kind of memoized create prop would be better?

Anyway, feel free to close if you aren't interested in this. I just opened half to document this as a challenge and help anyone else in the future who deals with this.

wooorm commented 2 years ago

Can you elaborate on what is going wrong with createElement currently, and how you’d want to solve it?

I wonder how much this project should be agnostic to React/Preact/Vue/whatever being passed. E.g., it can currently do without Fragment. But with memo’s, it’d be pretty tied to React itself?

Regardless, I find memo-stuff confusing. But I guess the solution always boils down to: don’t create objects every time. And can be solved like so: https://github.com/mdx-js/mdx/pull/1924/files. So maybe an example in the readme here instead will help folks prevent such things?

stevemk14ebr commented 2 years ago

Sure, Imagine the standard use case of a left hand side text editor for markdown, and a right hand side markdown preview. Each and every keypress in the editor, the preview should re-parse the markdown. As it currently works, any React element returned from the function passed in components will be a new element as created by createElement. React will destroy the old component, and replace it with the new component even if the element could be re-used.

Specifically for me what happens is this. My markdown contains markdown directives, these are keys into a global map, each value in this map is rendered by a RenderedInstance as I show above (the values are markdown, i.e. I'm nesting markdown pipelines). When parsing occurs, the components prop is invoked, a new RenderedInstance created, the old destroyed, and the nested parser inside that RenderedInstance runs again. I have hundreds of markdown directives, so each and every keypress hundreds of pipelines run, causing significant jank.

How this should work is that my RenderedInstances only get unmounted and re-created if their props and state change, only re-running parsing when their value from the global map changes. To solve this I used complex memoization features like I showed in my original post. Re-create the function + components map when my global map changes, and memoize the element so it's re-used by react when possible.

I'm wonder if createElement can be replaced with some sort of memoizedCreateElement that will memoize the elements for users. This way users won't need to wrap elements that they return from the components prop in React.Memo, it will be done for them. In my example this would make <MemoRenderedInstance> unneeded. The memoization of the components map and of the function inside this map would probably still have to be done, but it could be documented as a thorny edge at least.

this might not actually be possible to create this memoizedCreateElement, I'm really not sure. But I think documenting this could be nice for users. Some thing like "If you use the components prop be aware that each and every parser run a new component will be returned. Memoize your component, components map, and components function, to prevent unnecessary re-renders"

wooorm commented 2 years ago

the components prop is invoked

I don’t understand? components is not a function I believe, and I see “invoke” as a synonym of “calling”?

React will destroy the old component, and replace it with the new component even if the element could be re-used.

I'm wonder if createElement can be replaced with some sort of memoizedCreateElement that will memoize the elements for users.

I believe you’re duplicating some part of what React is already supposed to do: React diffs a previous and a next tree (VDOMs), and applies the resulting patches to a DOM. The problem is not that two trees are created, that is intentional. The problem is that the diffing doesn‘t work: the old component should not be destroyed.

I recommend pulling the unified pipeline out of your components. It’s now recreated every time. It can be created once. Try do move all the code that doesn’t have to change out of components. In particular the useComponentMap looks like it’s creating new components every time. Lastly, if that doesn’t work, memoize them.

stevemk14ebr commented 2 years ago

The stringifier has to be within my components, because the things that make up the components map come from props. It would not work as a global because the necessary information isn't available. The main parser pipeline is already a global, it's created one time in a web worker and my components submit messages to the worker when necessary. My code example above omits the web worker logic for clarity, the workers set the transformed state variable.

You're right components is a map. asterFn is essentially a tiny React component that just returns my real components (since react functional components are just functions this works fine). When I say invoked I mean that this asterFn function is called, which does occur during stringification.

Lastly, if that doesn’t work, memoize them I do this (see <MemoRenderedInstance>), and it works, but this issue is about how much memoization is required for correct behavior.

wooorm commented 2 years ago

The stringifier has to be within my components, because the things that make up the components map come from props.

I do this (see <MemoRenderedInstance>), and it works, but this issue is about how much memoization is required for correct behavior.

I believe that there is a correlation between these two statements: you create components on demand, which means they can’t be diffed, which means they are slow, which can be solved by either a) not creating them each render, or b) memo-izing them. I believe this to be an “issue” introduced by how React is used, and hence something that also has to be solved there. I also don’t believe all createElement calls should be memo-ized, because in React, createElements aren’t memo-ized by default either. It is intentional.

Perhaps you can create an MVP reproduction that clearly shows a problem with only the bare minimum of code. But I’m closing this because I believe this is a choice that React makes.

stevemk14ebr commented 2 years ago

Ha, ok so you're right. Check this out

If I turn my asterFn into an actual react component at global scope (not sure why I didn't do this originally??) It works without the crazy memoization in the components map.

function AsterWrapper(props) {
    let { tag, value } = props;
    const renderMap = useSelector(selectComponentRenderMap);

    if (!value) {
        value = "";
    }

    if (!tag) {
        tag = "";
    }

    switch (tag) {
        case 'aster':
            const [type, id, renderId, ...restsplit] = value.split(".");
            if (!type || !id || !renderId || restsplit.length > 0) {
                break;
            }

            // boom a directive turns into a react element
            return (
                <MemoRenderedInstance type={type} id={id} renderId={renderId} renderMap={renderMap} />
            );
    }
    return (<span>{value}</span>);
}

const componentMap = { aster: AsterWrapper };

function Editor(props) {
    const componentMap = useComponentMap();
   /// things
   return (<MemoMainAsyncMarkdownContent components={componentMap} />)
}

It seems like the react Diff failed when it got to asterFn so all child elements, including RenderedInstance got unmounted then remounted every parse / stringifier pass.

wooorm commented 2 years ago

I saw this post air today, not sure how good it is but I think it touches on your problems: https://blog.bitsrc.io/understanding-referential-equality-in-react-a8fb3769be0