Add keyedLazy to scope styles for lists of lazy nodes

micahhahn commented 1 year ago

This PR aims to make lazy more performant when used with a list of many items.

Currently lazy is forced to render the styles immediately underneath the root node due to the way the VirtualDom works. This is usually fine, but in cases were we are rendering hundreds of elements with lazy we end up lots of duplicated identical styles. This causes performance issues in the browser as it has to do additional work to figure out which of the identical styles it needs to apply to any given node. (At NRI, we saw a list of ~900 elements with maybe 20 styles each take Chrome around 3 seconds to recalculate styles).

This PR mitigates the problem by scoping the generated css at each node with an id given to the root node. This takes the decision of which style to apply away from the browser and allows them to use id selector quick rejection.

We add new functions keyedLazy ... keyedLazy7 that look very similar to their lazy ... lazy7 counterparts save for that they admit specifying a key along with the root html element. See the type signature for keyedLazy below:

keyedLazy : (a -> ( String, Node msg )) -> a -> Node msg

In the library, we'll assign this key as an id on the root node and generate all styles with the id selector as a prefix.

Here's a video of the effect on FF on using normal lazy versus using keyedLazy on page from NRI that has ~900 elements using lazy with around 20 styles each.

https://user-images.githubusercontent.com/12899207/188202907-a5610638-3e39-4260-bf96-9f6cf8674791.mov

https://user-images.githubusercontent.com/12899207/188202926-69ddb056-90f0-4af1-8297-0b74741a429f.mov

rtfeldman commented 1 year ago

@tesk9 if you have time, I'd love to have your review on this too!

micahhahn commented 1 year ago

@tesk9

I think I might be misunderstanding something? I'm currently understanding that the slow thing that this PR is trying to resolve is the browser's application of the styles. Is that right?

Yes, exactly.

Here's a snapshot of my perf log from chrome for the writing portfolios page when using lazy on some 900 portfolios:

Chrome is spending almost 3s "Recalculating Styles"!!! I can't really peek deeper in the internals of what chrome is doing while recalculating styles but I assumed that it was that chrome has to weigh between 900 conflicting styles for each class name

I'm also wondering if there's missing memoization of shared styles in general, outside of the case where you have a container whose children happen to share styles.

Yeah agreed. I talked about this with Richard when we paired, but I think the ideal situation here would be to patch the Elm core VirtualDom lazy nodes to allow caching some additional JSON alongside the node. Then we could serialize our Dict of styles and get them back out instead of being forced to render the style node for each lazy child.

tesk9 commented 1 year ago

Hmm! I guess I'm seeing different behavior than you, then? On staging, with >200 writing pieces in the portfolio, I'm still seeing most time spent in scripting, and very little in rendering:

Are you seeing perf degradations when adding additional (all our apps are already wrapped with lazy helpers in Nri.Spa and Nri.Program) lazy wrappers? I wonder if Keyed nodes would prevent the re-renders you mentioned...?

micahhahn commented 1 year ago

@tesk9

Are you seeing perf degradations when adding additional (all our apps are already wrapped with lazy helpers in Nri.Spa and Nri.Program) lazy wrappers?

Yes! Sorry if that wasn't clear. On my branch we are adding a lazy wrapper around each portfolio item card. I should also note that the branch I'm looking at was before pagination (so literally 900 items on the page). Still applies though as we'd like to bump up the number of cards shown per page.

We were stuck in a weird place because without lazy on each card, hovering over a tooltip would force a re-render of all of them which cost a lot of time in scripting. Adding lazy to each item cost us instead a lot of time calculating css styles in the browser.

tesk9 commented 1 year ago

I was able to reproduce the slowdown in rendering in an Ellie!

When I had the top of the application use view = lazy view >> toUnstyled, I didn't see rendering perf hits, versus when I used model.list |> List.map (lazy2 viewTip model.openTooltip) |> div [], I definitely saw the problem you're talking about.

I guess I'm wondering whether Lazy is the right tool here, when working with a long list of items. I think that, if you're using lazy, you've got ~900 items in the cache, and they're all going to get invalidated when the open tooltip value changes.

Mind trying keyed nodes on your branch instead? I think it'll work. (I mean, I don't think perf will be perfect, but I think it'll be better)

Plain view Using lazy at the top of the view, does not spent a ton of time in rendering Using lazy2 right where the elements are rendered, does spend a ton of time in rendering Using Keyed nodes, extremely minimal time in rendering Using lazy at the top of the view with Keyed nodes

micahhahn commented 1 year ago

@tesk9

So cool you could repo this on Ellie!

I guess I'm wondering whether Lazy is the right tool here, when working with a long list of items. I think that, if you're using lazy, you've got ~900 items in the cache, and they're all going to get invalidated when the open tooltip value changes.

We're being a little more careful with the lazy args then your 3rd example such that only 1 list item is ever invalidated if the the open tooltip changes.

Here's an updated Ellie example that's closer to what we're doing

^ That Ellie example spends about 500ms for me in "Recalculate Style" which is roughly equivalent to what I'm seeing on my branch given that our list items are more much more complex.

Mind trying keyed nodes on your branch instead? I think it'll work. (I mean, I don't think perf will be perfect, but I think it'll be better)

I already am using keyed nodes on my branch - it helps some with the dom diffing time, but the full render of each portfolio item node is still not fast when done 900 times.

tesk9 commented 1 year ago

Okay! So then if I'm understanding correctly, ideally we'd want a node to have its children keyed in the usual sense and have those children lazily rendered?

I'm thinking about having, say, 1000 reorder-able elements on the page. I don't think I could use the Lazy.keyedLazy helper, since the browser still wouldn't actually have keyed the nodes for real. Is that right? I guess I'm a little concerned about user confusion around Lazy.lazy, Lazy.keyedLazy, and Keyed. 🤔

micahhahn commented 1 year ago

Okay! So then if I'm understanding correctly, ideally we'd want a node to have its children keyed in the usual sense and have those children lazily rendered?

Yes exactly!

I'm thinking about having, say, 1000 reorder-able elements on the page. I don't think I could use the Lazy.keyedLazy helper, since the browser still wouldn't actually have keyed the nodes for real. Is that right? I guess I'm a little concerned about user confusion around Lazy.lazy, Lazy.keyedLazy, and Keyed. 🤔

AFAIK you don't really need to use Keyed unless the list will have items being added or removed or swapped around. Lazy itself should still work without keyed... I'm 95% sure the runtime will just compare the existing virtual dom list against the new dom list pairwise element by element. Lazy the same way, save for that we get a pre-check against the arguments before rendering the new dom node and checking that.

Totally agreed w.r.t. the potential user confusion about the naming. Did you see my comment above that maybe we could go with scopedLazy ... scopedLazy7 instead of keyedLazy? I think that better communicates what's happening - that is that we're scoping the css to the node.

tesk9 commented 1 year ago

AFAIK you don't really need to use Keyed unless the list will have items being added or removed or swapped around.

Or you want to get some perf benefits! Ju's Performant Elm, Part II article

I guess my key question at this point is: can we combine Lazy and Keyed directly? Because there will be times when devs have long lists of elements that need to render performantly and are rearrangeable/deletable/etc.

It seems to me like the goal is something like this:

view : Model -> Html Msg
view model =
    model.list
        |> List.map
            (\id ->
                keyedLazy2 viewTip
                    (case model.openTooltip of
                        Just i ->
                            if i == id then
                                Just ()

                            else
                                Nothing

                        Nothing ->
                            Nothing
                    )
                    id
            )
        |> Keyed.node "div" []

Where keyedLazy2 : (a -> b -> (String, Html msg)) -> a -> b -> (String, Html msg)

The other thing I'm thinking about is the last time I last ran into lots and lots of elements and laggy performance that seemed tied to elm-css class name regeneration to some degree, I wasn't working with a list -- I was working with a tree. (Well, lots of trees, and that was the problem 😆)

#tree-root
   - #branch-1
       - #leaf-2
       - #branch-4
             - #leaf-5
       - #leaf-3
       - #leaf-4
   - #leaf-1
   - #branch-2
     - ...
   - #branch-3
        - ...

I think having an id-based descendant selector doesn't work as well for views where your perf issues are coming from depth, rather than breadth, you know? I guess I'm thinking that if we can get the browser to keep track of nodes, that's going to be better for us than if we try to manage when to apply css styles ourselves.

For the problem I was having with tree performance, we actually did end up scoping the styles ourselves. This is the reason that noredink-ui's Accordion has a separate styleAccordion helper, instead of having custom styling added in a more standard way.

I guess I would like for a baked-in elm-css strategy for perf improvements when there are lots of nodes to apply to other situations as well, like when nodes are rearrangeable or when there's a deep tree.

All that said: whatever you all decide to do next is cool with me! I learn conservative on lots of changes, but the best way to learn and to actually make improvements happen is to try things out.

Thanks for walking me through everything, Micah! I very much appreciate your taking the time to make sure I fully understood your problem and your goals 🙏

micahhahn commented 1 year ago

I guess my key question at this point is: can we combine Lazy and Keyed directly?

I've had this same line of thought! We likely always want both, and the both require a unique identifier - might as well have them be the same!

keyedLazy2 : (a -> b -> (String, Html msg)) -> a -> b -> (String, Html msg)

Unfortunately we cannot write this implementation. The VirtualDom lazy is an information black hole - all we get out is the completely opaque Html element so we could never recover the key.

We could do something like this:

keyedLazy2 : (a -> b -> Html msg) -> String -> a -> b -> (String, Html msg)

This is, in fact, what I had in an earlier phase of this API design. One downside is that it forces us to consume another argument to the core Html.lazy call so we could only have up to keyedLazy6.

Another potential thought along this line that I think I like better would be extending the Html.Keyed module. Right now we have this:

node : String -> List (Attribute msg) -> List ( String, Html msg ) -> Html msg

We can augment node to instead take the lazy function and data required:

lazyNode : String -> List (Attribute msg) -> (a -> Html msg) -> List ( String, a ) -> Html msg
lazyNode =
    Debug.todo ""

lazyNode2 : String -> List (Attribute msg) -> (a -> b -> Html msg) -> List ( String, ( a, b ) ) -> Html msg
lazyNode2 =
    Debug.todo ""

Curious for your thoughts on that API? It also has the nice side effect of avoiding the need to find a better name 😆

I think having an id-based descendant selector doesn't work as well for views where your perf issues are coming from depth, rather than breadth, you know? I guess I'm thinking that if we can get the browser to keep track of nodes, that's going to be better for us than if we try to manage when to apply css styles ourselves.

Ohhhhh fascinating. I had not even given a thought to tree based views. I think it could still help this scenario in a couple ways:

For lists, this PR takes the amount of "class matching work" the browser has to do in Recalculate Styles from n^2 to n (n nodes with only 1 matching class each). Assuming an average tree, each node would have log(n) parents so I think we would go from n^2 to n * log(n). The worst case is obviously still n^2 if the tree is only deep.
Note that we have a slightly different implementation for the root node vs all the other descendants of a keyedLazy call such that styles on the root node will not match anything else (the generated css looks like "#key._12345678" versus "#key ._12345678"). This helps in the tree scenario a little bit.
If we really wanted to support this use case more... we do have enough information to do something like "#key * ._12345678" if we want to pin down the generation. I'm a little hesitant as this could potentially reduce class sharing within a lazy node.

In the much longer term, I'm hopeful there's a better light at the end of tunnel.

If we could cache data on the lazy node itself and retrieve it then we would not be forced to generate a style node at each lazy call site and could avoid this problem all together.

Also, the recent episode of the elm-radio podcast mentioned an idea that we could use an elm-review like tool to traverse all the styles in the application and extract the ones that can be determined statically. This would significantly reduce our burden here and make asset sizes much smaller!

tesk9 commented 1 year ago

I like the Keyed.lazyNode idea, for sure! That looks very promising to me 👀

micahhahn commented 1 year ago

Ok! I wrote up an separate experiment PR that explore what the Keyed.lazyNode style API would look like! I think I like it.

@tesk9 If you could take a look that would be great! @rtfeldman Looping you back in too as this is a significant change to the API from when you might have last looked at it.

If we're all in agreement that the Keyed.lazyNode API is superior I'll merge it into this PR and we can get this over the finish line!

tesk9 commented 1 year ago

I am liking it better! Just curious, does it still need the id pieces? Also, should we test the new version to make sure it's performing as expected? (It's hard for me to keep track of when we're using referential equality for the cache checks and when we need a named function and so forth with lazy -- somehow I always end up confused!)

micahhahn commented 1 year ago

@tesk9

Just curious, does it still need the id pieces?

Yes, unfortunately. The id piece is essential for being able to scope the generated css so that the browser can reduce its Style Calculations.

I took the following profiles on Chrome and Firefox that might be helpful. These slices are the time it take to open a tooltip on the writing portfolio page with ~900 items.

Chrome

	Scripting (ms)	Style Rendering (ms)
1. keyed - not lazy	1,391	66
2. keyed + lazy	11	3,044
3. keyedLazyNode	15	572

FF

	Scripting (ms)	Style Rendering (ms)
1. keyed - not lazy	2,268	88
2. keyed + lazy	76	1,787
3. keyedLazyNode	76	25

The "keyed - not lazy" all show a high amount of time spent generating the view in javascript, but relatively little time for the browser to calculate styles as we're able to share the many of the css classes.

The "keyed + lazy" profiles drastically reduce the amount of time spent in javascript generating the view, but we are now generate duplicate styles on each and every node, and the browser spends forever applying all these conflicting styles to nodes.

Finally the "keyedLazyNode" profiles use the new functions. We still generate css on every lazy node, but it is scoped to that lazy's nodes ID which significantly cuts back on the time the browser spends calculating those conflicting styles.

Why is FF so much better than Chrome? 🤷‍♂️ But the improvement on Chrome is still drastic.

micahhahn commented 1 year ago

As the new branch is performing as expected and we like the API changes, I'm merging its changes into this branch.

micahhahn commented 1 year ago

@rtfeldman I think this is ready to merge if you like the changes!

rtfeldman commented 1 year ago

Cool! @tesk9 do you have time to take a look and give your thoughts? (No worries if not, but I want to double check!)

tesk9 commented 1 year ago

Thanks! I think I've said enough 😅

rtfeldman commented 1 year ago

Published as 18.0.0! 🎉

Thank you so much for all your excellent work on this @micahhahn, and thanks for the review @tesk9!

rtfeldman / elm-css

Add keyedLazy to scope styles for lists of lazy nodes #584

Chrome

FF