schell / mogwai

The minimalist, obvious, graphical, web application interface
426 stars 27 forks source link

Reduced string allocation #104

Open OvermindDL1 opened 1 year ago

OvermindDL1 commented 1 year ago

Would there be any interest in replacing a lot of the String usages with SmolStr instead, it would relieve a lot of memory pressure wins as a ton of strings (for example, attributes, styles, etc...) in mogwai(-dom) are less than 24 utf-8 bytes in length?

OvermindDL1 commented 1 year ago

For note, on the wasm side it will only save memory for things about 10 in length (because 32-bits, still worth it I say), I'm more curious on saving it on the server when I SSR it for hydration purposes, generating the SSR is dominating the time in the servers route processor.

OvermindDL1 commented 1 year ago

A quick test shows that though memory pressure is reduced, the extra conditional is adding enough overhead that offsets the performance boost from that for initial load, it only 'gains' for stored values over the longer time, needs more testing...

schell commented 1 year ago

I'm definitely glad you're looking into it! As it stands mogwai performs worst in the memory domain. I welcome any improvements, so long as they don't hamper ergonomics.

OvermindDL1 commented 1 year ago

Very unscientific, but using crates/mogwai-benches (not keyed, just normal, for maximum allocations happening) in this repo in the normal main branch and in my string_overhaul_testing (stot below for brevity, using the smartstring library) there are these differences I was able to see, each run was done in a new incognito instance to keep things clear:


File Size:

So it's about 4kb larger to add a more efficient string library.


Browser run times (ran multiple times for each in different orders, each of the given type were within ~1% of all the other runs of the same type so seemed consistent):

main: image

stot: image

The difference in times shows that stot is consistently slightly slower than main, but not by much. In my own project I was showing a greater amount of difference at ~2% slower for stot but I didn't have as much string manipulation as it seems this test has so I'm unsure...


Memory usage, as taken by a heap dump from within chrome after the mogwai-bench run has completed:

So stot does save a little bit of heap allocation amounts at the end of this benchmark, not a huge amount though but a few megs is still not trivial.


Performing records within chrome of the allocations that happen shows that all memory that wasm requests is allocated very quickly and then reused for the rest of the tests, so the rest is just using the wasm internal memory allocator, which is just the stock allocator in this library it seems. Let's try adding wee_alloc, which is common to use in wasm due to smaller filesize in exchange for slightly slower allocation speed and more inefficient allocations:


So yeah, as stated, unsure if it's really worth it, it is a measurable change but unsure if it's worth the added complexity, even if it would help with the allocation overhead in SSR that I'm measuring in my app.

OvermindDL1 commented 1 year ago

I wonder if just using a Cow<'static, str> would be sufficient to save a lot of this as 'most' of the strings are indeed just 'static str's... If I get time today I'll try to make a new branch to try that out.

schell commented 1 year ago

Thank you for the benchmarking! It's too bad it isn't an obvious easy win.

I wonder if just using a Cow<'static, str>...

That sounds like a great idea. I'm looking forward to any findings :)

OvermindDL1 commented 1 year ago

Didn't get time yesterday, busy at work, just looking into it this morning and I notice that non-static str's are able to be casted in (they are converted to Strings); since rust doesn't have specialization then that conversion would have to force 'static for strings, if non-static then they'd have to .to_string or whatever on it instead, so that would be a breaking change little though it may be... (which I personally think is a good change, better then hiding an allocation after all). I'm not even sure if that's even really used though, it doesn't seem to be in my code, I use either String or &'static str for mine...

OvermindDL1 commented 1 year ago

Okay, a basic conversion to using Cow<'static, str>, surprisingly didn't have to change any tests or anything like that. The run of the mogwai-benches: image

A heap snapshot took 84.2MB in chrome. Generated filesize is 304kb, so a little bit larger than just the basic mainline, but I don't think that's because of the Cow but rather I added some more From's for a few different things(these need to be cleaned up, lol).

Hmm, well no I take that back, I had to change one doctest, I had to remove a .into(), and that was all, so I guess it simplified it more? (I also changed a let (mut tx, rx) = broadcast::bounded(1.try_into().unwrap()); in a doctest to let (mut tx, rx) = broadcast::bounded(1); because it was bugging me, lol, there are more of those that need to be changed too though...) There were some other changes too because the cargo formatter changed some lines, unsure why they weren't formatted to begin with (different rust formatter version?).

The benchmark does do a lot of to_stringing though, fixing some of that would probably help it be better on that as well, but not changing it up yet.

OvermindDL1 commented 1 year ago

So yeah, again it seems within error bounds for speed, but it does reduce memory usage a bit, still unsure if its worth it though.

It did reduce my SSR generation time by ~8-9% though, which is worth it for me anyway for sure. ^.^

I only added Cow in one 'area' (the view), haven't looked into elsewhere yet so might have more speedups...

EDIT: I wonder if anyone's made a combined Cow<'static, ...> and smartstring/smolstr-like thing yet...

schell commented 1 year ago

It definitely seems that something like this is good to put on the roadmap.

OvermindDL1 commented 1 year ago

For reference with my SSR benchmarks here, here is a test with leptos of a trivial counter example (for simple) and that same counter put in a single div 10000 times (for many), and timed the various parts, these are the times I got (using the cow version of mogwai here, the mainline version is basically the same but about 8-9% slower):

simple_counter/mogwai   time:   [6.4647 µs 6.4842 µs 6.5034 µs]
simple_counter/leptos   time:   [7.1320 µs 7.1526 µs 7.1758 µs]
many_counter/mogwai     time:   [123.57 ms 124.15 ms 124.77 ms]
many_counter/leptos     time:   [77.771 ms 78.192 ms 78.643 ms]

parts_many-mogwai/view  time:   [31.956 ms 32.027 ms 32.099 ms]
parts_many-mogwai/generation
                        time:   [58.429 ms 58.926 ms 59.566 ms]
parts_many-mogwai/render
                        time:   [29.603 ms 29.684 ms 29.768 ms]
parts_many-leptos/view  time:   [26.477 ms 26.657 ms 26.841 ms]
parts_many-leptos/render
                        time:   [49.994 ms 50.909 ms 52.136 ms]

This is the mogwai part of timing the 'parts':

    {
        let mut g = c.benchmark_group("parts_many-mogwai");
        g.bench_function("view", |b| {
            b.to_async(&async_runtime)
                .iter(|| async move { mogwai_many_counter_view() })
        });
        g.bench_function("generation", |b| {
            b.to_async(&async_runtime).iter_custom(|iters| async move {
                let mut elapsed = Duration::default();
                for _ in 0..iters {
                    let view = mogwai_many_counter_view();
                    let start = Instant::now();
                    black_box(SsrDom::try_from(view).unwrap());
                    elapsed += start.elapsed();
                }
                elapsed
            })
        });
        g.bench_function("render", |b| {
            b.to_async(&async_runtime).iter_custom(|iters| async move {
                let view = mogwai_many_counter_view();
                let ssr = SsrDom::try_from(view).unwrap();
                let start = Instant::now();
                for _ in 0..iters {
                    black_box(ssr.html_string().await);
                }
                start.elapsed()
            })
        });
    }

And this is the mogwai simple counter:

pub fn simple_counter() -> ViewBuilder {
    let output_clicked = Output::<()>::default();
    let mut c = 0u32;

    html! {
        <div
            style="float:left;padding:1em;border-radius:.5em;border:1px solid #ddd;background:#f7f7f7;cursor:pointer"
            on:click=output_clicked.sink().contra_map(|_: JsDomEvent| ())
        >
            <p>
                {(
                    "Hello world!",
                    output_clicked.stream().map(move |()| {
                        c += 1;
                        match c {
                            0 => CowString::from("Hello world!"),
                            1 => CowString::from("Caught 1 click, click again 😀"),
                            clicks => format!("Caught {} clicks", clicks).into(),
                        }
                    })
                )}
            </p>
        </div>
    }
}
OvermindDL1 commented 1 year ago

Kind of wish a view could be rendered via a reference instead of it taking ownership of it in mogwai, would be convenient to static generate a set of them for faster rendering and manual patching for realtime updates without needing to rerun the view each time.

schell commented 10 months ago

@OvermindDL1 - I'm back on this - well the rendering from a reference portion. What type exactly are you referring to? Because SsrDom's html_string method takes &self.