zachlankton / benchmark-slow-3g-first-interaction


whats the app we will make #1

Open PatrickJS opened 7 months ago

PatrickJS commented 7 months ago

What's the app we will use for all frameworks? TodoMVC?

zachlankton commented 7 months ago

Not even sure we need all of that. I just used the default app that exists when you create a new Qwik project, which has a counter widget.

I think we could standardize this to just be a simple counter widget for all frameworks and then benchmark the time it takes for the button to become interactive on slow networks.

Qwik is unique in that it has some interactivity right away, which is how I was able to implement the timer: it wires up a global event listener that then kicks off a network request to fetch the rest of the JS needed to run the counter component.

Other frameworks will be tricky, because their buttons likely won't be interactive at all until after hydration. So a timer in this scenario won't work the same way. We would have to click continually until the button responded.

PatrickJS commented 7 months ago

well benchmarks will reflect real world usage. does a user continually click on something until it works?

zachlankton commented 7 months ago

That's a good question ... in the real world? I may click a few times and give up. For benchmark purposes we will need to devise a way to measure when the action performed finally produces the expected result.

In the Qwik example, I knew that an action had taken place because of how Qwik works under the hood, so I could safely just wait for it to produce the result.

In Next.js, for example, it may render the button but not have any listeners attached, so clicking once will likely never do anything. Not sure until I test this tho... maybe even Next has some kind of "Qwik-like" early interaction.

PatrickJS commented 7 months ago

ok so benchmark is go to the page on slow 3g then wait until you see the button then click on it. then wait 10 seconds until you see the result

zachlankton commented 7 months ago

This could work, and if the framework doesn't produce a result, then that is the result of the benchmark?

PatrickJS commented 7 months ago

how long does it take to see the result. if no result happens in 10 seconds the user leaves so that means no result. N/A result

PatrickJS commented 7 months ago

a user will wait until they see something they interact with so I think it's fair to load the page and poll until you see the button then click and poll until you see the result
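The procedure proposed above (poll until the button is visible, click once, then poll for the result and record N/A after 10 seconds) could be sketched roughly as follows. This is only an illustration, not the repository's puppeteer-test.js: `BenchPage`, `pollUntil`, `measureFirstInteraction`, the 60-second button timeout, and the 50 ms polling interval are all assumptions made for the example. A `null` time stands for the N/A result.

```typescript
// Hypothetical page abstraction; a real run would back these with browser automation calls.
interface BenchPage {
  isButtonVisible(): Promise<boolean>;
  clickButton(): Promise<void>;
  isResultVisible(): Promise<boolean>;
}

interface FirstInteractionResult {
  timeToButtonVisible: number | null; // ms, null if the button never appeared
  timeToResult: number | null;        // ms after the click, null means N/A
}

// Resolves with the elapsed ms once check() is truthy, or null once timeoutMs has passed.
async function pollUntil(
  check: () => boolean | Promise<boolean>,
  timeoutMs: number,
  intervalMs: number = 50,
): Promise<number | null> {
  const start = Date.now();
  for (;;) {
    if (await check()) return Date.now() - start;
    if (Date.now() - start >= timeoutMs) return null;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

async function measureFirstInteraction(
  page: BenchPage,
  buttonTimeoutMs: number = 60000, // assumed; the thread does not specify this limit
  resultTimeoutMs: number = 10000, // "if no result happens in 10 seconds the user leaves"
  intervalMs: number = 50,         // assumed polling interval
): Promise<FirstInteractionResult> {
  const timeToButtonVisible = await pollUntil(
    () => page.isButtonVisible(), buttonTimeoutMs, intervalMs);
  if (timeToButtonVisible === null) {
    return { timeToButtonVisible: null, timeToResult: null };
  }
  await page.clickButton(); // a single click, as in this version of the proposal
  const timeToResult = await pollUntil(
    () => page.isResultVisible(), resultTimeoutMs, intervalMs);
  return { timeToButtonVisible, timeToResult };
}
```

With a page whose button has no listeners yet, `timeToResult` comes back as `null`, which matches the "no result in 10 seconds means N/A" rule.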

zachlankton commented 7 months ago

I started playing around with the puppeteer idea tonight and ran into some snags right away.

I was able to get throttling working with puppeteer but for some reason it would block all rendering until all the assets were preloaded, which of course made it impossible to measure because clicking the button after that works instantly.

So I ended up going with a docker compose stack with an nginx proxy in between the service under test and puppeteer and this actually works really well.

I've got a basic readme started in the my-quik-app folder

We should be able to replicate this setup for other framework tests relatively easily. Should be a mostly copy/pasta op.

PatrickJS commented 7 months ago

we need to add htmx and react

PatrickJS commented 7 months ago

yeah looking at your puppeteer file it looks great

PatrickJS commented 7 months ago

I don't think a human waits 2 seconds (https://github.com/zachlankton/benchmark-slow-3g-first-interaction/blob/31c35ff466ea1d59ba0084000aec30a4ddadbe7f/my-qwik-app/puppeteer-test.js#L17). When I see a button I want to click, I just click it right away.

PatrickJS commented 7 months ago

ChatGPT on how long until a human decides to click.

In general, a typical reaction time for a web user might range from a few hundred milliseconds to a few seconds. For example, in usability studies, a common benchmark for simple reaction time to a visual stimulus is around 200-300 milliseconds. However, in the context of web browsing, this time might be longer due to factors like:

Decision making: The user needs to read and understand the button's function.
Attention: The user's focus might not be immediately on the button.
Design elements: The visibility and prominence of the button.

If we assume it's a CTA or landing page, yeah, they're likely thinking about clicking before doing it. We can take two measurements: a user who already knows what they want to click vs. a user who thinks before clicking. In the first case the user knows what they want; in the second case the user doesn't yet.

zachlankton commented 7 months ago

Lol, I actually timed myself. I started with my cursor at the far edge of my screen and kicked off a timer when the button became visible. I wasn't trying to speed run or anything, just casually moving to click and was taking a little more than a second.

That being said, I'm all for reducing this number. Clicking as soon as the button is visible, or very shortly after (maybe 100 ms?), may be a better way to determine how soon the button is actually reactive.

PatrickJS commented 7 months ago

I think 1 second is fine tbh. anything shorter it's likely just a link

I'm working on the nextjs example

PatrickJS commented 7 months ago

Can you fix the docker setup for Next and rerun the qwik test? I updated the code to use the latest APIs.

zachlankton commented 7 months ago

Wow, the Qwik update was a huge improvement! Can you explain ?

edit:

hmm just reviewed the updated code for qwik… am i understanding correctly that we are directly manipulating dom values?

is this the idiomatic recommended way to use qwik?

why not just use jquery. 😅

PatrickJS commented 7 months ago

Yes, there are many different ways to solve the problem. If you want something interactive on a landing page, they usually only make the <a> elements interactive. The counter example and the others on the page were purposely meant to show that the code loads after interacting; you can see in demos that they scroll down to it and show the network tab. In Qwik you can update the DOM and the framework won't break, because it doesn't have the same hydration issues. If you did this in React or any hydrating framework, it would throw a fat error.

zachlankton commented 7 months ago

Sounds good, I think we may want to run both benchmarks and note the difference. This way the benchmark can also educate on how idiomatic code may not be the most performant, as well as on the code changes required in a given framework to achieve better results.

I just pushed a commit to update all 2 second delays into 1 second delays.

I'm going to take a look at the nextjs example, I'm curious if there is a way to get some of those same gains we saw in your updated qwik example.

PatrickJS commented 7 months ago

yeah sounds good. For nextjs the problem is that it requires you to download the framework and hydrate. You can try to jQuery it but you will run into a hydration error. For qwik, the fact that you don't have hydration is why it works and allows for that. To make nextjs faster you have to insert a loading bar to prevent the user from clicking until the framework is loaded. You can also check out useSyncExternalStore, which can help with the loading indicator part until the framework is downloaded.
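As a rough illustration of the loading-indicator idea, the sketch below builds a tiny "framework loaded" store in the shape React's `useSyncExternalStore(subscribe, getSnapshot, getServerSnapshot)` expects. The store itself (`createLoadedStore`, `markLoaded`) is a hypothetical helper invented for this example, not something from the repository or from Next.js, and the React wiring is shown only in a comment so the block stays self-contained.

```typescript
type Listener = () => void;

// Hypothetical store that flips from "not loaded" to "loaded" exactly once.
function createLoadedStore() {
  let loaded = false;
  let listeners: Listener[] = [];

  return {
    // Registers a change listener and returns an unsubscribe function.
    subscribe(listener: Listener): () => void {
      listeners.push(listener);
      return () => {
        listeners = listeners.filter((l) => l !== listener);
      };
    },
    // Current value on the client.
    getSnapshot(): boolean {
      return loaded;
    },
    // Value used for server-rendered HTML and during hydration: never loaded.
    getServerSnapshot(): boolean {
      return false;
    },
    // Assumed to be called by client code once the framework JS has arrived.
    markLoaded(): void {
      if (loaded) return;
      loaded = true;
      listeners.forEach((l) => l());
    },
  };
}

// Assumed usage inside a React component (not executed here):
//   const loaded = useSyncExternalStore(
//     store.subscribe, store.getSnapshot, store.getServerSnapshot);
//   return <button disabled={!loaded}>+1</button>;
```

The design choice here is that the server snapshot is always `false`, so the server-rendered button starts out disabled and only becomes clickable after client code reports that the framework has loaded.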

It will be worth it to check how much js is downloaded as another metric. for qwik you can optimize this even more but we can make another folder to show that. in qwik it is idiomatic to optimize code based on what you want the user to experience. the goal for it is a user-first rather than thinking client/server

zachlankton commented 7 months ago

PR https://github.com/zachlankton/benchmark-slow-3g-first-interaction/pull/4

If there is a will there is a way... 😎

Standard Idiomatic NextJS

Optimized using Direct DOM API

PatrickJS commented 7 months ago

haha nice!

PatrickJS commented 7 months ago

do you have to update the tests for qwik then? since you changed the results json etc

zachlankton commented 7 months ago

yeah, I figure it's still early and we are still figuring out what data we will collect and the format we put it in, but I will definitely update the rest of the tests to be consistent.

PatrickJS commented 7 months ago

htmx looks good. I saw you added a 3rd metric should we update the others? whats the next framework?

zachlankton commented 7 months ago

Yeah, HTMX is an interesting case. With HTMX it's a bit wild west, you can do anything you want. The power to render HTML first, before all the JS loads, is up to you, so the first 2 tests reflect that. (Not sure how we would control this or test this in other frameworks.) The last test is our standard escape hatch example that we have been doing.

Not sure yet if we will see a pattern emerge here that we can apply to all frameworks. I'm still only running one test for Rezact. I could do an escape hatch version of that, but I'm starting to feel that those results are less interesting.

I think it's clear already that we can do direct DOM and get excellent results with any framework. The part that interests me the most is how a framework performs when using its standard starter examples (as this is the way most people will write code for that framework).

Thinking solidstart or sveltekit is next.

PatrickJS commented 7 months ago

Standard starter examples don't make sense when htmx doesn't have any. You could go to the htmx examples page and grab one, but it's the same for the other frameworks. The counter is not a real-world example. If we have really good real-world examples of things to build in each, we should use those as the benchmark 👌

PatrickJS commented 7 months ago

maybe we can do a TodoMVC like example app for each

zachlankton commented 6 months ago

While I agree a more real world app may be the right approach, we may need to clarify what we plan to measure.

The benchmarks that I have seen using TodoMVC are centered around the performance of the client side framework after all the js is loaded... so there is no network to measure or any impact to performance on slow networks. (Other than it takes longer to load, but no one is measuring this currently that I'm aware of.)

I feel like whatever we plan to test here needs to have a network component like our original premise, such as how long till the button is rendered, how long does it take to respond when clicked.

If we were to build out a TodoMVC what would we test?

PatrickJS commented 6 months ago

yeah good point either way the test is just loading interaction. for any example app we can just ChatGPT generate tailwind version of any ui so we're not limited to TodoMVC

PatrickJS commented 6 months ago

We need to add Solid since they replay events

zachlankton commented 6 months ago

So.... I kinda let this project go while I was trying to think of the best way to move forward... I have a bunch of changes that I made to it from a week or 2 ago, and instead of trying to sort it all out I just YOLO pushed everything here.

Long story short, I reevaluated the testing procedure to work more like a human that would continuously click the button until it finally responded.

This begins immediately once the button is visible.

The tests now measure the amount of time and the number of clicks it takes before the button responds, as well as any missed clicks (which shows whether the clicks eventually "catch up", and so should measure this idea of "Replay", I think?)
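A minimal sketch of the click-until-it-responds bookkeeping described above, with timing left out to keep it short. The `CounterPage` interface and `clickUntilResponds` function are hypothetical; the field names follow the results JSON posted in this thread. The formula `missedClicks = buttonClickedCount - finalCounterValue` and the rule that a run has failed when the counter never moves are assumptions that are consistent with the posted numbers, not confirmed details of the real test.

```typescript
// Hypothetical page abstraction: click the button, read the rendered counter value.
interface CounterPage {
  click(): void;
  readCounter(): number;
}

interface ClickRunSummary {
  buttonClickedCount: number;
  finalCounterValue: number;
  missedClicks: number; // assumed: clicks that never showed up in the counter
  failed: boolean;      // assumed: the counter never changed at all
}

// Keep clicking until the counter first changes or maxClicks is reached.
function clickUntilResponds(page: CounterPage, maxClicks: number): ClickRunSummary {
  let buttonClickedCount = 0;
  while (buttonClickedCount < maxClicks) {
    page.click();
    buttonClickedCount++;
    if (page.readCounter() > 0) break; // first visible response
  }
  const finalCounterValue = page.readCounter();
  return {
    buttonClickedCount,
    finalCounterValue,
    missedClicks: buttonClickedCount - finalCounterValue,
    failed: finalCounterValue === 0,
  };
}
```

Under these assumptions, a framework that replays early clicks ends with `missedClicks` equal to 0, while one that drops every click made before hydration ends with a positive `missedClicks`.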

zachlankton commented 6 months ago

Oh and I added solid-start ...

[
  {
    "testName": "standard idiomatic solid-start",
    "timeToButtonVisible": 32.75194191932678,
    "timeUntiValueUpdated": 1047.5482330322266,
    "buttonClickedCount": 5,
    "finalCounterValue": 5,
    "missedClicks": 0,
    "failed": false
  }
]

PatrickJS commented 6 months ago

yeah it looks like only qwik and solid do replay. I think it's fair to make all the counters the same, so I'm also refactoring qwik
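For readers unfamiliar with the term, "replay" here means that interactions made before the handlers are attached are buffered and delivered later instead of being lost. The sketch below is a generic illustration of that idea only; `ReplayQueue` is a made-up name, and this is not how Qwik or Solid actually implement it.

```typescript
// Generic event-replay buffer: events dispatched before a handler exists are
// queued, then delivered in order as soon as a handler is attached.
class ReplayQueue<E> {
  private pending: E[] = [];
  private handler: ((event: E) => void) | null = null;

  // Called for every user interaction, whether or not the handler is ready.
  dispatch(event: E): void {
    if (this.handler) {
      this.handler(event);
    } else {
      this.pending.push(event);
    }
  }

  // Called once the real handler has loaded; flushes buffered events first.
  attach(handler: (event: E) => void): void {
    this.handler = handler;
    const buffered = this.pending;
    this.pending = [];
    buffered.forEach((event) => handler(event));
  }
}

// Example: three clicks arrive before the counter handler is ready.
const clicks = new ReplayQueue<string>();
let counter = 0;
clicks.dispatch("click");
clicks.dispatch("click");
clicks.dispatch("click");
clicks.attach(() => { counter++; }); // buffered clicks are replayed here
clicks.dispatch("click");            // later clicks go straight to the handler
```

In benchmark terms, a framework with this behavior would report no missed clicks, because every early click is eventually reflected in the counter.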