Solution to tracking or non-tracking (keyed vs. non-keyed)

krausest commented 7 years ago

I'd like to ask for feedback to fix the tracking or non-tracking (keyed vs. non-keyed) issue. Please keep all general discussion out of this issue and put it into issue #22 instead.

Problem: Non tracking / non-keyed frameworks are faster than those which track / use keys. Regarding this benchmark non tracking is a correct solution. The "big" frameworks like react and angular can also perform much faster if track by if is disabled as can be seen here. The real problem is that for real life applications there's a big chance that non-tracking will cause wrong results (e.g. for transitions or dom manipulation outside of the framework).

Goal: Require all frameworks to use a tracking implementation.

Proposal: Add a simple functional requirement: The button to remove a row shall show a zoom transition when holding the mouse button. After releasing the button the transition must be stopped immediately, the next row must not show a zoom effect. This is a very simple change that just needs an update of the css (74e7436f4dfc649ef09e079f0a56cc30b6d97ef9) and might make the results a bit less biased and shouldn't influence the measurement via webdriver.

The following frameworks currently appear to be non-tracking :

aurelia
cycle.js
dio
domvm
inferno
riot
TSERS
vanillajs
vue

(please note that kivi is not included in that list)

What do you think?

solkimicreb commented 7 years ago

I think the style change you proposed is a pretty elegant solution. Did you already demo it? Sorry for my previous comment. I moved it to the other issue as requested.

leeoniya commented 7 years ago

I think the proposal is good. However it's important not to intermix the terms "benchmark" and "real life" in the same sentence. A keyed list of 10k rows is not real-life, it is simply a benchmark. Even paginating 1k rows is not terribly realistic, it would be shit UI. Most real life apps will not need enormous keyed lists. This distinction is also why a while back I asked about why any <16ms (1 frame) difference was counted as "insignificant". You're talking about a "real world" difference in rendering of an "unreal-world", synthetic bench UI.

As I mentioned in the other issue, there is "real life" UIs and then there are stress-test benches. And I think both are needed without presenting one as if it was also the other. IMO, a "real-life" demo would be a keyed list of a few hundred rows. And a "benchmark" of 10k keyed rows. In the former, <16ms should be counted as equivalent, but in the latter they should not.

It should be possible to look at the "benchmark" perf table and see which libs are fundamentally faster. This table will also help vdom authors find slow cases. But users can also look at another "real-world" perf table and see how often those differences actually matter. Few would choose kivi (its API is not nice) because it has the fastest keyed and unkeyed performance in synthetic tests if it only gained them < 16ms in real life.

/my $0.02

localvoid commented 7 years ago

Can we stop pretending that we are using this benchmark to find slow cases :) Those who've learned everything from this benchmark just trying to get better results with different tricks so they can use it as a marketing platform.

leeoniya commented 7 years ago

use it as a marketing platform.

Well, that's true of any benchmark ever.

Stress-test benchmarks are invaluable (to us authors) for finding pathological slowness within our own implementations. To no one's surprise, optimizing for a multitude of micro-benchmarks cumulatively does yield real-world benefit.

On one hand I dont want there to be "tricks" (really just spec ambiguities) to be employed by implementations. On the other hand I also don't want to see stress test benchmarks being passed off as real-world scenarios.

My argument is actually for all implementations to become unkeyed because that's all that is needed for this benchmark. Making the implementation keyed doesnt actually add anything except shuffle the winners around for no user-facing benefit. I'm saying that there should be one table where all impls are keyed, and another table where they are all unkeyed. Kivi may very well win both, but there would not be such huge losers in scenarios that are purely synthetic. There is room in this world for both purist and pragmatic approaches.

localvoid commented 7 years ago

Kivi may very well win both, but there would not be such huge losers in scenarios that are purely synthetic.

Recycling "optimization" will win in keyed scenario, because of "replace" test case :) And I hate this optimization, I would never enable it even to win benchmarks, this optimization has so much unsolvable edge-cases.

localvoid commented 7 years ago

Explicit event delegation also doesn't work when you try to build a real application, maybe it will be used in one or two list-like components, and that is all. If everyone really want to measure overhead of their libs, explicit event delegation also should be removed, otherwise it is just another trick to win benchmark.

leeoniya commented 7 years ago

I'm not sure I understand the points you're making.

This benchmark does a very specific set of things. Why should any implementation cater to things that it categorically does not do. If this was a real world implementation, most people would absolutely design it for speed (especially for a benchmark) rather than the accommodation of provably non-existent cases. Why would you frame any of these implementation choices as "tricks" or "optimizations"?

I understand from the way you write your libs that you prefer bulletproof solutions, which is great. But framing everyone else as cheaters because they found an optimization that doesnt account for unreachable code paths is pretty silly. Yes, if the requirements were different, event delegation may not be sufficient and recycling may introduce bugs, but that's not what we're given, that's a different "real-world" demo perhaps.

It also doesnt help that there's plenty of ambiguity here: "replace all rows: Duration for updating all 1000 rows of the table (with 5 warmup iterations)." Was this wording chosen because literally "replace all rows" is what was required or that's simply how React happened to work by default as it was the initial implementation?

Every implementation that has proper user-facing behavior is valid. They should all be "cheating" as much as the spec allows.

localvoid commented 7 years ago

Yes, if the requirements were different, event delegation may not be sufficient and recycling may introduce bugs, but that's not what we're given, that's a different "real-world" demo perhaps.

It is not about requirements, it is about what you actually want to measure as a library author. If you want measure how awesome are this optimizations, it is ok. I really don't care about such things, I even have a special version of kivi implementation for uibench that doesn't use any tricks, because when I care about actual performance, I want to see the real numbers instead of numbers achieved by useless tricks.

leeoniya commented 7 years ago

It is not about requirements, it is about what you actually want to measure as a library author.

lol what? this isnt your personal benchmark. it absolutely is about the requirements and nothing more. the only person who decides what should be measured is the benchmark/spec maintainer, not the impl authors. the reference impl is presumably vanillajs so anything it does to get the end result is fair game for all other impls.

would you say that the reference vanillajs impl doesnt measure "real numbers" and employes "useless tricks"?

krausest commented 7 years ago

Thanks for your comments. I've updated the css so you can check it.

I'm considering to add two simple benchmarks:

Cycle one row up / down.
Update the text of all rows.

On the other hand I consider dropping the clear rows a 2nd time benchmark. This one was accidental because react performed very bad in that case, but they fixed it.

localvoid commented 7 years ago

@krausest Don't forget to reimplement all other implementations that doesn't use recycling and explicit event delegation. For example, aggressive recycling for replace test case in React are usually implemented by reusing keys. Event delegation is also not so hard to implement. Who even wrote this stupid react implementation that allocates so many event handlers per each row, "most people would absolutely design it for speed (especially for a benchmark)" :)

leeoniya commented 7 years ago

https://github.com/krausest/js-framework-benchmark/pull/90

krausest commented 7 years ago

I suggest taking a look at one implementation before trying to change all of them. I picked react for that (because @localvoid found such warm words for my implementation ;-). A version with event delegation is here @localvoid Could you please check whether this is what you'd consider good enough. The onclick handler for selection and removal is now on tbody. But this requires adding some state to the dom (the ID is stored as data-id for each tr)

Here are the results for chrome 55.

Other changes in the benchmark:

Added Cycle up and Cycle down for 1k and 10k (not sure if both are interesting maybe I'll remove 10k if it doesn't reveal something interesting)
Create 1k was removed, there's just append 1k or 10k rows.
Instead of replacing all rows I added an "update all rows" operation.

Feel free to comment...

localvoid commented 7 years ago

@krausest the one that you've initially implemented was perfect and that is how most people would actually implement it. I just said it because "most people would absolutely design it for speed (especially for a benchmark)" :)

I really don't understand why it is so hard for others to implement it in the same "naive" way as initial React implementation. You can even remove kivi from this list, so they could win in this benchmark if it is so important, I don't care about this numbers. Don't want to waste any more time on this benchmarks.

localvoid commented 7 years ago

Cycle down is the worst case scenario for React. If you want to add worst case scenarios, then you should also include worst case scenarios for other libraries. But be careful, someone can start to think that you adding this test cases just because kivi can handle all worst cases :)

Atry commented 7 years ago

keyed vs. non-keyed describes internal implementation, which makes no sense for Binding.scala and other frameworks that do not use virtual-DOM. We should describe the behavior instead, e.g. replacing row vs. replacing content

krausest commented 7 years ago

Closing this one due to #104 @Atry There's no even a way to measure the effect of replace and removing rows. I still consider adding cycle down / up as an additional benchmark.

krausest / js-framework-benchmark

Solution to tracking or non-tracking (keyed vs. non-keyed) #89