tastejs / todomvc

Helping you select an MV* framework - Todo apps for React.js, Ember.js, Angular, and many more
http://todomvc.com

Comparison chart #373

Open sindresorhus opened 11 years ago

sindresorhus commented 11 years ago

@addyosmani and I have been discussing this for a while, and it's something we would like to see happening in the near future.

Some awesome prototype work has already been done in #297 by @devinrhode2 and @schettino72 regarding Lines-of-Code measurement using Python. Even though I would prefer for us to stick with Node.js for simplicity, there is definitely a lot we can learn from that prototyping.

@jsoverson just released an awesome project named plato to visualize JavaScript source complexity. We should look into how we could make use of that.

Overview

I'm thinking a sortable table with a good amount of metrics, which should be accompanied by pretty graphs.

The data could be partly static, partly generated through a Grunt build. Stuff like license, description, etc. could be static, while GitHub watchers, code complexity, etc. could be dynamic.
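A minimal sketch of that static/dynamic split, written in Python only for brevity (the thread prefers Node.js for the real build). The framework names, field names, and numbers are all hypothetical placeholders, not real data.

```python
import json

# Hand-maintained metadata (hypothetical fields), kept under version control.
STATIC_JSON = """
{
  "framework_a": {"license": "MIT", "description": "placeholder description"},
  "framework_b": {"license": "BSD", "description": "placeholder description"}
}
"""


def merge_metrics(static, generated):
    """Combine static metadata with build-time metrics, one row per framework."""
    rows = []
    for name in sorted(static):
        row = {"name": name}
        row.update(static[name])
        # A framework without generated data still gets a row, just fewer columns.
        row.update(generated.get(name, {}))
        rows.append(row)
    return rows


static = json.loads(STATIC_JSON)
# Placeholder numbers standing in for whatever the build step would collect.
generated = {"framework_a": {"watchers": 100}, "framework_b": {"watchers": 250}}

table = merge_metrics(static, generated)
# "Sortable table": any column can serve as the sort key.
by_watchers = sorted(table, key=lambda r: r.get("watchers", 0), reverse=True)
```

Keeping the two sources separate means the hand-edited file never has to be regenerated, and the build step can overwrite its own output freely.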

Goal

Further improve the MV* framework selection process.

Metrics

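We need to figure out what kind of metrics users care about.

My list is:

  • Title
  • Description
  • Type
  • Features (should be normalized)
  • License
  • Size
  • Total size (with dependencies)
  • Dependencies
  • GitHub forks and stars
  • LOC
  • Cyclomatic code complexity
  • Performance (how? what kind?)
  • Time to completely loaded

Thoughts?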

addyosmani commented 11 years ago

I like your list so far. I think performance is perhaps one of the trickiest metrics because of the immense amount of structural variance between some of these frameworks. It's also difficult because it's not clear what should be measured. Creation/destruction of a new model and view? Does that offer anything of real value?

Also, would LOC be LOC of just the framework or LOC of the framework + deps?

Fantastic work on plato once again @jsoverson

sindresorhus commented 11 years ago

@addyosmani Performance is indeed the trickiest since it can't be generalized or clearly defined. How do you measure the difference between a framework that lazy-loads each view and one that loads everything up-front? It also probably differs with RequireJS vs. non-RequireJS setups. If anyone has some good thoughts on this, we'd be happy to listen.

Regarding LOC: dependencies are hard to measure in size and LOC, since a dependency can be reused by other components. You could say the total size of Backbone is large since it requires jQuery and Underscore, but most likely a lot of your other components are using jQuery too, which means adding up the per-component totals wouldn't give you a true number. The correct number can't be measured programmatically without knowing what else is on the page.
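To illustrate the double-counting problem, here is a small sketch with made-up component names and sizes (none of these are measurements of real libraries). It compares the "total size with dependencies" of each component in isolation against the size of a page that uses both.

```python
# Hypothetical sizes in KB; not measurements of any real library.
SIZES_KB = {"framework": 6, "utility_lib": 5, "dom_lib": 32, "widget": 4}
DEPENDS_ON = {
    "framework": ["utility_lib", "dom_lib"],
    "widget": ["dom_lib"],
    "utility_lib": [],
    "dom_lib": [],
}


def closure(name, seen=None):
    """Return the component plus everything it transitively depends on."""
    seen = set() if seen is None else seen
    if name not in seen:
        seen.add(name)
        for dep in DEPENDS_ON[name]:
            closure(dep, seen)
    return seen


def total_kb(names):
    """Sum the sizes of a set of components, counting each one once."""
    return sum(SIZES_KB[n] for n in names)


framework_alone = total_kb(closure("framework"))  # framework + both deps
widget_alone = total_kb(closure("widget"))        # widget + dom_lib
# On a real page the shared dom_lib is downloaded only once.
page_total = total_kb(closure("framework") | closure("widget"))
```

The per-component totals add up to more than the page actually ships, which is why a single "total size" column depends on what else the application already uses.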

// including some people that might have thoughts on all this: @igorminar @tomdale @tbranyen

devinrhode2 commented 11 years ago

What would be the various types?

Perhaps you should open another thread for what the normalized list of features would be.

For performance, I think it's important to take benchmarks using a local server. But this doesn't reflect network latency. Therefore, we should measure bytes sent over the network, multiply that by a latency constant, and add the result to a standard benchmark. We should measure this for the initial page load and for the second page load. (Most apps could be screaming fast on the second page load using HTML5 offline caching.)
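The arithmetic proposed above can be sketched as follows. The function name, the per-KB latency constants, and the sample numbers are assumptions made up for illustration, not measured values.

```python
def estimated_load_ms(benchmark_ms, bytes_sent, ms_per_kb):
    """Local benchmark time plus a simulated network cost for the bytes sent."""
    return benchmark_ms + (bytes_sent / 1024) * ms_per_kb


# Hypothetical app: 120 ms in a local benchmark, 100 KB on the first load,
# nothing over the network on the second load (everything cached offline).
first_load = estimated_load_ms(120, 100 * 1024, ms_per_kb=2.0)
second_load = estimated_load_ms(120, 0, ms_per_kb=2.0)

# The same local benchmark with a much larger constant, to mimic a slow link.
slow_link_first_load = estimated_load_ms(120, 100 * 1024, ms_per_kb=20.0)
```

This is a linear model, so it ignores round trips, parallel requests, and parse time; it only shows how one local measurement could be reused for several connection profiles.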

Also, for LOC, we want to measure the number of custom lines written by the authors to make the TodoMVC app. It's preferable that most lines of code are in the framework and not written by hand. I care about the KB of my libraries, not exactly how many lines of code they are. The KB of libraries is reflected in the initial page load.
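A rough sketch of counting hand-written lines, assuming a deliberately simple rule: skip blank lines and lines that contain only a `//` comment. It does not handle block comments, and the sample source is hypothetical.

```python
def count_loc(source):
    """Count lines that are neither blank nor a whole-line // comment."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            count += 1
    return count


# Hypothetical hand-written app code for one TodoMVC implementation.
sample = """
// app.js
var todos = [];

function add(title) {
  todos.push({ title: title, done: false });  // trailing comment still counts
}
"""

app_loc = count_loc(sample)
```

Running this only over the application folder, and not over the framework or its dependencies, gives the "custom lines" figure described above.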

-Devin http://zerply.com/devinrhode2

On Thu, Jan 3, 2013 at 8:47 PM, Sindre Sorhus wrote:

My list is:

  • Title
  • Description
  • Type
  • Features (should be normalized)
  • License
  • Size
  • Total size (with dependencies)
  • Dependencies
  • GitHub forks and stars
  • LOC
  • Cyclomatic code complexity
  • Performance (how? what kind?)
  • Time to completely loaded

devinrhode2 commented 11 years ago

Benchmarking locally and then multiplying by a latency constant also can allow us to show what performance would be like on a really bad mobile connection.

schettino72 commented 11 years ago
devinrhode2 commented 11 years ago

+1 on the big-O notation, though I'm not sure how we would measure it. It could be complementary to relative execution time in ms.

Also - performance will be biased by browser. For example, Ember will have much better IE performance than Angular. http://ci.testling.com could be used for free for cross browser tests and gathering performance data across different browsers.

addyosmani commented 11 years ago

I agree about metrics (at least where it makes sense) being broken down into framework and Todo app implementation.

On bigO: Interesting idea. I feel like someone may have already tried to do something similar for a few frameworks in the past - I'll try to dig it up. I see Types as either meaning architecture level type (e.g MVC, MVVM) or category type (as per how we break down apps on the homepage) - e.g JavaScript, Compile To JS and so on.

Also cc'ing @trek in case he has some input about this thread.

Side: (and I'm not sure how reliable these tests are, if at all), someone put together a few jsPerfs on this topic back in December.

sindresorhus commented 11 years ago

> Benchmarking locally and then multiplying by a latency constant also can allow us to show what performance would be like on a really bad mobile connection.

Is there any node.js way or module that can simulate a flaky connection? PhantomJS?

> I think the metrics should be more clearly separated into 2 sections (framework itself, and Todo implementation)

Makes sense.

> performance: for me the most important performance is about complexity using big-O notation for some operations. For example, I was looking into whether the "ADD operation" would cause all the items to be redrawn on the screen. This gives a much better idea of whether the application would scale than looking at how much time it took to run with 5 or 10 items.

I would say the most important performance metric to measure, and also the hardest one, is the performance perceived by the end-user. That means we need to focus on measuring real-world usage rather than measuring each individual part in a vacuum. Not sure how to realistically do this though. Of course we'll never get it completely right, since we're basing it off a super simplified todo app...
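One way to make the redraw question concrete is to count render calls instead of timing them. The two classes below are hypothetical stand-ins for a view layer, not models of any particular framework.

```python
class NaiveList:
    """Re-renders every item whenever one is added: O(n) work per add."""

    def __init__(self):
        self.items = []
        self.render_calls = 0

    def add(self, title):
        self.items.append(title)
        self.render_calls += len(self.items)  # redraw the whole list


class IncrementalList:
    """Renders only the newly added item: O(1) work per add."""

    def __init__(self):
        self.items = []
        self.render_calls = 0

    def add(self, title):
        self.items.append(title)
        self.render_calls += 1  # draw just the new row


def renders_after(list_cls, n):
    """Total render calls needed to add n items one by one."""
    lst = list_cls()
    for i in range(n):
        lst.add(f"todo {i}")
    return lst.render_calls


small = (renders_after(NaiveList, 10), renders_after(IncrementalList, 10))
large = (renders_after(NaiveList, 1000), renders_after(IncrementalList, 1000))
```

With 10 items the two approaches look close, while with 1000 items the gap is several hundred-fold, which is the scaling behaviour a timing run on 5 or 10 items would hide.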

> I like the idea of the partly static data. It would be better to put this info in a separate file in JSON/YAML.

I was thinking YAML for ease of editing.

> I completely understand you guys wish to stick with Node.js. I actually never used Grunt, but from what I could see, IMO it is not good enough for the job. It doesn't even have a cache system, and generating stats for all projects can be quite time consuming...

We'll only use grunt to trigger a custom task which handles all that.
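As a sketch of what caching in such a custom task could look like, the snippet below keys results on a hash of the source so unchanged projects are not re-analysed. It is an in-memory illustration with hypothetical function names; a real task would persist the cache to disk.

```python
import hashlib

_cache = {}
compute_calls = 0


def expensive_stats(source):
    """Stand-in for a slow analysis step (hypothetical)."""
    global compute_calls
    compute_calls += 1
    return {"lines": len(source.splitlines())}


def cached_stats(source):
    """Recompute stats only when the source content has changed."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = expensive_stats(source)
    return _cache[key]


src = "var a = 1;\nvar b = 2;"
first = cached_stats(src)
second = cached_stats(src)  # served from the cache, no recomputation
```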

jsoverson commented 11 years ago

Plato exposes all of its reports as json so you could run a report on every target project and then write a custom task to aggregate the overview data into a project list page with all the appropriate tables and graphs.

Might be a good test case for plato, too, so I might work on that in spare time anyway to see how it pans out.
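A sketch of the aggregation step described above. The report shape used here (a `files` array with `sloc` and `complexity` per file) is an assumption for illustration; Plato's actual JSON layout may differ and should be checked against its output.

```python
import json

# Assumed report shape with placeholder numbers; not real Plato output.
REPORTS = {
    "framework_a": '{"files": [{"sloc": 120, "complexity": 14},'
                   ' {"sloc": 60, "complexity": 6}]}',
    "framework_b": '{"files": [{"sloc": 300, "complexity": 45}]}',
}


def summarize(report_json):
    """Reduce one per-project report to a single overview row."""
    files = json.loads(report_json)["files"]
    return {
        "files": len(files),
        "sloc": sum(f["sloc"] for f in files),
        "max_complexity": max(f["complexity"] for f in files),
    }


overview = {name: summarize(raw) for name, raw in REPORTS.items()}
```

The overview dictionary is the kind of data a project list page could render as tables or graphs.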

tomdale commented 11 years ago

Having a comparison of things like license, frequency of commits, and file size sounds great. However, I have pretty serious reservations about trying to distill extremely nuanced things like performance and code complexity into a single number.

These types of things make people think they are making an informed choice, but just end up punishing projects that fall on the wrong side of the algorithm.

For example, AngularJS's mailing list is extremely active. Will that be taken into account? We are also about to launch some custom stuff for Ember.js discussion; would that be taken into account? Different projects utilize things like GitHub in different ways, so using that as a sole source of "popularity" or maintenance is extremely dangerous.

It may also have a negative effect on projects if they feel like they are being punished by these algorithms. For example, we currently ask users to file GitHub issues for features they would like to discuss. If TodoMVC conflates "many open issues" with "many bugs in the software" (or at least alludes to it by making it a point of comparison), we would essentially be forced to change our workflow and be more aggressive about closing issues that are not strictly bugs, and would be incentivized to aggressively close issues as fast as possible, rather than taking the time to solve it properly.

addyosmani commented 11 years ago

@tomdale raises some very important points above. He hits the nail on the head once again: performance is difficult for us to measure reliably, and we need to be careful there.

On code complexity, what is important for us to measure here? The code complexity of implementations or the complexity of frameworks? What do users actually care about at the end of the day? I would think more of the former as they're going to be consumers of frameworks, but others may feel differently ('hackability' is certainly a consideration).

Active discussion could certainly be another metric we take into account, but it's hard to measure outside of being just a boolean (e.g does having a lot of threads correlate to active discussion? what if those are users posting but not being responded to?). Beyond simply reviewing these communities periodically and manually stating whether they are active in discussions or not, I don't think we want to venture into outright measuring this numerically.

jsoverson commented 11 years ago

I think the benefit of TodoMVC is to strictly view the implementation of frameworks in a consistent manner and the TodoMVC implementations should be the ones under the most scrutiny.

Judging the internals of frameworks or their communities seems like something that will be fraught with question and complication without adding much definite value.

I mostly worry about these numbers or metrics leading to self fulfilling prophecies where TodoMVC steers people toward frameworks with high 'usability quotients' when those metrics are, in turn, partially determined by those users that have been steered toward the frameworks, ie "Everyone uses backbone because everyone uses backbone"

sindresorhus commented 11 years ago

@tomdale This is exactly why I mentioned you and others. I did not think about that. And that's exactly why we need a discussion about it.

All we want is to help people make an informed decision, not to confuse them with misleading generated data.

What kind of data would you like shared about Ember to help users make a decision?

trek commented 11 years ago

Frankly, I have major reservations about this idea. I like that there's a central place to compare application style over a normalized feature set so people can see how each framework handles similar tasks but using this as a tool for any sort of benchmarking concerns me.

First, I have the same major problem as @jsoverson: self fulfilling prophecies.

Selected focus on specific items will steer developers towards focusing in those areas or risk stalling their community and potentially killing good projects just because they didn't "demo well."

Even the presence of a framework in TodoMVC implies something I think is fundamentally false: that all of these projects fit into the same slot in your utility belt and you should be selecting one. Most folks I talk to about frameworks feel absolutely paralyzed with indecision about which they should learn because they think we're in some battle royale to become the One True Framework Which Shines Brightest Above All Others.

I wish we had a better tool for helping people decide which framework is the best fit for their next project, not for the next stage of their career. I'm practically emo about this. Here is what an emo Trek looks like:

Down to the hard numbers: LoC, total size, and startup time will all tell a story favoring the smaller frameworks because the single data point is a better use case for them. If the target application were a GMail clone it would tell a very different (and equally biased) story.

I'm two blocks from a grocery store: walking is my best option. Better even than the slow startup cost of a bicycle, which most people would think of as a microframework for locomotion. Driving would be ludicrous. Your total travel time would be massive from circling the block looking for parking. It would be irresponsible to use my single, very specific sample case to compare the best way to fetch groceries.

We'd need a spectrum of examples before you could answer any meaningful question with the numbers.

sindresorhus commented 11 years ago

> I wish we had a better tool for helping people decide which framework is the best fit for their next project

Do you think a wizard with questions about what you want to build and then showing some framework fitting the selected requirements would be helpful (or hurtful)? Maybe even at the end show some examples of real-world similar projects built using those frameworks.

> I'm two blocks from a grocery store: walking is my best option. Better even than the slow startup cost of a bicycle, which most people would think of as a microframework for locomotion. Driving would be ludicrous. Your total travel time would be massive from circling the block looking for parking. It would be irresponsible to use my single, very specific sample case to compare the best way to fetch groceries.

Best metaphor ever :D

trek commented 11 years ago

> Do you think a wizard with questions about what you want to build and then showing some framework fitting the selected requirements would be helpful (or hurtful)? Maybe even at the end show some examples of real-world similar projects built using those frameworks.

Maybe a paperclip. But fo' realz, that tool would be very subjective.

trek commented 11 years ago

I also wouldn't want contributors optimizing their code for speed and LoC counts. Then TodoMVC might become a tool for comparing how convoluted you can make a project to get its line count down. Certainly interesting, but not the aim of the project.

trek commented 11 years ago

Or, ugh, even worse: framework developers baking in features specifically to get good performance/line count for a specific use case.

P.s. you should all try my new framework Todoz: for building lightning fast Todo list applications in under 20 lines.

addyosmani commented 11 years ago

> P.s. you should all try my new framework Todoz: for building lightning fast Todo list applications in under 20 lines.

:smile:

In seriousness, I agree with your reservations about framework authors being forced to optimize for specific metrics just to appear favorable in our comparison - that would defeat the purpose of what TodoMVC is all about, simply enabling developers to compare implementation styles.

One could argue that on LoC there is also questionable value in us ranking frameworks by this. What if Backbone + its deps were 5kb smaller than Angular, or Ember was 10kb smaller than Spine, and so on? If we do end up implementing a comparison chart we would need to make it clear that, especially in any reasonably sized app, these numbers mean very little outside the context of lo-fi connections.

I remember someone putting together a wizard (of sorts) to help developers select a templating library based on a normalized set of criteria. At best, I think it allowed you to narrow down the options you explored in more depth and a tool like this would probably similarly be limited to offering something along those lines.

> Most folks I talk to about frameworks feel absolutely paralyzed with indecision about which they should learn because they think we're in some battle royale to become the One True Framework Which Shines Brightest Above All Others.

Moving forward we're going to attempt to help them sail around this indecision with better categorization of frameworks and a clearer definition of what makes them different or where one works better than another for a set of specific purposes.

sechrest commented 11 years ago

Would it be helpful to use other third-party metrics, so that you can let others fight over the standards and the gaming of those standards?

For example would using http://jscomplexity.org help the conversation?

Are there other comparable tools that would help frame the discussion?

tomdale commented 11 years ago

@sechrest I think things like JSComplexity are of extremely dubious quality, and just kick the can of responsibility down the road.

sechrest commented 11 years ago

Are there any attempts at quantification that are not dubious?

If yes, which are most effective?

If not, what is the core barrier?

What common example is a reasonable starting point for comparison?

tomdale commented 11 years ago

I believe that attempts at quantifying complexity are all dubious, because it seems like an NP-complete problem to do accurately.

schettino72 commented 11 years ago

Of course decisions on which framework to use should not be made on a single example (TodoMVC) nor based in simple metrics LOC, complexity...

But that's not a reason to not display them. Just put up the raw data with a disclaimer that there are a lot of things that might be more important than these numbers, and let the users decide.

IMO the point of TodoMVC is to compare the framework implementations. And the more stats it has the better it is :)

trek commented 11 years ago

It's exactly a reason not to display them. Selecting a group of criteria is tacit acknowledgment of their importance over other criteria. And that bears additional weight coming from people considered experts in their field, disclaimer or not.

If you insist on putting up numbers that "don't mean anything, just for some fun" I would also like to know:

or any number of interesting but meaningless statistics.

Many of the proposed statistics will just shed a negative light on larger frameworks that are providing more functionality.

But,

only become meaningful when combined with an application of meaningful complexity since the goal of larger frameworks, whether accomplished or not, is to take on the sins of development so all may know everlasting deploys.

There's no escaping complexity. The only question is whether you're willing to take on that cost yourself or outsource it.

schettino72 commented 11 years ago

> Selecting a group of criteria is tacit acknowledgment of their importance over other criteria.

True. Who is trying to "select" a group of criteria? Deciding to NOT show something is a pretty strong way of selection, better let each person decide which criteria should be selected out.

> And that bears additional weight coming from people considered experts in their field, disclaimer or not.

I think this comparison should target developers that can read a disclaimer, analyze some data and make an informed decision. Instead of thinking that we should "protect" the "normal" people by not showing information that we think they will misuse.

> If you insist on putting up numbers that "don't mean anything, just for some fun"

Size, LOC, code complexity and loading time are pretty standard software metrics.

> Many of the proposed statistics will just shed a negative light on larger frameworks that are providing more functionality.

Yes, more functionality doesn't come for free :) But how much is this "more functionality" costing?

> only become meaningful when combined with an application of meaningful complexity

You may think these numbers are not meaningful, but that's just your opinion...

And not everyone is building an application "of meaningful complexity".

trek commented 11 years ago

> True. Who is trying to "select" a group of criteria?

Um, we are? That's the whole discussion of this thread.

> Size, LOC, code complexity and loading time are pretty standard software metrics.

> decisions on which framework to use should not be ... based in simple metrics LOC, complexity.

So, basically, you've suggested standard metrics shouldn't be used to select a framework. Why are we discussing them? If this information isn't helping people make a selection, why should it be shown?

> But how much is this "more functionality" costing?

With a toy application we can't answer this question, so highlighting metrics about complexity is misleading.

> You may think these numbers are not meaningful, but that's just your opinion...

If we were just comparing libraries for the sake of comparison, sure. But TodoMVC is not. Specifically it is "Helping you select an MV* framework" and scoring some frameworks worse on metrics without putting that score into context is in direct opposition to this goal.

> And not everyone is building an application "of meaningful complexity".

Right, which is why I think the project for anything more than stylistic comparison is misleading.

There are two, maybe three, distinct categories of tools that have been lumped together, then a target task was selected which is really only a good fit for one of them, and now we want to show numbers-based analysis? This isn't helping people select a framework appropriately, it's specifically guiding them towards smaller libraries.

schettino72 commented 11 years ago
> decisions on which framework to use should not be ... based in simple metrics LOC, complexity.
                                                    ^^^

I am not a native English speaker, but this seems quite a misleading quote... in no way is that what I meant.

I will stop discussing and focus on getting some metrics measured :) If you guys will decide to publish it as part of the project or not is not up to me anyway...

trek commented 11 years ago

Misleading how? I removed one of two things you said we should not use for decisions to contrast it with the opposite statement about metrics you made earlier.

> Of course decisions on which framework to use should not be made on a single example (TodoMVC) nor based in simple metrics LOC, complexity...

Even with both topics quoted, it doesn't change the meaning: you've specifically said metrics should not be used to make this decision. If that's the case, what's your motivation for including them? If they're "standard" (as you say) but should not be used, what are you trying to communicate by including them?

schettino72 commented 11 years ago

https://github.com/schettino72/todomvc/tree/compare

The information needs to be audited :) I got the "routing" info from the official website.

Next step I will look into adding code complexity measurement.

intellix commented 11 years ago

I think something like this would be really helpful. At work, we needed to make a mobile website. My boss and I decided that we would use Sencha Touch 2 as they had just released a new version of Architect which looked awesome.

So anyway, went through learning and creating the mobile site and after about 3 months of creating it, and now having about 9 months experience using it I would say it's the most painful experience I've ever had with any sort of framework. Support is next to non-existent (even after paying $300), core features of the framework broke with each new version, documentation is lacking and sometimes just completely wrong, they hold back features/fixes so they can market them for major releases, performance is such a joke that my large-ish application takes something like 6 seconds to load and scrolling on Android with -webkit-3d-transform is agonising.

Anyway, my point is that their marketing far exceeds the actual product and if I would have known how bad it was up front with statistics like opening performance, general usage performance and how difficult it was to learn/create the app I would have steered well away. Architect makes it look like you're going to drag and drop your entire application within a week and I was completely wrong. So now I know how to create apps that take 6 seconds to show and not that easy to make.

I won't lie that I'll be wrongly swayed towards a framework with a better looking website and something more people are talking about... which could be entirely wrong :P

I think the issues you're talking about, @trek, in regards to giving the wrong message are good points, and I agree there are too many factors involved to say that framework A is better than B because it is 0.2 seconds quicker to load. I think if you're measuring down to the millisecond it sends the wrong message, but it would be nice to know before choosing a particular framework/combo that A loads in 1 second and B loads in 3.

devinrhode2 commented 11 years ago

I think at least it'd be useful to include gzip size.

For something like Backbone, you could use Zepto, jQ.Mobi, jQuery, or jQuery 2.0. This could be succinctly displayed as 10-37 KB (5 KB + a jQuery-like library of 5-32 KB).

The key question here is: what is the minimum number of bytes necessary to use this library? Anyone concerned about this number is looking to load the page in as few bytes as possible, so shared dependencies are irrelevant.

Library KB has a clear direct effect on load time, almost every js project displays it anyway.
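A sketch of both ideas from this comment: measuring gzip size, and reporting a size range when a dependency can be swapped. The helper names are hypothetical, and the 5 KB and 32 KB figures are the comment's rough numbers used as placeholders, not measurements.

```python
import gzip


def gzip_size(data: bytes) -> int:
    """Number of bytes after gzip compression at the default level."""
    return len(gzip.compress(data))


def size_range_kb(core_kb, dom_library_options_kb):
    """Smallest and largest total when one of several DOM libraries is required."""
    return (
        core_kb + min(dom_library_options_kb),
        core_kb + max(dom_library_options_kb),
    )


# Repetitive source compresses well, so raw and gzipped sizes differ a lot.
payload = b"function todo() { return 1; }\n" * 200
raw_bytes = len(payload)
gzipped_bytes = gzip_size(payload)

# 5 KB core plus a DOM library somewhere between 5 KB and 32 KB.
low_kb, high_kb = size_range_kb(5, [5, 32])
```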

johnstark commented 11 years ago

@schettino72 , do you have the generated results from /compare/comparison/compare.html posted somewhere? Is it possible to check in a copy? Thanks.

addyosmani commented 11 years ago

Tangentially related, but inspired by @intellix: what if we were to put together a site where developers could easily post their experiences with frameworks, grouped by framework name / category?

Nothing like a rating site, but time and time again I see people asking the "Should I use X? Should I use X over Y? Does anyone have experiences building large apps with X?" questions on SO, Quora etc.

Could be a terrible idea, but hey. Putting it out there.

sindresorhus commented 11 years ago

I think that is a great idea and would be much more valuable if done correctly. Like case studies for JS frameworks. It has to be comprehensive though.

devinrhode2 commented 11 years ago

:+1:

trek commented 11 years ago

I'm down for this if we stratified per release (kind of like various App Stores do). Every lib/framework changes over time (hopefully improving!).

I know both Ember and Angular have had rocky patches that are largely resolved now. A ton of negative reaction to early, pre-released revisions shouldn't permanently depress a framework or library's overall rating.

One major downside to rating is that libraries, like any technology, tend to attract fanboys/girls: people who the library clicks with initially then will actively disparage other techs to justify their choices even if they haven't spent significant time experimenting elsewhere.

No idea how to stop this...

passy commented 11 years ago

That sounds like a great idea. Should there be some sort of approval process for the reviews? Does something like up and downvotes make sense or would that just lead to fans of one framework downvoting the others?

I agree with @trek that the site should clearly indicate which version of a library or framework is reviewed.

joeytwiddle commented 10 years ago

I agree quantitative metrics could easily be misleading, but we could distinguish frameworks according to features, workflow and their working mechanisms. For example we could categorize based on things like:

A wizard or search tool could then allow developers to narrow down the list, by discarding some frameworks that are unsuitable for their current project.

(However I would still put a big yellow disclaimer on any selection tool. Even some qualitative measures are arguable. For example a framework might be able to fulfill feature X only when a certain plugin is added.)
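A minimal sketch of such a narrowing tool, assuming each framework is tagged with a set of features. The framework names and feature tags below are hypothetical placeholders, not claims about real frameworks.

```python
# Hypothetical qualitative tags; real entries would need auditing.
FRAMEWORKS = {
    "framework_a": {"routing", "two_way_binding", "templating"},
    "framework_b": {"routing"},
    "framework_c": {"templating", "two_way_binding"},
}


def narrow(frameworks, required):
    """Keep only frameworks whose feature set includes every required feature."""
    return sorted(
        name for name, features in frameworks.items() if required <= features
    )


needs_routing = narrow(FRAMEWORKS, {"routing"})
needs_routing_and_templates = narrow(FRAMEWORKS, {"routing", "templating"})
no_requirements = narrow(FRAMEWORKS, set())
```

The tool only discards candidates; it does not rank what remains, which keeps it closer to a filter than to a scoreboard.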

joeytwiddle commented 10 years ago

Regarding the idea of a review site, I often find a "pros and cons" list pair can help when a developer is sharing and comparing. Perhaps that method of review could be encouraged.

An alternative to creating a site to accept reviews from developers, would be to collect existing reviews and comparisons available on the web. For sure any review is going to be subjective, but reading both sides of the argument from two different reviews could give us a better picture.