tc39 / js-outreach-groups


How should we analyze feedback? #5

Open · littledan opened this issue 5 years ago

littledan commented 5 years ago

Discussed in the educators meeting in December, cc @bkardell

bkardell commented 5 years ago

...Or even: how do we collect data of any kind, and have some kind of agreement on what it generally might mean?

For instance, a lot of conversation is about what developers will 'get' or what they will find 'confusing'. Today, we can provide implementations through Babel, which really helps the economics of taking part in the 'discussion': people don't have to read theoretical code and imagine theoretical examples; they can touch the features and try them in real use cases, solving real problems. Then, in theory, we should be able to get 'data' there and 'do science' to move that argument beyond mere speculation. There might be some people who think 'developers are going to be very confused by this', but if we have a better way to collect data and opinions, that belief doesn't have to persist.

I don't know what this looks like really, but it feels very worth thinking about. Do we have some kind of data?

littledan commented 5 years ago

Well, once we have things in Babel, then what do we do?

bkardell commented 5 years ago

No, that's what I mean... How do we know and analyze what's happening once it's in Babel?

hax commented 5 years ago

in theory, we should be able to get 'data' there and 'do science' to move that argument beyond mere speculation.

Agree. But we should also note that some footguns/confusions only occur in edge cases. It's not easy to find them through 'data', and it's very possible to neglect them because the 'data' told you it's only an edge case. I think some edge cases can be ignored (if they can be avoided in some practical way), but some cannot (if no tool can protect us, or if protection would have a very big cost, one that even exceeds the benefit of the feature that causes the footgun).

littledan commented 5 years ago

We spend tons and tons of time in TC39 theorizing about edge cases, how practical they are to avoid, how bad they are, etc. It's not clear to me how to combine this with real data. Certainly, I don't want to fall into the pattern of basically rejecting all data because "it didn't take footgun theorizing into account".

hax commented 5 years ago

@littledan I never meant to reject data. Actually, I want both: they are two sources of information that compensate for each other's biases and possible shortcomings.

js-choi commented 5 years ago

One example of this question is the pipeline proposals championed by @littledan. There are currently two front-running, competing proposals, each making different trade-offs. It’s difficult for anyone to weigh multiple interacting pros and cons, and to make decisions based on them, using only abstract reasoning. So @mAAdhaTTah and I are working on a Babel plugin that implements both proposals. Once it’s done, stakeholders could actually try out both proposals and see how they feel in concrete code. Hopefully TC39 would then have more information to make an eventual decision regarding the proposals.
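To make the contrast concrete, here is a rough sketch of mine of how the same expression reads under each proposal (the exact tokens varied across drafts, and neither is standard JavaScript; both need a transpiler such as Babel):

```js
// Hedged sketch: the same computation under the two competing
// pipeline proposals (circa 2019); exact syntax varied by draft.

// F#-style: each step's right-hand side must be a unary function.
const fsharp = "  12  "
  |> (s => s.trim())
  |> Number
  |> (n => n + 1);

// Smart/Hack-style: each step is an expression containing a topic
// reference (written `#` in the smart-pipeline drafts of that era).
const smart = "  12  "
  |> #.trim()
  |> Number(#)
  |> # + 1;
```

Roughly, the trade-off people would get to feel directly: the F#-style version composes existing unary functions cleanly, while the topic-style version avoids wrapper arrow functions around arbitrary expressions.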

But, once the Babel plugin is implemented, what actually happens then? What kind of stakeholder data should be collected, and how should they be analyzed? Twitter threads, informal surveys, and opinion polls are easy to do but are coarse (measuring only superficial gut impressions) and come with many pitfalls (“opinion polls: potential for inaccuracy” and “total survey error” list several examples). More-rigorous types of data are more difficult to obtain. For instance, some months ago, there was talk with @littledan and @codehag of running a small A/B test or some other sort of randomized controlled trial on programmer test subjects, in which they might write or read functions with different versions of the pipeline operator and perhaps the :: operator proposed by @zenparsing. But it would take time and resources to set such a trial up well. And, as with polls and surveys, it would be difficult to design the trial so that its biases are minimized and its results actually generalize to the JavaScript community at large.

This question, of course, is broader than one mere operator. But that operator may end up being an interesting pilot case for new approaches. It’s all still very preliminary, and the Babel plugin isn’t done yet, anyway. Just one example, in any case.

bkardell commented 5 years ago

@js-choi

But, once the Babel plugin is implemented, what actually happens then? What kind of stakeholder data should be collected, and how should they be analyzed?

Yes, exactly. I think that is the next question for standards in general. It's an entirely new 'aspect' of standardization, made possible by the idea of 'speculatively polyfilling' and transpiling proposals and allowing developers to 'participate' mostly by just 'using', which helps both the economics and the depth of feedback. A/B studies would be interesting, or even just workshops or meetups that review proposals, but I think the depth afforded by 'real use' is really important. Looking back on 'failed' standards, there is often a lot of early excitement for them: seemingly good arguments about why they were going to be so great, and even early-adopter mavens who seemed really happy with them. But then, once they start to get really used, in real life, by everyone... not so much.

At the end of the day, we'll only truly know whether something 'excels' or 'struggles' or even just outright fails 'in the large' when a fairly significant number of people have actually spent time trying to use it to solve real-world problems. Things like Babel give us the ability to do that in a way we never could before; in theory, we just need data we can analyze and people to start figuring out how to actually do it.

Like, on pipeline, two things that came up were "developers won't get/like X" and "lodash is one of the popular ways in which you achieve a similar thing". So, I'm just kind of spitballing here, but it seems like there are several metrics that could be interesting. First, just what is the uptake: do we even have a way to measure it? Can we? If so, I feel like there are plenty of other questions you could ask: Are projects that currently use that bit of lodash switching to one of these over time? Are projects that started with one of these switching to the other? And so on...
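For concreteness, the "bit of lodash" in question is presumably `_.flow`-style composition; a small sketch (my example, nothing measured) of the before and after:

```js
// Today's lodash idiom: _.flow composes functions left to right.
const _ = require("lodash");

const trim = (s) => s.trim();
const shout = (s) => s.toUpperCase() + "!";

const exclaim = _.flow([trim, shout]);
console.log(exclaim("  hello  ")); // "HELLO!"

// The pipeline-operator equivalent (proposal syntax, not standard
// JavaScript, so left as a comment):
//   const result = "  hello  " |> trim |> shout;
```

Uptake could then be measured as bluntly as counting `_.flow` call sites versus `|>` expressions across open-source code over time.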

js-choi commented 5 years ago

A/B studies would be interesting, or even just workshops or meetups that review proposals, but I think the depth afforded by 'real use' is really important.

Indeed, these approaches are not mutually exclusive; all of them could be tried simultaneously, although they would all take months or even years to fully develop.

Of course, one big disadvantage of deploying competing proposals to real code is that only one proposal can be chosen while writing the code. This makes it difficult to actually compare proposals. It also can bias people’s initial impressions of the proposals based on what their organization chooses. And from the early user’s perspective, there is also the big risk that the particular proposal that they choose is not the one that will be eventually chosen by TC39.

Some of these disadvantages could be mitigated by code transformers that could convert the same code between the different proposals. Not only would this allow real code bases to quickly change proposals when TC39 makes its final decision, it would allow people to simultaneously read and compare versions of the same code using the different proposals.
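As a sketch of how mechanical such a conversion could be, here is a hypothetical Babel plugin of mine that lowers `x |> f` to `f(x)`, which is also the escape hatch a code base would need if its chosen proposal lost. It assumes, as Babel's `pipelineOperator` parser plugin did for the minimal/F#-style variant, that `a |> b` parses as a `BinaryExpression` with operator `|>`:

```js
// Hypothetical codemod: rewrite `left |> right` into `right(left)`.
// Assumes the source was parsed with the "pipelineOperator" parser
// plugin (minimal/F#-style), which yields a BinaryExpression node.
module.exports = function pipelineToCall({ types: t }) {
  return {
    name: "pipeline-to-call",
    visitor: {
      BinaryExpression(path) {
        if (path.node.operator !== "|>") return;
        // `x |> f` becomes `f(x)`; nested pipelines are handled as
        // Babel re-traverses the replacement nodes.
        path.replaceWith(
          t.callExpression(path.node.right, [path.node.left])
        );
      },
    },
  };
};
```

Converting between the two pipeline proposals themselves would be more involved (topic references have to be introduced or eliminated), but it is the same kind of transform.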

But it still may be difficult to analyze even real-world experience from many heterogeneous sources and many heterogeneous code bases...

js-choi commented 5 years ago

See also the 2018-07 discussion that @syg of @bloomberg had with TC39 about user testing for language design.

littledan commented 5 years ago

There's been some experimentation, especially around the class fields proposal, driven by @hax and @Igmat, with writing articles or giving presentations, followed by a multiple-choice poll about the preferred outcome. I've done a bunch of Twitter polls myself.

What kinds of things do you all think you can learn from this sort of data? Are there any best practices that we should recommend to people who are trying to collect this information? Or is its value inherently limited? How does this data compare with findings from in-depth, interactive discussions with people?

bkardell commented 5 years ago

FWIW, the CSSWG occasionally uses this sort of thing for questions that are mostly bikesheddy, or where debate gets into what developers will or won't "get" or "prefer". Members of the WG typically RT/share the poll, giving it a bunch of fairly diverse eyeballs from people who follow CSS for different reasons. We all realize it is not binding or scientific, and sometimes there is nearly as much debate in crafting the question in the first place and in interpreting the results. It's not perfect, but it is occasionally helpful in heading off unproductive conversation paths and getting focus. I don't know about real 'best practices', but I think the things that help there are agreeing on how to ask the question (and that it is a valuable question in the first place) and getting different kinds of people to share it.

I think they're just different data points really, valuable in different ways at different times. In-depth conversation is better for collecting actually 'constructive' feedback, and you can do it earlier too, but the sample is often small and kind of biased toward the people well prepared to discuss it, which is probably not reflective of the larger community either. Finding ways to plug into early-use measurement or something similar would be different again: measuring a broader scope of people and something more concrete, but available much later.

hax commented 5 years ago

@littledan Actually, I believe @Igmat's and my presentations about class fields are much more "in-depth" than any earlier format. In particular, a big problem with the class fields proposal is the "whole cost" of too many issues (even if each issue is small), which simple articles/polls can never demonstrate.

littledan commented 5 years ago

I think it could be great to work together with people outside of the committee to refine the materials before distributing them more broadly. I have my issues with these particular presentations, which I think could be resolved if we work together on materials for something in the future.

hax commented 5 years ago

I think it could be great to work together with people outside of the committee to refine the materials before distributing them more broadly.

As I have stated before, I never wanted to speak publicly about the "dark side" of any proposal, and I had never done such a thing before. The only reason I am doing it now is that I believe the TC39 process has big flaws around controversial issues, and that it will never be possible to fix them. So I was forced to warn the community that a broken proposal is going to land and that they will face many footguns.

littledan commented 5 years ago

In this repository, I'm trying to improve exactly this issue about collecting community feedback and making sure the committee doesn't ignore it. In my opinion, we'll be more effective working collaboratively with champions, rather than presenting it as a challenge/"dark side".

Igmat commented 5 years ago

What kinds of things do you all think you can learn from this sort of data? Are there any best practices that we should recommend to people who are trying to collect this information? Or is its value inherently limited? How does this data compare with findings from in-depth, interactive discussions with people?

I want to expand on an idea that I originally shared in https://github.com/tc39/proposal-class-fields/issues/203#issuecomment-451698992. The main part of it is these three questions:

  1. What is included in external feedback (opinions told directly to the committee, GitHub comments, tweets, third-party polls, etc.)?
  2. How do we evaluate, prioritize, and assess this feedback?
  3. What actions should be taken when the committee's opinion isn't shared by external feedback, after it has been collected and properly assessed?

I would propose the following:

  1. Include:
    1. documented feedback given to committee members
    2. the number of GitHub users who support the proposal (excluding committee members)
    3. the number of GitHub users who argue against the proposal (excluding committee members)
    4. third-party polls *
  2. Order: polls > users who support the proposal > documented feedback > users who argue against the proposal
    • If feedback is more negative, then postpone moving the proposal to the next stage.
      • If an alternative exists, then move it to stage 1 (providing everything that is missing from the list of Entrance Criteria) and collect external feedback for it.
      • If there is no alternative, then address the community's concerns in the original proposal and collect new feedback, or create such an alternative and move it to stage 1. This step could be repeated as many times as needed; if feedback stays negative after all ideas have been tried, the proposal should be rejected or delayed until there are new ideas.
    • If feedback is more positive and there are no viable ways left of addressing the community's issues, proceed with the proposal according to the usual process.

* To make such polls valuable, the article on which a poll is based should be accepted by the proposal champions; that doesn't mean it should contain only the proposal's positive aspects, but rather fairly describe all its advantages and disadvantages. Actually, it was only my mistake of not getting approval for my article about class fields that led to the poll result being dismissed entirely as worthless. I could have asked the proposal champions for feedback about its fairness beforehand and improved the article to satisfy both sides (proponents and opponents).

@littledan, what do you think about this?

littledan commented 5 years ago

@Igmat I'm not really a fan of this proposal. I'd like to collect feedback before TC39 advances far in stages, to help the committee have useful information to make the decision, and then work from there. I don't think it will be very productive to have polls without coordination with the champion group, or pressure to demote stages. And I think we should use polls with caution, as we've been discussing in https://github.com/tc39/proposals/issues/156.

I'm not interested in this repository becoming yet another place to talk about how awful TC39 and class fields are. We're trying to construct a solution to a problem, and being divisive like that just makes it harder. I'll start hiding comments as off-topic if the discussion goes in that direction.

Igmat commented 5 years ago

@littledan, sorry, I didn't realize that mentioning that proposal would offend you.

But it's not about that proposal at all. And I said that polls MUST be taken in coordination with champions in this quote:

  • To make such polls valuable, the article on which a poll is based should be accepted by the proposal champions; that doesn't mean it should contain only the proposal's positive aspects, but rather fairly describe all its advantages and disadvantages.

So the article is prepared in coordination with the champions, or even by the champions. And it shouldn't be published without the champions' approval; otherwise, the poll results won't be taken into account.

littledan commented 5 years ago

Right, so, I think champions will not be interested in polls whose goal is to move a feature back in stages. Instead, I think we should work on outreach that helps us determine whether proposals should move forward, and how proposals could solve developers' problems.

Igmat commented 5 years ago

or pressure to demote stages

I haven't proposed demoting at all, just delaying in some circumstances.

If feedback is more negative (which is a VERY rare situation), then something went wrong and we have to carefully weigh every decision we've made so far.

I think champions will not be interested in polls whose goal is to move a feature back in stages.

No, I'm talking about an article + poll pair whose goal is neither to move a proposal back nor forwards. Such an activity's goal MUST NOT be "to prove somebody's opinion" (whether that of a committee member or of the community), but rather to get an OBJECTIVE numeric assessment of how the proposal would be accepted in the wild. Does that make sense?

Igmat commented 5 years ago

I'd like to collect feedback before TC39 advances far in stages

Oh, I didn't mention it in my proposal, but I was talking about article+poll activities in the stage 1/stage 2 phase, not stage 3, because by then it seems too late to change anything significant, once implementers have already started adopting such proposals.

littledan commented 5 years ago

I don't think we'll ever reach objectivity, either in the design itself or in our understanding of JavaScript developers' feelings and preferences. Articles and polling populations will always be influenced by the perspectives of those involved, as will any other mechanism of outreach. The best we can do is acknowledge that and do the best we can from there.

Igmat commented 5 years ago

Hm, I think that if both sides are involved in preparing the research, it could be objective enough to reference in the further decision-making process. But let's set that aside for a while.

Do you have any other ideas for how to involve the larger community and guarantee that they'll be heard (which doesn't mean that something will necessarily change, but rather that their opinion is really taken into account)?

To be clearer: my intention is to AVOID situations where the committee and the community waste their time on useless debates; we all have a similar goal, improving the language design.

littledan commented 5 years ago

Yeah, it sounds like you're getting at a bunch of interesting questions that I don't know the answer to:

Igmat commented 5 years ago

  1. I thought that some kind of established process, like the one I described above, could be such a guarantee, but probably that's too much? What are the boundaries for the effect the community may have on the committee?
  2. I guess consensus between champions and active community member(s) is enough to focus such research. Don't you agree?
  3. I believe that following the scientific method as closely as possible while investigating controversial issues (like syntax aesthetics), with some kind of polls/interviews (though I realize our research won't satisfy the method fully), may greatly decrease the amount of pointless discussion. It'll require some discussion of how to handle such polls/interviews, but at first glance it would take much less time. What do you think?
Igmat commented 5 years ago

  • How do we convince people that their feedback is being used well, so that they will engage?

One possible way to convince the community would be the presence of somebody who represents the community at the committee's meetings. This person could probably present the community's feedback to the committee, too. @littledan, is it possible for you to be such a person? And do you think that makes any sense at all?

ljharb commented 5 years ago

Many delegates already serve that purpose, including the JS Foundation, which is an entire member dedicated to that goal. Additionally, proposal champions typically all present this feedback when discussing their proposals. Can you elaborate on how your suggestion would be different, so we can get a better idea of what needs fixing in the current setup?

littledan commented 5 years ago

Right, I don't think there should be any one person who's the channel of community feedback; I'd say all TC39 delegates are trying to do right by JavaScript programmers. But I don't think @Igmat was implying otherwise.

If we want TC39 delegates present at meetups, maybe we should make a list of TC39 delegates who are interested in being reached out to for this purpose. I'm not sure whether the committee might be interested in maintaining such a list; I'll ask people at this TC39 meeting. If they're not, maybe we could maintain the list here.

hax commented 5 years ago

@littledan Is the list available now? It would also be a good list for invitations to our tech conferences :-)

littledan commented 5 years ago

cc @gesa