hoodiehq / camp

:circus_tent: Welcome to Hoodie Camp!
https://hoodie.camp
Apache License 2.0

Hoodie Community Dashboard #102

Open gr2m opened 7 years ago

gr2m commented 7 years ago

If you asked me:

How many active contributors does Hoodie have today?

I could not answer it. Nor could any other maintainer from any other Open Source project that I have asked so far. And this is a problem, because Open Source Burnout is real and yet we don’t measure the underlying problems the way we measure code quality.

What we don’t measure, we cannot improve.

The question about active contributors is only one aspect. What I am really interested in is how well balanced the community is between active users, contributors and maintainers.

Goal

The goal for the Hoodie Community Dashboard is to be able to answer this question at all times, and make the underlying measurements transparent to everyone.

Out of scope: In future, I would also measure the success / impact of the Hoodie community which would include things like the reach we have, number of first-time open source contributors, diversity numbers etc.

Measurements

1. Work load

Measuring the number of users is hard for Open Source projects, for good reason. But while it would be nice to know how many active users we have to measure the success of Hoodie, here we are only interested in how much work load people produce that the Hoodie community has to take care of. Things that we can measure are

2. Active contributors

At Hoodie, we think contributions go beyond code and documentation. Equally important is the work from our editorial and design team, and from people helping answer questions in Slack or on GitHub. In contrast to the work load measurement, we are not interested in the number of contributions, but in the number of different people who make them: we do not want a few people doing huge amounts of work, but a big group that balances the work load.

We can experiment with the details, but for a start I would define active as "contributed within the past month".
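A minimal sketch of that definition, assuming contributions are available as (person, timestamp) pairs and approximating "the past month" as 30 days. The function name, the record shape and the sample data are all hypothetical; the point is only that we count distinct people, not the number of contributions.

```python
from datetime import datetime, timedelta


def active_contributors(contributions, now, window_days=30):
    """Return the set of people with at least one contribution in the window.

    `contributions` is an iterable of (person, timestamp) pairs. Both this
    shape and the 30-day window are assumptions made for the example.
    """
    cutoff = now - timedelta(days=window_days)
    return {person for person, when in contributions if cutoff <= when <= now}


# Hypothetical sample data: names and dates are made up.
sample = [
    ("alice", datetime(2016, 12, 20)),
    ("alice", datetime(2016, 12, 21)),  # several contributions, still one person
    ("bob", datetime(2016, 12, 1)),
    ("carol", datetime(2016, 10, 5)),   # outside the 30-day window
]

active = active_contributors(sample, now=datetime(2016, 12, 22))
print(sorted(active))
```

Returning a set rather than a count keeps the names around, which makes it easy to compare one month with the next.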

A contribution can be one of the following (from people who are part of the Contributors Team on GitHub)

3. Active maintainers

Traditionally, maintainers are seen as gatekeepers in Open Source projects, oftentimes referred to as "committers". At Hoodie, we see maintainers as being in charge of maintaining and growing the space in which people enjoy becoming and staying active contributors. Just like with contributors, we are less interested in the total amount of work by maintainers, and more interested in the total number of active contributors.

Activities by maintainers are

Visualisations

To be done.

Basically I would love to see different charts, the main one showing the "community climate" indicator (or however we want to call it) over time.

I would like to add these visualisations to hoodie.camp (it currently is a simple prototype only showing open issues).

Besides having a website, I would like to be able to send out weekly and monthly reports via email.

Feedback

We are actively discussing all aspects of the Hoodie dashboard and are very interested in your thoughts, questions and insights into existing tools or your experiences with other Open Source communities.

hzoo commented 7 years ago

https://libraries.io/ has a lot of awesome stuff by @andrew which you probably know about.

1. Load by users

Stats like # issues/PRs should be easy to track via https://www.githubarchive.org/ or just the github API since it's pretty straightforward. Also big query, 2, and more.
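As a rough sketch of the GitHub API route: the "list repository issues" endpoint returns pull requests alongside issues, and pull requests can be told apart by their `pull_request` key. The helper names and the sample payload below are made up for illustration, and fetching and pagination are left out.

```python
def split_issues_and_pulls(items):
    """Split items as returned by GitHub's "list repository issues" endpoint.

    Pull requests carry a `pull_request` key; plain issues do not.
    """
    issues = [item for item in items if "pull_request" not in item]
    pulls = [item for item in items if "pull_request" in item]
    return issues, pulls


def count_open(items):
    """Count open issues and open pull requests separately."""
    issues, pulls = split_issues_and_pulls(items)
    return {
        "open_issues": sum(1 for issue in issues if issue["state"] == "open"),
        "open_pulls": sum(1 for pull in pulls if pull["state"] == "open"),
    }


# Hypothetical, heavily trimmed payload; real items have many more fields.
sample_items = [
    {"number": 1, "state": "open"},
    {"number": 2, "state": "closed"},
    {"number": 3, "state": "open", "pull_request": {}},
]

print(count_open(sample_items))
```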

Actually just looking at the winners/entries for the data challenge might give some more inspiration in general - https://github.com/blog/1864-third-annual-github-data-challenge. Not sure why there wasn't one last/this year though.

I know there was http://issuestats.com/ (might be down) that tracked avg time to close an issue/merge a PR + badges and a graph (won the GitHub data challenge previously).


You can also track activity of 3rd party stuff (tweets with @hoodiehq), stackoverflow questions, slack messages (probably hard to measure if we aren't paying).

Probably not very many data points but other things like # of blog posts related, # of meetups/conferences/videos, # of talks.

2. Active contributors

Comments are really good - also active conversations via slack/twitter or other ways we do community engagement.

jasonLaster commented 7 years ago

Thanks @hzoo, I think it would be nice to share some Big Query APIs

We're also working on gathering contributor information in a Google sheet. The sheet helps us keep track of the names and backgrounds of our contributors so that we can answer several maintainer questions:

We are also working on the right process for checking in on the community:

LappleApple commented 7 years ago

@gr2m Saw a tweet directing readers to this issue -- very exciting. I work in Berlin at Zalando (huge, publicly traded company) as open source evangelist. Would love to chat, but in the meantime wanted to share these links to some dashboards (apologies if they're already known to you);

I'm aware of other initiatives falling along these lines; happy to talk more.

dicortazar commented 7 years ago

Hi there, willing to help with metrics :)

andrew commented 7 years ago

Let me know if I can help pulling any statistics out of @librariesio for you

dicortazar commented 7 years ago

BTW, I ran a small analysis of ten HoodieHQ projects; you can have a look at http://cauldron.io/dashboards/hoodiehq . This is based on the tech we have at grimoirelab, as referenced by @LappleApple.

You can see aggregated info per data source (Git, GitHub Issues and GitHub Pull Requests) and each panel provides several charts and tables. You can drill down, filter or even export the data and use your own viz.

It already includes the info you mention, such as the number of open pull requests and issues, the people involved in them, the time to close those pull requests and issues, the time zone distribution of commits and developers, and others.

Hope this is useful!

nayafia commented 7 years ago

So happy you're doing this! A few more ideas:

avg. time until response

Maybe average time til an issue/PR is closed, as well? Basically tracking how long it takes to resolve. I think support departments often track this.
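A small sketch of how time to close could be computed, assuming each item carries `created_at` and `closed_at` timestamps in the `YYYY-MM-DDTHH:MM:SSZ` form the GitHub API uses. The function name and the sample items are hypothetical. Items that are still open are skipped.

```python
from datetime import datetime
from statistics import mean, median

GITHUB_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def hours_to_close(items):
    """Return the hours between opening and closing for every closed item."""
    durations = []
    for item in items:
        if item.get("closed_at") is None:
            continue  # still open, no resolution time yet
        opened = datetime.strptime(item["created_at"], GITHUB_TIME_FORMAT)
        closed = datetime.strptime(item["closed_at"], GITHUB_TIME_FORMAT)
        durations.append((closed - opened).total_seconds() / 3600)
    return durations


# Hypothetical sample data.
sample_items = [
    {"created_at": "2016-12-01T00:00:00Z", "closed_at": "2016-12-02T00:00:00Z"},
    {"created_at": "2016-12-01T00:00:00Z", "closed_at": "2016-12-04T00:00:00Z"},
    {"created_at": "2016-12-10T00:00:00Z", "closed_at": None},
]

durations = hours_to_close(sample_items)
print(mean(durations), median(durations))
```

Looking at the median next to the mean may be worthwhile, since a handful of very old issues can pull the mean up a lot.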

number of open issues, number of open pull requests

Also the number of opened issues/PRs, i.e. the rate at which they are being opened each week/month/whatever (which is more about growth).

number of new contributors

This is probably implied, but I'd also track the number of repeat contributors, the ratio of first-time to repeat, and how that changes over time.
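One possible way to get at the first-time vs. repeat split, assuming contributions are grouped into sortable period labels such as "2016-11". Here "repeat" means "was seen in any earlier period", which is only one of several reasonable definitions; the names and data are made up.

```python
from collections import defaultdict


def first_time_vs_repeat(contributions):
    """Count first-time and repeat contributors per period.

    `contributions` is an iterable of (person, period) pairs, where periods
    sort chronologically as strings (an assumption of this sketch).
    """
    people_by_period = defaultdict(set)
    for person, period in contributions:
        people_by_period[period].add(person)

    seen = set()
    result = {}
    for period in sorted(people_by_period):
        people = people_by_period[period]
        result[period] = {
            "first_time": len(people - seen),
            "repeat": len(people & seen),
        }
        seen |= people
    return result


# Hypothetical sample data.
sample = [
    ("alice", "2016-10"), ("bob", "2016-10"),
    ("alice", "2016-11"), ("carol", "2016-11"),
    ("alice", "2016-12"), ("bob", "2016-12"), ("carol", "2016-12"),
]

stats = first_time_vs_repeat(sample)
for period, counts in stats.items():
    print(period, counts)
```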

Visualisations

If you haven't seen it, icecrime's vossibility project might be useful here.

LappleApple commented 7 years ago

+1 to @nayafia's ideas here. Nadia, are you saying that vossibility addresses your point? From the README I can't tell, exactly.

@icecrime, do you have other docs? (If not, let me know if you need help on that; I tend to follow this README template I created for Zalando.)

nolanlawson commented 7 years ago

# of StackOverflow questions and average time until a StackOverflow response are also good things to track.

Dunno about Slack, but for IRC there's https://botbot.me/ which tracks logs and can be used to calculate messages per day (probably need to average it over number of users, though, because of bots).
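One reading of that, sketched under the assumption that the logs can be reduced to (day, nick) pairs and that bot nicks are known up front: drop the bots, then divide each day's messages by the number of distinct people who spoke. The function name, the nicks and the data are hypothetical, and nothing here talks to botbot.me.

```python
from collections import defaultdict


def messages_per_user_per_day(messages, bots=frozenset()):
    """Return {day: messages per distinct non-bot nick} for (day, nick) pairs."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for day, nick in messages:
        if nick in bots:
            continue  # ignore automated messages entirely
        counts[day] += 1
        users[day].add(nick)
    return {day: counts[day] / len(users[day]) for day in counts}


# Hypothetical sample log.
sample_log = (
    [("2016-12-21", "alice")] * 3
    + [("2016-12-21", "bob")]
    + [("2016-12-21", "ci-bot")] * 10
    + [("2016-12-22", "alice")]
)

per_day = messages_per_user_per_day(sample_log, bots={"ci-bot"})
print(per_day)
```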

gr2m commented 7 years ago

Cate mentioned foss-heartbeat by @sarahsharp

nayafia commented 7 years ago

@LappleApple I meant that vossibility might be a useful tool for creating dashboards/visualizations of any GitHub data collected. vossibility-collector has a bit more info in its README.

icecrime commented 7 years ago

Thanks for the ping!

Vossibility is a tool I created to help me manage the :whale: Docker open source project. I do wish it was easier to consume or use in other projects; it's mostly a matter of documentation :disappointed:

TL;DR: vossibility takes GitHub data, transforms it to extract the information you want and to enrich where necessary, sends all this to Elastic Search, and then you can use the wonderful Kibana as a frontend.

These are some examples of how I'm using vossibility today:

(four screenshots)

Happy to discuss more if this kind of information is helpful :+1:

dicortazar commented 7 years ago

Hey, just to mention (do not want to spam! ^^) that grimoirelab supports the following data sources: askbot, bugzilla, confluence, discourse, gerrit, git, github issues, github pull requests, mbox, jenkins, jira, mediawiki, meetup, phabricator, pipermail, redmine, rss, stackexchange (stackoverflow), supybot, telegram, kitsune and remo. There's some extra info on Perceval, the retrieval tool, at: https://github.com/GrimoireLab/perceval

That means that, with all of that information in a database (mainly ElasticSearch), we can all go for the metrics you're mentioning, such as the people and the evolution of contributors in all of those data sources, the activity for all of those data sources, etc. On top of that, we can build more advanced analyses, such as the demographics of the community.

BTW, are any of you attending FOSDEM? That could be a great place to meet and discuss metrics. We're also having a workshop to talk about metrics and show how to use the grimoire toolchain, in case you're interested (http://grimoirelab.github.io/con/), and we also have a collaborative book, https://jgbarah.gitbooks.io/grimoirelab-training/content/, where that's also detailed.

sagesharp commented 7 years ago

One of the things I would love to focus FOSS Heartbeat on is the people in open source communities. I think we often get caught up in metrics like "Is rate of merged pull requests increasing?" without focusing on the people behind those metrics. Examples of more people-focused questions I would love FOSS Heartbeat to answer are:

I'd love to chat more about this. If you're looking to hire contractors to work on these sorts of people-focused metrics, you can drop a line to sharp@otter.technology. I'll also be at FOSDEM.

nayafia commented 7 years ago

@sarahsharp's excellent qs remind me...a lot of these metrics should be used to measure not just growth, but sustainability.

Ex. "average response time" can be used to measure how quickly maintainers respond to issues/PRs, but if responses are getting slower over time, that can also be a sign of exhaustion. So the response isn't just "answer them faster!" but might be "how do we get additional :eyes:s and :hand:s to help out?"

mikeal commented 7 years ago

A few things that we pull regular metrics on in the Node.js project that have been important.

We used to track who was merging commits, but that has gotten less useful over time because it doesn't really indicate who is reviewing commits: more PRs get reviewed by many people before being merged, and it's common for someone to merge a bunch of already reviewed PRs. You can probably get better data out of the new review tools in GitHub if you're using that review system.

LappleApple commented 7 years ago

@dicortazar / @sarahsharp I'll be at FOSDEM too, as will some of my Zalando colleagues (@alexkops for sure, hopefully @hjacobs and our IP lawyer at minimum as well). Also cc'ing my colleagues @jbspeakr and @KathleenLD here so they can follow this thread; both are interested in/have exp in metrics and balance.

@nayafia Thank you for clarifying your point and for the extra link. @icecrime, would be up for adding some bits to your READMEs over the holidays.

mikeal commented 7 years ago

One last thing I'll say: think carefully about what is important to the project before building the dashboard.

There's plenty of data out there and it's easy to get lost in creating amazing visualizations of it. I've done this myself a few times and the result was more of a distraction than a benefit. There's also a bunch of products out there that already do this and I feel the same way about most of them as well.

I've actually pared back the data that I regularly consider. For instance, I no longer try to track the total number of commits in the main repo in master. There's an inflection point where the project can't handle any more activity in one place and things are spun off more liberally. If we obsessed about that metric we'd end up overloading that branch/repo. Instead, more focus is put on "how" the work is getting done rather than just the volume of work.

If you want to distribute the work load, attract more contributors, increase diversity, etc, a lot of these metrics won't help and can become counter-productive.

gr2m commented 7 years ago

@mikeal I very much agree. The Hoodie way to avoid this problem is to start with the end result without thinking about technical limitations or what data and tooling is available today. Then we will probably create some kind of dummy dashboard that just looks and feels amazing, then we all get super excited about it, and then we make it work backwards :)

This is also the reason why I want @leighphan to lead this project, because she cares about the processes and the experience from the perspective of new and existing contributors as well as maintainers, and she has the skills for and interest in data visualisations.

Thanks for this great discussion y’all <3 keep it coming

tracykteal commented 7 years ago

This is great & very important! @sarahsharp's point about understanding the perceptions of people in the open source community is important for understanding the health & sustainability of the project, beyond the quantitative metrics.

@kariljordan just pointed me at this great paper from Steinmacher et al. on self-efficacy towards OSS projects, "Increasing the Self-Efficacy of Newcomers to Open Source Software Projects". This study shows that self-efficacy (belief in one's ability to succeed in accomplishing a task) can increase with more guidance around initial commits. #win! The study doesn't then go on to show that people with more self-efficacy continue to contribute to the project, but more general studies in self-efficacy show that it's important for involvement in an activity.

It would be interesting to survey people new to and actively working on the projects, potentially with these survey questions, to understand why there might be a particular balance between active users, contributors and maintainers on a project and if people are transitioning from being newcomers to active contributors.

Steinmacher et al survey questions:

  1. I feel comfortable asking for help from the community using electronic communication means
  2. I can write my doubts and understand answers in English
  3. I am good in understanding code written by other people
  4. I have pretty good skills to write and change code
  5. I feel comfortable with the process of contributing to an Open Source project
  6. I think that contributing to an open source software project is an interesting activity
  7. I feel I can set up and run an application if a set of instructions is properly given
  8. I am pretty good on searching for solutions and understanding technical issues by myself
  9. I can choose an adequate task to fix if a list of tasks is given
  10. I can find the piece of code that need to be fixed given a bug report presenting the issue

leighphan commented 7 years ago

Thanks everyone for your interest. I greatly appreciate all the tips on starting points! Looks like there are many different angles and ways we can extract and visualize the climate of Open Source communities.

@jasonLaster I'm curious - how do you get to know contributors better? Are there weekly chats/meetings welcome to everyone?

While project data can reveal the efficiency and growth of OS projects, I'm also very interested in the data that will help us connect with people first in Open Source communities. Thanks @sarahsharp and @nayafia for bringing up very perceptive questions and indicators about the sustainability of people and (before) projects.

I'm all for taking the Hoodie approach of starting with the interface look and feel, then working backward - people first. :)

LappleApple commented 7 years ago

Hey all, where are we on this thread? Some of us are talking about FOSDEM right now and it reminded me, there was talk of a FOSDEM get-together. Should we plan?

bkeepers commented 7 years ago

Hey all, where are we on this thread? Some of us are talking about FOSDEM right now and it reminded me, there was talk of a FOSDEM get-together. Should we plan?

👍 GitHub would be happy to host a dinner conversation.

LappleApple commented 7 years ago

👍 @bkeepers. Or does the group here want to take some sort of action, based on @gr2m's original pitch and the ideas that have followed since? I don't know the answer to this question, it's for everyone :)

jasonLaster commented 7 years ago

Thanks everyone for the wealth of information. I am inspired by all of the ongoing work.

Here's a quick doc that I started to summarize what I learned. Please feel free to improve it in any way.

Also, if you're interested in joining a hangout, add your name to the doc and we can discuss next steps.

gr2m commented 7 years ago

@jasonLaster this is great work, thanks for putting it together 👍

nayafia commented 7 years ago

It would be interesting to survey people new to and actively working on the projects, potentially with these survey questions, to understand why there might be a particular balance between active users, contributors and maintainers on a project and if people are transitioning from being newcomers to active contributors.

@tracykteal you might be interested in http://opensourcesurvey.org/, which is being conducted by GitHub. While not specific to any one project, some of those Steinmacher et al questions will be asked of respondents, so might be helpful as a baseline. The survey questions are public, and results will be public too.

clarkbw commented 7 years ago

Just getting back into the swing of things but @jasonLaster pointed me to this over the holidays and I was really excited to see all the enthusiasm and work going on. I've been looking at this from a couple different angles with very similar goals to what I've seen here so I'll share what I've been thinking.

First I want to understand the contributor funnel (funnel being a standard marketing term, perhaps not the best way to describe people; onboarding?)

Then I'd like to understand area of interest or strengths for contributors. (I think this is what @tracykteal is getting at)

Looking forward to more discussion! 🎉

leighphan commented 7 years ago

I'm also curious how or if projects have an onboarding process (perhaps like Hoodie Camp) to get to know contributors' interests, such as building a portfolio or getting a job - this would help give a clearer idea of their trajectory. I started @codelaboc, a learning group in my community, and we conduct a survey for new members to get an idea of their goals and strengths, and shape our events/direction to help each other.

@nayafia Any chance the http://opensourcesurvey.org/ will touch on such questions?

nayafia commented 7 years ago

@leighphan off the top of my head, I don't think so (it's geared more towards contributor behavior than project norms), but you can see the full set of questions here.

sohini-roy commented 7 years ago

Hello, thanks for the guide. It was helpful. I am acquainted with the GitHub API and am willing to be a part of this issue.

Please guide me through :)