Metrics for open-source teams/projects

abinoda commented 6 years ago

I am working on providing metrics for open-source projects that measure maintainer responsiveness, community engagement, etc.

Here is the list of metrics I've come up with so far. I am thinking that you could drill down into specific repos or aggregate over multiple at once. I would also like to provide README badges and charts that can be embedded for each of these metrics.

Merge Rate (merged PRs / all closed PRs)
% of PRs coming from within an org vs. outside
Total # of Contributors
Pull request response time (time between PR being opened and comment, label, or review from maintainer?)
Review response time / time to first review (time between PR being opened and first review?)
Issue response time (time for an org member to comment, label, or close an issue)
Engaged Users (number of users who have commented, reviewed, or pushed code in some way in the past 90 days)
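
As a rough illustration of the first few definitions, here is a minimal sketch of merge rate and time to first maintainer response. The record layout and field names are hypothetical and not taken from the GitHub API; real data would need to be mapped into this shape first.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical PR records; the field names are illustrative, not GitHub API fields.
prs = [
    {"opened": datetime(2018, 6, 1, 9), "first_response": datetime(2018, 6, 1, 15),
     "closed": True, "merged": True},
    {"opened": datetime(2018, 6, 2, 9), "first_response": datetime(2018, 6, 4, 9),
     "closed": True, "merged": False},
    {"opened": datetime(2018, 6, 3, 9), "first_response": None,
     "closed": False, "merged": False},
]

def merge_rate(prs):
    """Merged PRs divided by all closed PRs; None if nothing is closed yet."""
    closed = [pr for pr in prs if pr["closed"]]
    if not closed:
        return None
    return sum(pr["merged"] for pr in closed) / len(closed)

def median_first_response(prs):
    """Median time from open to first maintainer response, ignoring unanswered PRs."""
    waits = [pr["first_response"] - pr["opened"] for pr in prs if pr["first_response"]]
    return median(waits) if waits else None
```

Using the median rather than the mean is an assumption made here so that a few very slow responses do not dominate the number.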

I'm also curious to know whether folks are interested in metrics solely for supporting maintainers or also for displaying publicly to encourage visitors to adopt or contribute to your open-source projects.

7/2 update – initial priorities are listed below:

[ ] pr time to close (open -> close)
[ ] pr merge rate (merged / total)
[ ] pr maintainer response time
[ ] frequency of releases
[ ] issue time to close (open -> close)
[ ] issue maintainer response time
[ ] PRs merged (not exclusive to OSS)
[ ] Give each org a public page where they can choose which repos and metrics to display
[ ] Create pages for non-Pull Reminders users and let them "claim" their page if they want

abinoda commented 6 years ago

@xinzweb @danrabbit @johannasmith @majormoses @terminalmage @gtmanfred @mbbroberg @kitsunet @damacus Would love any thoughts, ideas, or suggestions on this!

majormoses commented 6 years ago

Those sound like a good start to me. I think all metrics should be config-driven so they can be toggled on and off, as each project has different needs.

Merge Rate (merged PRs / all closed PRs)

:100:

% of PRs coming from within an org vs. outside

I'd certainly be interested to see and showcase the community engagement

Total # of Contributors

I imagine this would need to do a uniq on them if we are looking across the org or a subset of repos, in the case of a single repo this metric

Review response time

I think you are right: the most useful metric is time to first review (or comment from a maintainer). While there are probably other metrics in this category, I don't know if anything else would be useful, as for external PRs it's pretty common to see PRs open for a couple of months while waiting on the submitter to come back to it.

Issue response time (time for an org member to comment, label, or close an issue)

In some orgs they do not use GitHub issues and rely on issue trackers such as Bugzilla, Jira, etc. As there is currently no functionality for reminders on issues, I am not sure this should be added yet. While it's a great metric to keep track of OSS community health and whether current maintainers are spread too thin, I think it should be left out, or at least disabled by default, if there is not a tool that they can use to keep track of those better.

Engaged Users (number of users who have commented, reviewed, or pushed code in some way in the past 90 days)

That would be really great to see. Any chance we could store, say, a year's (or more) worth of retention, even if we are only storing quarterly data? I think it would be good to be able to notice patterns in project development cycles. For example, sometimes the last quarter of the year is down significantly because more people are on holiday for the last month of the year. Aside from looking at seasonal patterns, it could be used to determine whether decisions that were made (such as requiring tests with PRs) had a meaningful impact on the community's engagement.
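
To make the 90-day window and the quarterly snapshots concrete, here is a minimal sketch. The event list is hypothetical, and treating the window as ending at each quarter's last day is an assumption for illustration.

```python
from datetime import date, timedelta

# Hypothetical activity events: (username, day of a comment, review, or push).
events = [
    ("alice", date(2018, 1, 10)),
    ("bob", date(2018, 2, 20)),
    ("alice", date(2018, 5, 5)),
    ("carol", date(2018, 6, 30)),
]

def engaged_users(events, as_of, window_days=90):
    """Distinct users with at least one event in the window_days up to as_of (inclusive)."""
    start = as_of - timedelta(days=window_days)
    return {user for user, day in events if start < day <= as_of}

def quarterly_counts(events, quarter_ends):
    """Engaged-user count at each quarter end, for comparing periods across a year or more."""
    return {end.isoformat(): len(engaged_users(events, end)) for end in quarter_ends}
```

Storing only the per-quarter counts, as suggested above, would be enough to spot seasonal dips without keeping every raw event.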

It also might be a good idea to look over what this provides (either for integration or ideas): https://github.com/caarlos0/org-stats

abinoda commented 6 years ago

@majormoses Thanks for all the suggestions! Follow-up question: are you interested in metrics solely for supporting maintainers? Or are you also interested in showcasing metrics publicly that may help new visitors adopt or contribute to your open-source projects?

majormoses commented 6 years ago

Follow-up question: are you interested in metrics solely for supporting maintainers? Or are you also interested in showcasing metrics publicly that may help new visitors adopt or contribute to your open-source projects?

Interested in both, but we probably need to make it configurable which metrics are displayed on each.
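
One possible shape for that configuration, as a sketch only: the metric names, the defaults, and the two audiences ("maintainers" and "public") are all hypothetical. Issue response time is off by default here purely to mirror the earlier suggestion in this thread.

```python
# Hypothetical defaults; metric names are illustrative.
DEFAULTS = {"merge_rate": True, "pr_response_time": True, "issue_response_time": False}

def enabled_metrics(overrides=None):
    """Merge a project's overrides onto the defaults and list the enabled metrics."""
    settings = {**DEFAULTS, **(overrides or {})}
    return sorted(name for name, on in settings.items() if on)

# Hypothetical per-audience overrides for one project.
config = {
    "maintainers": {"issue_response_time": True},
    "public": {"pr_response_time": False},
}
```

With this layout, `enabled_metrics(config["public"])` and `enabled_metrics(config["maintainers"])` can differ for the same project.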

abinoda commented 6 years ago

@mikejolley @rosswhitfield @AndreiSavici @ondrej-fabry @bkeepers Would love any thoughts, ideas, or suggestions on this!

gtmanfred commented 6 years ago

Follow-up question: are you interested in metrics solely for supporting maintainers? Or are you also interested in showcasing metrics publicly that may help new visitors adopt or contribute to your open-source projects?

I would like them to be visible publicly too.

I think all of the proposed metrics would be useful for us, but I agree that it should be possible to turn them off, for example if you do not use GitHub issues.

Also, it would be good to see some of these metrics on a per branch basis. We maintain 2-3 stable branches at a time, which have different speeds at which they receive PRs.

I used to use cauldron.io to get some of these metrics, but it isn't always reliably online and can be somewhat tedious to use.

Another metric I would like to see is changes per branch over X number of days. That would tell us how many lines had been merged and what percentage of the code base has changed recently. We merge somewhere around 400 PRs/commits a month (I forget which metric it was), so that would be useful to be able to see at a glance.

Thanks, Daniel

terminalmage commented 6 years ago

I echo everything @gtmanfred said above. Configurability is a must, lest some of these become noise.

abinoda commented 6 years ago

@bootstraponline Would love your thoughts on this.

You've mentioned that pull request response time is the single most valuable metric for your use case since people want to know how quickly they'll get a code review.

Can you define what exactly "pull request response time" means in the context of open-source on GitHub? Is it the time between a PR being opened by someone and a maintainer responding to it in the form of a comment, label, or review?

bootstraponline commented 6 years ago

Review response time (time between PR being opened and first review?)

I like that metric. I agree about configuration options. Maybe start with one metric that's most relevant and expand from there.

It'd be disappointing to have a bunch of badges that no one uses.

bootstraponline commented 6 years ago

Can you define what exactly "pull request response time" means in the context of open-source on GitHub?

For us it's a bit complex as we have a multi-stage review process. There are different github labels:

waiting: contributor
waiting: product
waiting: gerrit import
waiting: CLA
waiting: code review
waiting: QA

Being able to tell how long we're in each label state would be helpful, in addition to how quickly feedback is provided in each phase.

Probably that's too much work so time to first maintainer comment is a more general case.
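
For what the per-label timing could look like, here is a minimal sketch. The timeline data is hypothetical, and it assumes exactly one "waiting:" label is active at a time, which may not hold for every workflow.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical label timeline for one PR: (timestamp, label applied).
# Assumes exactly one "waiting:" label is active at a time.
timeline = [
    (datetime(2018, 7, 1, 10), "waiting: CLA"),
    (datetime(2018, 7, 2, 10), "waiting: code review"),
    (datetime(2018, 7, 5, 10), "waiting: contributor"),
    (datetime(2018, 7, 6, 10), "waiting: code review"),
]

def time_per_label(timeline, end):
    """Sum how long each label was active; the last label runs until `end`."""
    totals = defaultdict(timedelta)
    ordered = sorted(timeline)
    for (start, label), (stop, _) in zip(ordered, ordered[1:] + [(end, None)]):
        totals[label] += stop - start
    return dict(totals)
```

Passing the close time (or the current time for an open PR) as `end` gives the total spent in each state, including states that are re-entered.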

bootstraponline commented 6 years ago

https://k8s.devstats.cncf.io/d/4/blocked-prs-repository-groups?orgId=1

Devstats has a lot of cool visualizations around github metrics & PRs.

johannaratliff commented 6 years ago

I have a high priority for turnaround time on PRs, same as most of you, but I think it needs to be coupled with the MergeRate to be a valuable indicator of project health.

Pull request response time (time between PR being opened and comment, label, or review from maintainer?)

I would say starting with full turnaround time open --> close would be the first thing I want. Timing of more granular indications can come later, but first I need to know my average time for resolution (whether it's merge or rejection).

Above thoughts map 1:1 for what I want for issues as well.

As a step 2, what I'm most interested in is status of open or hanging PRs and issues. An average time to merge or close that seems quite healthy can give a skewed perspective if there are many open PRs and issues that have never been resolved. E.g. My github bot marked and labeled your PR so it looks like there's attention to it. Automated tests ran and it looks completely merge-able. To the submitter, they don't understand why it hasn't been merged and there's been no human contact to discuss. That feels unhealthy to me, but would be overlooked if we're only analyzing "time to label" and "time to close".
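
A minimal sketch of that "step 2" check, flagging open PRs that have only seen automated activity. The record layout, the `is_bot` flag, and the 14-day threshold are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical open PRs; "events" lists (actor, is_bot) for each comment, label, or review.
open_prs = [
    {"number": 1, "opened": datetime(2018, 5, 1), "events": [("ci-bot", True)]},
    {"number": 2, "opened": datetime(2018, 6, 20),
     "events": [("ci-bot", True), ("maintainer", False)]},
    {"number": 3, "opened": datetime(2018, 6, 25), "events": []},
]

def unattended(open_prs, now, max_age=timedelta(days=14)):
    """Numbers of open PRs older than max_age on which no human has acted yet."""
    return [
        pr["number"]
        for pr in open_prs
        if now - pr["opened"] > max_age
        and all(is_bot for _, is_bot in pr["events"])
    ]
```

A count of such PRs alongside "time to close" would surface the case described above, where bot labels make a PR look attended to.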

I'm also curious to know whether folks are interested in metrics solely for supporting maintainers or also for displaying publicly to encourage visitors to adopt or contribute to your open-source projects.

I want public metrics so users can assess ease of contribution to my project.

damacus commented 6 years ago

How about frequency of releases? It's all very well merging code into master, but if that code isn't released in a packaged format the users often won't feel the benefit.

Over at sous-chefs we're pretty quick at merging but often not quick at releasing those bug fixes. It'd be good to encourage more of that.
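
One simple way release frequency might be expressed is the average gap between consecutive releases. This is a sketch with hypothetical dates, not a definition agreed on in this thread.

```python
from datetime import date

# Hypothetical release dates for one repository.
releases = [date(2018, 1, 1), date(2018, 1, 31), date(2018, 4, 1)]

def mean_days_between_releases(releases):
    """Average gap in days between consecutive releases; None with fewer than two."""
    ordered = sorted(releases)
    if len(ordered) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)
```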

majormoses commented 6 years ago

How about frequency of releases? It's all very well merging code into master, but if that code isn't released in a packaged format the users often won't feel the benefit.

Over at sous-chefs we're pretty quick at merging but often not quick at releasing those bug fixes. It'd be good to encourage more of that.

I would very much love to see that as well. One thing that we started doing on the sensu-plugins is to force maintainers to release every functional change (PR). Prior to this (me becoming a maintainer for them), we had a lot of contributions that would sit around for sometimes almost a year (which was one of the things that motivated me to become a maintainer for them).

xinzweb commented 6 years ago

Thanks for collecting the feedback.

Well, my goal is to focus on each repo and get the ball rolling forward, and remind people if any open issues/PRs/reviews are left behind.

There is only one metric I am interested in: the wait time since the last activity. As long as the topics are hot and people are actively taking action on them, I am pretty happy.

This can be applied to PRs or Issues.

Thanks a lot, Shin
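
A minimal sketch of that single metric, time since last activity, applied to PRs and issues alike. The item records and the 7-day threshold are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical open items with the timestamp of their most recent activity.
items = [
    {"id": "pr-a", "last_activity": datetime(2018, 7, 1)},
    {"id": "issue-b", "last_activity": datetime(2018, 7, 9)},
]

def stale_items(items, now, threshold=timedelta(days=7)):
    """Return (id, idle time) for items idle longer than threshold, most idle first."""
    idle = [(item["id"], now - item["last_activity"]) for item in items]
    stale = [pair for pair in idle if pair[1] > threshold]
    return sorted(stale, key=lambda pair: pair[1], reverse=True)
```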

kitsunet commented 6 years ago

I would like to add the number of first-time contributions. That's something I would want to know over time.
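
Tracking this over time could be sketched as follows, counting each author in the month of their earliest contribution. The contribution list is hypothetical, and monthly buckets are an assumption.

```python
from collections import Counter
from datetime import date

# Hypothetical merged contributions: (author, date).
contributions = [
    ("alice", date(2018, 1, 5)),
    ("bob", date(2018, 1, 20)),
    ("alice", date(2018, 2, 3)),
    ("carol", date(2018, 2, 14)),
]

def first_time_contributors_by_month(contributions):
    """Count authors whose earliest contribution falls in each month."""
    first_seen = {}
    for author, day in sorted(contributions, key=lambda c: c[1]):
        first_seen.setdefault(author, day)  # keep only the earliest date per author
    return dict(Counter(day.strftime("%Y-%m") for day in first_seen.values()))
```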

shs161 commented 5 years ago

I think identifying inside and outside contributors is also important, as some research has shown that PR response time differs by type of contributor.

majormoses commented 5 years ago

I think identifying inside and outside contributors is also important

How do you identify/classify an inside vs outside contributor? I am a member (and in some cases an owner) of several OSS projects that are associated with a commercial entity backing them, but I do not generally collect any payment for these contributions. Due to the way GitHub works, on some OSS projects I generally invite maintainers (once they have proven themselves through several pull requests) into the org rather than as collaborators, because you can't assign a team of collaborators read/write/admin permissions to a repository. For example, I have an org https://github.com/sensu-plugins which contains plugins in various languages (ruby, python, golang, shell, powershell, erlang, C# etc) and will create teams such as ruby-plugins-commit-bit so that when we want to onboard a new maintainer (or when one is no longer interested in being a maintainer) it is quick and easy to grant or remove access.

While I certainly think that we can highlight GitHub members of a project vs. collaborators vs. people without commit bit easily enough, I think we need to be careful how we message it.

majormoses commented 5 years ago

might want to take a look at cauldron.io for some additional inspiration, here is an example of what they give for free to OSS projects: a "GitHub Pull Requests Timing" dashboard with panels including Summary, Pull Requests, Status, Median Time Open (days), 80 Percent Time Open (days), Top Repositories, Top Submitters, Submitters, Domains, Recent Pull Requests, and Oldest Pull Requests.

shs161 commented 5 years ago

I understand your point. However, the whole idea of social coding is to get contributions from people who are not core members of the project team. This becomes even more important when a for-profit firm is engaged in an open community (or else why should they be?). What are some good ways to differentiate between members of the core team (people who start the project, or firm employees who work on a project), collaborators, and just normal contributors? Is it not the case that sometimes core members of the team (insiders) also create a fork and send pull requests to the project? In that case, how will you differentiate between community engagement and the firm's own people working on their own project? Please enlighten me. How can we differentiate between core members, collaborators, external contributors, and external users? For example, one article (https://www.researchgate.net/publication/265687645_Three_Metrics_to_Explore_the_Openness_of_GitHub_projects/download) discusses these four as different contribution categories. Another article, which is also hosted on GitHub (https://github.com/pamelarussell/github-bioinformatics), specifically suggests that "we use the term “outside contributors” to refer to commit authors who are never committers for the repository." Can you explain what other ways there are to identify external vs. internal contributions? Specifically, how will we measure community engagement?
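
The quoted definition of "outside contributors" (commit authors who are never committers) can be sketched as a set difference. The commit data below is hypothetical, and this is only one of several possible classifications.

```python
# Hypothetical commit metadata: (author, committer) pairs from a repository's history.
commits = [
    ("alice", "alice"),
    ("dave", "alice"),
    ("bob", "bob"),
    ("erin", "bob"),
    ("alice", "bob"),
]

def outside_contributors(commits):
    """Authors who never appear as a committer anywhere in the history."""
    committers = {committer for _, committer in commits}
    authors = {author for author, _ in commits}
    return authors - committers
```

Real histories would likely need more care, since merges made through a web interface may record a different committer identity than the maintainer who approved them.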

majormoses commented 5 years ago

It's certainly a complex problem and it requires understanding a bit about each project's workflows. There are various signals that can be used, such as email addresses in commits, GitHub permissions, and so on. Maybe by answering a few key questions we can use these signals differently for each project.

abinoda commented 5 years ago

Thanks for the thoughts both of you. I wanted to provide a quick update for folks on this thread. I've been busy the past few months with adding improvements to Pull Reminders, but I'm planning on dedicating time starting this month to getting a first version of this tool out the door. Thanks again to everyone for all the ideas generated here. I'm excited to start chipping away!

expjess commented 4 years ago

Hey @abinoda - would love to talk more about this if you still have thoughts around it simmering on your back burner.

As an open-source project, we have similar interests in everything that's been mentioned above. One big thing that we also keep bumping into is that, for an open-source project, issues are part project management (typical Github use case) and part user support. This means that we really itch for

a way to differentiate project management tasks vs user support
all the things that are typically baked in to support ticketing systems, such as
- all open issues awaiting a first response
- all open issues waiting on our team (and then being able to drill down by team member)
- metrics on issue response and resolution times
- touches per issue
