Closed antoviaque closed 4 months ago
@nasthagiri @nedbat @regisb @idegtiarov Following-up on an action item I took from the last contributor meetup, I've converted this card from the core committer program board into an issue to be able to comment on it. My action item was to add a mention of including badges there, which I've added to the description.
Btw it could be worth starting to specify what we want for the leaderboard. Something like what the OpenStack project has, ie https://www.stackalytics.com/ ?
Thanks for assigning this to me @antoviaque! I'm keen to work on this.
I will take a look at this as well! Thanks for adding that ticket as a separate issue.
I am currently looking at the Discourse API documentation to fetch badge and user information. I would like to be able to fetch the following information:
This is relatively easy to achieve, but there needs to be a bridge between Discourse and Github. For this, we can use the Discourse "Associated Accounts" (https://discuss.openedx.org/u/regis/preferences/account). Once we make that connection, we can use the Github API to fill in the remaining information.
The only remaining field is the organization. I do not know yet how we can consistently associate a user to an organization. I would like to be able to list (at least) all organizations from the Open edX marketplace. Automatically finding the organization associated to a certain Github profile is imprecise and inconsistent. Thus, I think our best bet is to define a custom Discourse user field. This could either be a free-text field or a dropdown: https://discuss.openedx.org/admin/customize/user_fields @nedbat do you think this would be acceptable?
EDIT: I'd also like to display the organization continent, but I don't have a clean solution for this. Ideas?
I have made some progress on this. The idea is to generate a webpage that will display community members along with the number of likes received on the forums, the count of merged Github PRs, and other cool "vanity" metrics that show how engaged they are in the community.
What I had in mind was to parse the Discourse bio summary and to gather extra information via hashtags. For instance, here's what I'd put in my bio:
Principal Tutor maintainer. Open edX core committer. @regisb on Github. Fond of my beautiful mountain village in the French Alps. :ramen: Chinese noodle enthusiast. #overhangio #corecommitter
The "corecommitter" and "overhangio" hashtags will be associated to my profile. The link to Github will also be parsed and the @regisb account name will be associated too. This means that it should be possible to expose the following information via a REST API:
{
  "username": "regis",
  "forums": {
    "likes_received": 223
  },
  "github": {
    "username": "regisb",
    "pr": {
      "merged_count": 104
    }
  },
  "tags": ["corecommitter", "overhangio"]
}
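The bio parsing described above could be sketched roughly as follows. This is only an illustration, not the code that was actually written: the regular expressions, the `parse_bio` function name and the `github_username` key are assumptions made for this example.

```python
import re

# Hypothetical patterns: the thread only says that hashtags and the
# "@name on Github" mention are parsed from the bio.
HASHTAG_RE = re.compile(r"#(\w+)")
GITHUB_RE = re.compile(r"@([A-Za-z0-9-]+) on Github", re.IGNORECASE)


def parse_bio(bio: str) -> dict:
    """Extract tags and a GitHub username from a Discourse bio summary."""
    tags = sorted(set(HASHTAG_RE.findall(bio)))
    match = GITHUB_RE.search(bio)
    return {
        "github_username": match.group(1) if match else None,
        "tags": tags,
    }


bio = (
    "Principal Tutor maintainer. Open edX core committer. @regisb on Github. "
    "Fond of my beautiful mountain village in the French Alps. "
    ":ramen: Chinese noodle enthusiast. #overhangio #corecommitter"
)
profile = parse_bio(bio)
print(profile)
```

The GitHub username found this way would then be the key used to query the GitHub API for the merged PR count.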
Someone (else than me) will then be able to create a nice frontend where we can list and sort community members, search them by tags, etc.
Thoughts?
@regisb That sounds great! :)
One comment is that it might be useful to tie the data to a specific time period, so that we can show the number of PRs, likes, etc. over a specific year or month. This would allow newcomers to reach a better position faster, and encourage old-timers to keep contributing :)
FYI, on our side @symbolist might contribute some parts of this work -- though he would likely only become available from May.
@idegtiarov @regisb Still interested to also do a part of this work?
Actually, I have already written most of the backend code. I just need to implement some caching to make sure that we don't crawl the Discourse API too frequently, while still guaranteeing that we have fresh results at all times.
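One common way to balance "don't crawl too often" against "fresh results" is a time-to-live (TTL) cache: a value is reused until it is older than a chosen limit, then fetched again. The sketch below is an assumption about how this could look, not the caching that was actually implemented; `TTLCache`, `fake_fetch` and the 600-second limit are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Reuse fetched values until they are older than ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the example can simulate time passing
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: no API call
        value = fetch()  # missing or stale: hit the API again
        self._entries[key] = (now, value)
        return value


calls = []


def fake_fetch() -> dict:
    """Stand-in for a Discourse API request (hypothetical)."""
    calls.append(1)
    return {"likes_received": 223}


fake_now = [0.0]
cache = TTLCache(ttl_seconds=600, clock=lambda: fake_now[0])
cache.get("regis", fake_fetch)  # first call fetches
cache.get("regis", fake_fetch)  # served from the cache
fake_now[0] = 601.0             # simulate more than 10 minutes passing
cache.get("regis", fake_fetch)  # entry expired, fetches again
print(len(calls))
```

With this approach results are at most `ttl_seconds` old, so "fresh at all times" really means "fresh within a known bound".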
@regisb Maybe we could develop this in the open so other people can help? :)
@nedbat Yes, but I wanted to get the code in a presentable state, first.
We are going to investigate the Stackalytics service as a leaderboard option with one or a couple of our internal repositories. The work is planned to start in April.
Here's what I've got so far: https://github.com/openedx/oxct It's hosted here: https://oxct.overhang.io/ (just leave a few minutes for the cache to warm up). I encourage everyone to contribute and open pull requests in this repo :hugs:
Adding to this thread, we already have an installation of the Grimoire Labs dashboard that I think can cover a bunch of the goals captured here.
It currently isn't public, but that should be easy enough to change.
The project aims to implement the community metrics proposed by CHAOSS.
I was going to give Regis a "Cook's tour" on a video call next week. If others are interested in joining, ping me on Slack?
We just came out of a conversation with @e0d who presented your Grimoire instance. It was really interesting, and I'd like to recap here a few points which are close to my heart:
@e0d Thanks for the presentation of Grimoire, that was really useful to see! I only knew it through Cauldron -- I had tried to run it on the edX github orgs some time ago, but it is a bit limited in the type of sources it can import there: https://cauldron.io/project/3820 . The setup you have seems much more powerful in that regard: https://openedx-metrics.herokuapp.com/ (CC @bradenmacdonald @nasthagiri as this might be useful to gather data about the core committer program, which you are looking at for a blog post about the program.)
Btw, would it be ok to post the recording of this meeting publicly here, in case others would like to watch it?
A few ideas/comments that I've found interesting from what you, @regisb @symbolist @idegtiarov @arbrandes mentioned, or reactions to the points you've made:
The idea of having a canonical database aggregating the contribution data, and then allowing us to develop & present multiple ways to represent that information, seems like a great approach. As you mentioned @regisb, some will want to compare, some will want to get only a summary of their own contributions -- it's good to allow multiple perspectives, and this will allow us to experiment with how to look at the data over time, keeping the process iterative rather than defining a single set of statistics once and for all, which could be more easily gamed.
Imho this advocates for the idea of not spending too much time trying to define and agree on a precise and definitive set of metrics upfront. We still want to define it, but I agree with @e0d that it would be reasonable to simply start with the CHAOSS metrics, which have the merit of being already defined and implemented -- then we can see what we get from that, and iterate by creating additional views?
From having played a bit with https://openedx-metrics.herokuapp.com/ it looks like an important preliminary step will be to improve the accuracy of the dataset. For example, the assignment of contributors to organizations currently seems a bit haphazard: on the list of all pull requests with the tag "open source contribution", most of the pull requests have an "Unknown" organization, or @pomegranited is listed as being from the Adelaide university.
+1 to "hours of effort" as one of the metrics we should try to capture. Like any metric, it will be imperfect, but it is indeed the one common "currency" we all wish we had more of, and the amount of our time that we spend on contributing to something is definitely representative of our level of involvement in that project. It's also one of the main types of commitments from the Declaration of Commitment to the Core Committer Program, so it would allow tracking that more easily. Also, since many providers are tracking their time on their side too, it would allow comparing what the tool measures with what is independently measured, and checking that the metrics actually match reality.
For community votes & karma -- we have some metrics on this through the "likes", which several of the tools we use readily support (discourse, github, etc.), and is already being used. Maybe getting that data and aggregating it too would be a good first step to measure karma?
More generally, community votes, nominations, etc. could be good to include as an additional source of information. Subjective opinions and votes are a useful complement to the rest of the data collected -- and could likely be made part of the dataset, too. However, I would be careful to not consider them necessarily more authoritative than the rest of the information -- a quiet developer who contributes a lot of work but doesn't talk much on the forums can be as (or more) important to the project as someone very visible and popular on the forums. Part of the goal with gathering the data is to contribute to dissipating perception bias and obfuscation, by showing actual numbers that reveal the actual work contributed -- if we consider this data secondary to popularity or visibility, this works against the meritocratic principles of open source imho.
Some people make contributions to Open edX that are extremely valuable, yet not captured in any of the currently available data sources. I'm thinking in particular of @sambapete, who spends a lot of energy testing new releases and detecting issues. We must invent a new way of acknowledging these people's contributions: in the form of unique badges or Academy Award-like rewards, for instance.
+1 -- these might be things that we could be able to surface through tickets from bug reports, reports/likes on forums, maybe a role within the release working group? Badges are a good way too yes, maybe a stepped-up version of it could be a way to show the titles and responsibilities that any given person takes in the project?
I spent some time over the weekend deploying an upgraded instance of Grimoire Labs. It is currently consuming all of the data and I'll share a link once it's done.
[
  {
    "conditions": [
      {
        "field": "origin",
        "value": "https://github.com/edx/frontend-component-cookie-policy-banner"
      }
    ],
    "set_extra_fields": [
      { "field": "my_namespace_foo", "value": "foo" },
      { "field": "my_namespace_bar", "value": "bar" }
    ]
  }
]
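One way to read the JSON above: when every condition matches an item, the listed extra fields are set on it. The sketch below only illustrates that reading; it is not Grimoire's implementation. The `apply_rules` function, the sample items and the `https://example.org/other-repo` origin are hypothetical.

```python
import json

RULES_JSON = """
[
  {
    "conditions": [
      {"field": "origin",
       "value": "https://github.com/edx/frontend-component-cookie-policy-banner"}
    ],
    "set_extra_fields": [
      {"field": "my_namespace_foo", "value": "foo"},
      {"field": "my_namespace_bar", "value": "bar"}
    ]
  }
]
"""


def apply_rules(item: dict, rules: list) -> dict:
    """Return a copy of item with extra fields from every rule whose conditions all match."""
    enriched = dict(item)
    for rule in rules:
        if all(item.get(c["field"]) == c["value"] for c in rule["conditions"]):
            for extra in rule["set_extra_fields"]:
                enriched[extra["field"]] = extra["value"]
    return enriched


rules = json.loads(RULES_JSON)
matching = {
    "origin": "https://github.com/edx/frontend-component-cookie-policy-banner",
    "title": "Example PR",  # hypothetical item
}
other = {"origin": "https://example.org/other-repo", "title": "Another PR"}

enriched = apply_rules(matching, rules)
untouched = apply_rules(other, rules)
print(enriched["my_namespace_foo"], "my_namespace_foo" in untouched)
```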
I'm going to speak with someone from Bitergia later today, but my current thinking is that extending Grimoire could work well. For example, potentially creating a Transifex backend.
This is a great idea!
I have been taking a deeper look at the CHAOSS project this week. To help others who would like to quickly understand what it is about so that they can participate in this discussion, I compiled together some highlights from my investigation here: https://openedx.atlassian.net/wiki/spaces/COMM/pages/2696446382/CHAOSS
Imho this advocates for the idea of not spending too much time trying to define and agree on a precise and definitive set of metrics upfront. We still want to define it, but I agree with @e0d that it would be reasonable to simply start with the CHAOSS metrics, which have the merit of being already defined and implemented -- then we can see what we get from that, and iterate by creating additional views?
I agree with this approach as well. It gives us a concrete starting point that has already been thought about deeply by many experts in the area and has been in use by other communities. We may want to additionally slice and dice the data for specific goals but the framework supports that as well (and so it does not constrain us). Also for the sake of thoroughness, I did try to see if there were any competing standards or options but this seems to be the only comprehensive one.
From having played a bit with https://openedx-metrics.herokuapp.com/ it looks like an important preliminary step will be to improve the accuracy of the dataset. For example, the assignment of contributors to organizations currently seems a bit haphazard: on the list of all pull requests with the tag "open source contribution", most of the pull requests have an "Unknown" organization, or @pomegranited is listed as being from the Adelaide university.
SortingHat is the part of the suite which is responsible for managing identities. From looking at its documentation it looks like it should support what we want and we just need to look into configuring that (looks like @e0d has already installed the user interface "hatstall" for that):
"Sorting Hat maintains an SQL database of unique identities of communities members across (potentially) many different sources. Identities corresponding to the same real person can be merged in the same unique identity with a unique uuid. For each unique identity, a profile can be defined, with the name and other data shown for the corresponding person by default.
In addition, each unique identity can be related to one or more affiliations, for different time periods. This will usually correspond to different organizations in which the person was employed during those time periods."
https://www.researchgate.net/publication/331088184_SortingHat_Wizardry_on_Software_Project_Members has some more details.
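To make the quoted description more concrete, here is a small sketch of the data model it implies: several source identities merged into one unique identity, with affiliations attached to time periods. This is not SortingHat's schema or API. All class, field and function names are assumptions, and the person and organizations are made up.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Identity:
    source: str    # e.g. "github" or "discourse"
    username: str


@dataclass(frozen=True)
class Affiliation:
    organization: str
    start: date    # inclusive
    end: date      # exclusive


@dataclass
class UniqueIdentity:
    name: str
    identities: List[Identity] = field(default_factory=list)
    affiliations: List[Affiliation] = field(default_factory=list)
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)

    def organization_on(self, day: date) -> Optional[str]:
        """Return the organization the person was affiliated with on a given day."""
        for affiliation in self.affiliations:
            if affiliation.start <= day < affiliation.end:
                return affiliation.organization
        return None


def merge(a: UniqueIdentity, b: UniqueIdentity) -> UniqueIdentity:
    """Merge two records for the same real person, keeping the first uid and name."""
    return UniqueIdentity(
        name=a.name,
        identities=a.identities + b.identities,
        affiliations=a.affiliations + b.affiliations,
        uid=a.uid,
    )


github_side = UniqueIdentity(
    name="J. Doe",
    identities=[Identity("github", "jdoe")],
    affiliations=[Affiliation("Org A", date(2019, 1, 1), date(2021, 1, 1))],
)
forum_side = UniqueIdentity(
    name="jdoe",
    identities=[Identity("discourse", "j.doe")],
    affiliations=[Affiliation("Org B", date(2021, 1, 1), date(2022, 1, 1))],
)
merged = merge(github_side, forum_side)
print(merged.organization_on(date(2020, 6, 1)), merged.organization_on(date(2021, 6, 1)))
```

Because affiliations carry dates, a contribution can be attributed to the organization the person belonged to at the time, not only to their current one.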
@e0d
The people data is a key place where we need some investment. I don't think it's a ton of work, but the way we are currently mapping people to organizations is pretty brittle and manual.
Let me know if I can help with that.
To also start the conversation about the overall plan, if everyone is in agreement about this as a starting point, the next steps could be:
I've made progress getting Grimoire upgraded and configured against the core data sources. An outstanding item is to configure authentication, which I can look at over the weekend. Without that it is not simply a matter of the data being available to everyone, but that anyone would be able to alter dashboards.
For CCs, I can send you a preview if you Slack me directly.
An outstanding item is to configure authentication, which I can look at over the weekend.
Is that even possible? I thought that authentication was only available in the commercial edition of Kibana?
Requiring login with a shared credential is possible, that's where we are right now. This is compatible with allowing read-only access to the views. This probably needs a little configuration change to work, but should be straightforward.
PM me if you want the credentials to view the data.
@e0d Assuming we move forward with the instance of Grimoire that you have set up, what would be a good next step? Is it still cleaning up the data & org associations? And would that be something that only you or someone at edX can do, or would the rest of the community be able to help here?
Happy to distribute via PM in Slack. I don't want to post publicly yet, though eventually being public for viewing is the goal. I'll send to your Slack handle.
Also, there was a recent release of Grimoire Labs, so I would like to find some time to upgrade: https://github.com/chaoss/grimoirelab/blob/master/releases/NEWS
Also, one of the CHAOSS folks did a presentation on leaderboards recently at Tidelift's Upstream. I've been in touch with Georg and I think there's a chance to collaborate on something related to leaderboards. His talk is here:
@e0d Thanks! I could access it with the credentials you have sent. I'll see if one of the core committers from OpenCraft has time to look into this.
To double check, the next action would still be to improve & clean-up the data?
Based on the contributors call I had the sense that we are not yet aligned on whether a badging program, a leader board, or both are the best plan. What's the best way to align on a plan?
I suspect that cleaning up the data will be an iterative, hopefully not continuous, process. Maybe we should build a POC and clean the data that we identify as most problematic during that process?
I have:
I haven't:
@e0d If we can make the leaderboard incorporate badges, then I'd like to go with the leaderboard, so that things can be counted/filtered/grouped automatically for us.
I like that we can issue badges manually to people who contribute more than just PRs, and so I wouldn't want to focus solely on github as a contribution source. But it does mean that we need to be conscientious about issuing badges -- maybe make that part of the job of the various working groups to nominate helpful people and regularly reward them?
I suspect that cleaning up the data will be an iterative, hopefully not continuous, process. Maybe we should build a POC and clean the data that we identify as most problematic during that process?
+1 to this.
How can we help clean up the data?
@e0d We can definitely discuss more -- there wasn't a specific definitive solution that was agreed to I think.
The main point from the last meeting (which I only watched a recording of) that seemed to have reached consensus was that regardless of the way we want to present the data at the end, we need to collect it first in any case, and that the collected data should be open. And that CHAOSS and Grimoire seemed a good starting point for a first iteration at that, since others have already done the job of figuring out lists of elements to measure, and built the software to do it. From that, it would then be iterative in any case, based on what we think is useful. Does that match your/others' memory?
@antoviaque,
From that, it would then be iterative in any case, based on what we think is useful. Does that match your/others memory?
That sums up what I remember, yes.
@e0d @arbrandes I've added some suggestions and questions to your CHAOSS Cleanup spreadsheet, and would like to create a task to address some of these issues during our next sprint (30 June - 13 July). At a glance, I think "merging organizations" will be the easiest to do first since it's manual. But the others will require some (nice) contributions to sortinghat, like "sourcing organization for non-affiliated individuals from github".
What do we need to get started on this? I could start by creating a github Project for this work, and start adding issues so we can discuss requirements with everybody.
GitHub project sounds great.
I do think it is OK to have folks in a pseudo organization, say, "individual". But we want to classify whomever we can when they are affiliated. There will be folks who are legitimately individuals.
Do you have thoughts on which interventions will have the biggest quality impacts? I think focusing on CCs and key firms will touch the majority of contributions, for example.
@e0d question -- what are the source github projects included in this initial Grimoire deployment? Can we add non-edx repos like Tutor and the community-supported XBlocks?
One more thought: merged orgs are a good example of the type of change that needs to be sticky. If we merge edX and edX inc. only for edX inc. to be recreated during the next identity analysis, that's an issue. I'm not yet sure where the two versions originated from. Do we need an aliases concept for orgs?
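An aliases concept could be as simple as a lookup table that maps every known spelling to one canonical name and is re-applied after each identity analysis, so a recreated "edX inc." folds back into "edX". The sketch below is a hypothetical illustration, not an existing SortingHat feature; `ORG_ALIASES` and `canonical_org` are made-up names.

```python
# Hypothetical alias table, kept outside the identity database so that it
# survives re-imports. Keys are lower-cased, whitespace-normalized spellings.
ORG_ALIASES = {
    "edx": "edX",
    "edx inc.": "edX",
}


def canonical_org(name: str, aliases: dict = ORG_ALIASES) -> str:
    """Map an organization name to its canonical spelling, if an alias is known."""
    cleaned = " ".join(name.split())
    return aliases.get(cleaned.lower(), cleaned)


print(canonical_org("edX inc."), canonical_org("  EdX   Inc. "), canonical_org("OpenCraft"))
```

Unknown names pass through unchanged, so the table only needs entries for organizations that actually show up under several spellings.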
Currently it's every public project in the edX and Open edX GitHub orgs. We can add other repos if that makes sense. I think we need to work out that definition.
@e0d
Do you have thoughts on which interventions will have the biggest quality impacts? I think focusing on CCs and key firms will touch the majority of contributions for example
Can we export the number of contributions that are being counted against each non-org individual, so we can sort them and ensure the highest numbers are affiliated somewhere if that's appropriate?
But yes, the CC people by definition will have the most contributions, so I've updated the "Recommended Organization" for all the core contributors I could identify.
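The export suggested above, contribution counts for people without an organization sorted from highest to lowest, could look roughly like this once the data is available as a list of records. The record fields (`author`, `organization`), the function name and the sample data are assumptions for illustration, not the actual Grimoire index format.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Values treated as "no organization" (hypothetical; adjust to the real data).
UNAFFILIATED = ("Unknown", "", None)


def unaffiliated_ranking(contributions: Iterable[dict]) -> List[Tuple[str, int]]:
    """Count contributions per author with no known organization, highest first."""
    counts = Counter(
        c["author"] for c in contributions if c.get("organization") in UNAFFILIATED
    )
    return counts.most_common()


# Made-up sample records.
sample = [
    {"author": "alice", "organization": "Unknown"},
    {"author": "bob", "organization": None},
    {"author": "alice", "organization": "Unknown"},
    {"author": "carol", "organization": "Org A"},
]
ranking = unaffiliated_ranking(sample)
print(ranking)
```

Working down such a list from the top would focus the manual affiliation work on the people who account for the most contributions.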
FYI I've created a github project to track these issues and ideas: https://github.com/orgs/edx/projects/6
Can people confirm they can edit those cards? I haven't converted any to proper issues yet, but I think that's what we have to do to allow comments.
@e0d I've created https://github.com/edx/open-edx-proposals/issues/226 as the first issue to address, so we start working on data cleanup without having to have access to the edX Grimoire/SortingHat instance.
If anyone has suggestions or something specific they'd like to see out of that task, let me know?
CC @arbrandes @regisb @antoviaque
@pomegranited Thank you! :+1:
https://github.com/orgs/edx/projects/6 Can people confirm they can edit those cards?
I confirm that I can edit them yes.
Same here.
Hi everyone, this issue hasn't been touched since June 2021. Was there any enthusiasm/capacity to pick up on this idea, or should we close the issue?
If we want to keep it I propose moving the issue to https://github.com/openedx/wg-coordination/issues since this issue doesn't pertain to an OEP.
As a contributor, I would like to see my achievements and compare myself with other contributors, in order to celebrate my wins and remain motivated for even more contributions.
To consider: