openedx / open-edx-proposals

Proposals for Open edX architecture, best practices and processes
http://open-edx-proposals.readthedocs.io/

Leaderboard #179

Closed antoviaque closed 4 months ago

antoviaque commented 3 years ago

As a contributor, I would like to see my achievements and compare myself with other contributors, in order to celebrate my wins and remain motivated for even more contributions.

To consider:

antoviaque commented 3 years ago

@nasthagiri @nedbat @regisb @idegtiarov Following up on an action item I took from the last contributor meetup, I've converted this card from the core committer program board into an issue to be able to comment on it. My action item was to add a mention of including badges there, which I've added to the description.

Btw, it could be worth starting to specify what we want for the leaderboard. Something like what the OpenStack project has, i.e. https://www.stackalytics.com/ ?

regisb commented 3 years ago

Thanks for assigning this to me @antoviaque! I'm keen to work on this.

idegtiarov commented 3 years ago

I will take a look at this as well! Thanks for adding that ticket as a separate issue.

regisb commented 3 years ago

I am currently looking at the Discourse API documentation to fetch badge and user information. I would like to be able to fetch the following information:

This is relatively easy to achieve, but there needs to be a bridge between Discourse and Github. For this, we can use the Discourse "Associated Accounts" (https://discuss.openedx.org/u/regis/preferences/account). Once we make that connection, we can use the Github API to fill in the remaining information.
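A minimal Python sketch of the lookups described above, assuming the GitHub login has already been obtained through the Discourse "Associated Accounts" link. Two details are assumptions to verify against the official docs: that Discourse serves a per-user summary at `/u/<username>/summary.json` containing `user_summary.likes_received`, and that merged PRs can be counted with the GitHub issue search API (`/search/issues`) using the `is:pr is:merged author:<login>` qualifiers, whose response carries a `total_count`. All helper names are hypothetical. Parsing is kept separate from the HTTP call so it can be checked against canned responses.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

DISCOURSE_BASE = "https://discuss.openedx.org"  # forum used in this thread
GITHUB_API = "https://api.github.com"


def discourse_summary_url(username: str) -> str:
    """URL of the (assumed) Discourse per-user summary endpoint."""
    return f"{DISCOURSE_BASE}/u/{urllib.parse.quote(username)}/summary.json"


def github_merged_prs_url(login: str, org: Optional[str] = None) -> str:
    """GitHub search URL matching merged PRs authored by `login`."""
    query = f"is:pr is:merged author:{login}"
    if org:
        query += f" org:{org}"  # optionally restrict to a single GitHub org
    return f"{GITHUB_API}/search/issues?" + urllib.parse.urlencode({"q": query})


def likes_received(summary: dict) -> int:
    """Extract the likes count from a Discourse summary response."""
    return summary.get("user_summary", {}).get("likes_received", 0)


def merged_pr_count(search_result: dict) -> int:
    """GitHub search responses report the number of matches in total_count."""
    return search_result.get("total_count", 0)


def fetch_json(url: str) -> dict:
    """Plain HTTP GET; no authentication, retries or rate-limit handling."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Unauthenticated calls to the GitHub search API are tightly rate-limited, so a real crawler would likely send a token and cache responses.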

The only remaining field is the organization. I do not know yet how we can consistently associate a user with an organization. I would like to be able to list (at least) all organizations from the Open edX marketplace. Automatically finding the organization associated with a given Github profile is imprecise and inconsistent. Thus, I think our best bet is to define a custom Discourse user field. This could either be a free-text field or a dropdown: https://discuss.openedx.org/admin/customize/user_fields @nedbat do you think this would be acceptable?

EDIT: I'd also like to display each organization's continent, but I don't have a clean solution for this. Ideas?

regisb commented 3 years ago

I have made some progress on this. The idea is to generate a webpage that will display community members along with the number of likes received on the forums, the count of merged Github PRs, and other cool "vanity" metrics that show how engaged they are in the community.

What I had in mind was to parse the Discourse bio summary and to gather extra information via hashtags. For instance, here's what I'd put in my bio:

Principal Tutor maintainer. Open edX core committer. @regisb on Github. Fond of my beautiful mountain village in the French Alps. :ramen: Chinese noodle enthusiast. #overhangio #corecommitter

The "corecommitter" and "overhangio" hashtags will be associated with my profile. The link to Github will also be parsed and the @regisb account name will be associated too. This means that it should be possible to expose the following information via a REST API:

{
  "username": "regis",
  "forums": {
    "likes_received": 223
  },
  "github": {
    "username": "regisb",
    "pr": {
      "merged_count": 104
    }
  },
  "tags": ["corecommitter", "overhangio"]
}
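A rough sketch of the bio parsing described above, plus assembly of the result into the payload shape shown. The regular expressions encode an assumed convention (hashtags written as `#word`, the GitHub handle written as `@<login> on Github`); a free-text bio will not always follow it, and a `#` inside a URL would also match. Function names are hypothetical.

```python
import re

HASHTAG = re.compile(r"#([\w-]+)")
# Assumed convention from the example bio: "@<login> on Github" (case-insensitive).
GITHUB_HANDLE = re.compile(r"@([A-Za-z0-9-]+) on github", re.IGNORECASE)


def parse_bio(bio: str) -> dict:
    """Extract hashtags and a GitHub login from a free-text Discourse bio."""
    match = GITHUB_HANDLE.search(bio)
    return {
        "github_username": match.group(1) if match else None,
        "tags": sorted(set(HASHTAG.findall(bio))),  # deduplicated, stable order
    }


def build_profile(username: str, bio: str, likes: int, merged_prs: int) -> dict:
    """Assemble a record shaped like the REST payload above."""
    parsed = parse_bio(bio)
    return {
        "username": username,
        "forums": {"likes_received": likes},
        "github": {
            "username": parsed["github_username"],
            "pr": {"merged_count": merged_prs},
        },
        "tags": parsed["tags"],
    }
```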

Someone (other than me) will then be able to create a nice frontend where we can list and sort community members, search them by tag, etc.

Thoughts?

antoviaque commented 3 years ago

@regisb That sounds great! :)

One comment is that it might be useful to tie the data to a specific time period, so that we can show the number of PRs, likes, etc. over a given year or month. This would allow newcomers to reach a better position faster, and encourage old-timers to keep contributing :)
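One way to support per-period views is to keep the date of each event (merged PR, like received) instead of a single running total, and aggregate on demand. A minimal Python sketch, with hypothetical function names and plain `datetime.date` values standing in for whatever the APIs return:

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Tuple


def counts_by_month(events: Iterable[date]) -> Counter:
    """Count dated events (e.g. merged PRs or likes received) per (year, month)."""
    return Counter((d.year, d.month) for d in events)


def monthly_leaderboard(
    events_by_user: Dict[str, List[date]], year: int, month: int
) -> List[Tuple[str, int]]:
    """Rank users by their event count within a single month (ties broken by name)."""
    counts = {
        user: sum(1 for d in dates if (d.year, d.month) == (year, month))
        for user, dates in events_by_user.items()
    }
    active = [(user, n) for user, n in counts.items() if n > 0]
    return sorted(active, key=lambda pair: (-pair[1], pair[0]))
```

Ranking within a recent window rather than over all time is what gives newcomers a realistic chance to appear near the top.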

antoviaque commented 3 years ago

FYI, on our side @symbolist might contribute some parts of this work -- though he would likely only become available from May.

@idegtiarov @regisb Still interested to also do a part of this work?

regisb commented 3 years ago

@idegtiarov @regisb Still interested to also do a part of this work?

Actually, I have already written most of the backend code. I just need to implement some caching to make sure that we don't crawl the Discourse API too frequently, while still guaranteeing that we have fresh results at all times.
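A minimal sketch of such a cache, assuming each API response can be keyed by username and reused for a fixed time-to-live (TTL). Note that it bounds staleness to `ttl` seconds rather than guaranteeing fresh results at all times; the TTL is the knob between API load and freshness. The class name and the injectable `clock` (which makes it testable) are illustrative choices, not taken from the actual backend code.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Memoize fetch results per key for `ttl` seconds (simple time-based cache)."""

    def __init__(self, fetch: Callable[[str], Any], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (fetched_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no API call
        value = self._fetch(key)  # missing or expired: hit the API again
        self._entries[key] = (now, value)
        return value
```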

nedbat commented 3 years ago

@regisb Maybe we could develop this in the open so other people can help? :)

regisb commented 3 years ago

@nedbat Yes, but I wanted to get the code into a presentable state first.

idegtiarov commented 3 years ago

We are going to investigate the Stackalytics service as a leaderboard option with one or two of our internal repositories. The work is planned to start in April.

regisb commented 3 years ago

Here's what I have so far: https://github.com/openedx/oxct, hosted at https://oxct.overhang.io/ (just allow a few minutes for the cache to warm up). I encourage everyone to contribute and open pull requests in this repo :hugs:

e0d commented 3 years ago

Adding to this thread: we already have an installation of the Grimoire Labs dashboard that I think can cover a bunch of the goals captured here.

It currently isn't public, but that should be easy enough to change.

The project aims to implement the community metrics proposed by CHAOSS.

I was going to give Regis a Cook's tour on a video call next week. If others are interested in joining, ping me on Slack?

regisb commented 3 years ago

We just came out of a conversation with @e0d, who presented your Grimoire instance. It was really interesting, and I'd like to recap here a few points that are close to my heart:

  1. Having lots of data, from different data sources (Discourse, Github, Slack...) is awesome. The fact that this data is centralized in a single data source (elasticsearch) makes it easy to create custom visualizations.
  2. Kibana is also great: it allows us to generate visualizations on-the-fly and to explore the data.
  3. I understand that some people love leaderboards, and thus that we need them, but we should also have a way to show off our contributions without necessarily comparing to each other. Thus, I would like to have a single page that says "Régis made X commits in the past year which fixed Y different bugs, received Z likes on the forums, etc." For me, both as an individual and an entrepreneur, this page would mean a lot more than a rank in a leaderboard.
  4. Some people make contributions to Open edX that are extremely valuable, yet not captured in any of the currently available data sources. I'm thinking in particular of @sambapete, who spends a lot of energy testing new releases and detecting issues. We must invent a new way of acknowledging these people's contributions: in the form of unique badges or Academy Award-like rewards, for instance.

antoviaque commented 3 years ago

@e0d Thanks for the presentation of Grimoire, that was really useful to see! I only knew it through Cauldron -- I had tried to run it on the edX github orgs some time ago, but it is a bit limited in the type of sources it can import there: https://cauldron.io/project/3820 . The setup you have seems much more powerful in that regard: https://openedx-metrics.herokuapp.com/ (CC @bradenmacdonald @nasthagiri as this might be useful to gather data about the core committer program, which you are looking at for a blog post about the program.)

Btw, would it be ok to post the recording of this meeting publicly here, in case others would like to watch it?

A few ideas/comments that I've found interesting from what you, @regisb @symbolist @idegtiarov @arbrandes mentioned, or reactions to the points you've made:

Some people make contributions to Open edX that are extremely valuable, yet not captured in any of the currently available data sources. I'm thinking in particular of @sambapete, who spends a lot of energy testing new releases and detecting issues. We must invent a new way of acknowledging these people's contributions: in the form of unique badges or Academy Award-like rewards, for instance.

+1 -- these might be things that we could surface through tickets from bug reports, reports/likes on forums, or maybe a role within the release working group? Badges are a good way too, yes; maybe a stepped-up version of them could be a way to show the titles and responsibilities that any given person takes on in the project?

e0d commented 3 years ago

I spent some time over the weekend deploying an upgraded instance of Grimoire Labs. It is currently consuming all of the data and I'll share a link once it's done.

[
  {
    "conditions": [
      { "field": "origin", "value": "https://github.com/edx/frontend-component-cookie-policy-banner" }
    ],
    "set_extra_fields": [
      { "field": "my_namespace_foo", "value": "foo" },
      { "field": "my_namespace_bar", "value": "bar" }
    ]
  }
]
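The rule above appears to say: for items whose `origin` is that repository, add the fields `my_namespace_foo` and `my_namespace_bar`. The Python sketch below illustrates those semantics on a plain dict, under the assumption that all conditions in a rule must match (logical AND) and that `set_extra_fields` simply copies values onto the item. It is not GrimoireLab's implementation; `apply_extra_fields` is a hypothetical name.

```python
from typing import Any, Dict, List

Rule = Dict[str, List[Dict[str, Any]]]


def apply_extra_fields(item: Dict[str, Any], rules: List[Rule]) -> Dict[str, Any]:
    """Return a copy of `item` with extra fields set by every matching rule.

    Assumed semantics: a rule matches when all of its conditions hold
    (item[field] == value); its set_extra_fields are then copied onto the item.
    """
    enriched = dict(item)
    for rule in rules:
        if all(item.get(c["field"]) == c["value"] for c in rule["conditions"]):
            for extra in rule["set_extra_fields"]:
                enriched[extra["field"]] = extra["value"]
    return enriched
```

Tagging by `origin` like this would let dashboards group repositories (for example, by team or product area) without changing the raw data.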

regisb commented 3 years ago

I'm going to speak with someone from Bitergia later today, but my current thinking is that extending Grimoire could work well. For example, potentially creating a Transifex backend.

This is a great idea!

e0d commented 3 years ago

Here are two examples of dashboards that are hosted by Bitergia for FINOS and Gitlab.

FINOS Gitlab

symbolist commented 3 years ago

I have been taking a deeper look at the CHAOSS project this week. To help others who would like to quickly understand what it is about so that they can participate in this discussion, I compiled some highlights from my investigation here: https://openedx.atlassian.net/wiki/spaces/COMM/pages/2696446382/CHAOSS

Imho this advocates for the idea of not spending too much time trying to define and agree on a precise and definitive set of metrics upfront. We still want to define it, but I agree with @e0d that it would be reasonable to simply start with the CHAOSS metrics, which have the merit of being already defined and implemented -- then we can see what we get from that, and iterate by creating additional views?

I agree with this approach as well. It gives us a concrete starting point that has already been thought about deeply by many experts in the area and has been in use by other communities. We may want to additionally slice and dice the data for specific goals but the framework supports that as well (and so it does not constrain us). Also for the sake of thoroughness, I did try to see if there were any competing standards or options but this seems to be the only comprehensive one.

From having played a bit with https://openedx-metrics.herokuapp.com/, it looks like an important preliminary step will be to improve the accuracy of the dataset. For example, the assignment of people to organizations currently seems a bit haphazard: on the list of all pull requests with the tag "open source contribution", most of the pull requests have an "Unknown" organization, and @pomegranited is listed as being from the Adelaide university.

SortingHat is the part of the suite which is responsible for managing identities. From looking at its documentation it looks like it should support what we want and we just need to look into configuring that (looks like @e0d has already installed the user interface "hatstall" for that):

"Sorting Hat maintains an SQL database of unique identities of communities members across (potentially) many different sources. Identities corresponding to the same real person can be merged in the same unique identity with a unique uuid. For each unique identity, a profile can be defined, with the name and other data shown for the corresponding person by default.

In addition, each unique identity can be related to one or more affiliations, for different time periods. This will usually correspond to different organizations in which the person was employed during those time periods."
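To make the quoted model concrete, here is a toy Python version of a unique identity that groups several source identities and holds time-bounded affiliations. The class and field names are hypothetical and do not mirror SortingHat's actual schema or API; they only illustrate merging two profiles of the same person and resolving the organization for a given date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Enrollment:
    organization: str
    start: date
    end: date  # inclusive


@dataclass
class UniqueIdentity:
    uuid: str
    identities: List[str] = field(default_factory=list)  # e.g. "github:<login>"
    enrollments: List[Enrollment] = field(default_factory=list)

    def merge(self, other: "UniqueIdentity") -> None:
        """Fold another profile belonging to the same person into this one."""
        self.identities.extend(other.identities)
        self.enrollments.extend(other.enrollments)

    def organization_on(self, day: date) -> Optional[str]:
        """Return the affiliation valid on `day`, or None if unaffiliated."""
        for e in self.enrollments:
            if e.start <= day <= e.end:
                return e.organization
        return None
```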

https://www.researchgate.net/publication/331088184_SortingHat_Wizardry_on_Software_Project_Members has some more details.

@e0d

The people data is a key place where we need some investment. I don't think it's a ton of work, but the way we are currently mapping people to organizations is pretty brittle and manual.

Let me know if I can help with that. 🙂

To also start the conversation about the overall plan, if everyone is in agreement about this as a starting point, the next steps could be:

  1. Make sure that the GrimoireLab instance is fully configured and ingesting data from all the sources it supports (happy to help with this).
  2. Give everyone a chance to play around with it.
  3. Gather recommendations about what initial set of metrics we should focus on for the CC program.
  4. Set up dashboards for them.

e0d commented 3 years ago

I've made progress getting Grimoire upgraded and configured against the core data sources. An outstanding item is to configure authentication, which I can look at over the weekend. Without it, the issue is not simply that the data would be available to everyone, but that anyone would be able to alter dashboards.

For CCs, I can send you a preview if you Slack me directly.

regisb commented 3 years ago

An outstanding item is to configure authentication, which I can look at over the weekend.

Is that even possible? I thought that authentication was only available in the commercial edition of Kibana?

e0d commented 3 years ago

Requiring login with a shared credential is possible; that's where we are right now. This is compatible with allowing read-only access to the views. It probably needs a small configuration change to work, but should be straightforward.

PM me if you want the credentials to view the data.

antoviaque commented 3 years ago

@e0d Assuming we move forward with the instance of Grimoire that you have setup, what would be a good next step? Is it still with cleaning up the data & org associations? And would that be something that only you or someone at edX can do, or would the rest of the community be able to help here?

e0d commented 3 years ago

Happy to distribute via PM in Slack; I don't want to post publicly yet, though being publicly viewable is the eventual goal. I'll send it to your Slack handle.

e0d commented 3 years ago

Also, there was a recent release of Grimoire Labs, so I would like to find some time to upgrade: https://github.com/chaoss/grimoirelab/blob/master/releases/NEWS

Also, one of the CHAOSS folks did a presentation on leaderboards recently at Tidelift's Upstream. I've been in touch with Georg and I think there's a chance to collaborate on something related to leaderboards. His talk is here:

https://explore.tidelift.com/upstream/main/session-georg-link

antoviaque commented 3 years ago

@e0d Thanks! I could access it with the credentials you have sent. I'll see if one of the core committers from OpenCraft has time to look into this.

To double check, the next action would still be to improve & clean-up the data?

e0d commented 3 years ago

Based on the contributors' call, I had the sense that we are not yet aligned on whether a badging program, a leaderboard, or both are the best plan. What's the best way to align on a plan?

I suspect that cleaning up the data will be an iterative, hopefully not continuous, process. Maybe we should build a POC and clean the data that we identify as most problematic during that process?

I have:

I haven't:

pomegranited commented 3 years ago

@e0d If we can make the leaderboard incorporate badges, then I'd like to go with the leaderboard, so that things can be counted/filtered/grouped automatically for us.

I like that we can issue badges manually to people who contribute more than just PRs, and so I wouldn't want to focus solely on github as a contribution source. But it does mean that we need to be conscientious about issuing badges -- maybe make that part of the job of the various working groups to nominate helpful people and regularly reward them?

I suspect that cleaning up the data will be an iterative, hopefully not continuous, process. Maybe we should build a POC and clean the data that we identify as most problematic during that process?

πŸ‘ to this.

How can we help clean up the data?

antoviaque commented 3 years ago

@e0d We can definitely discuss more -- there wasn't a specific definitive solution that was agreed to I think.

The main point from the last meeting (which I only watched a recording of) that seemed to have reached consensus was that regardless of the way we want to present the data in the end, we need to collect it first in any case, and that the collected data should be open. Also, CHAOSS and Grimoire seemed a good starting point for a first iteration, since others have already done the job of figuring out lists of elements to measure, and built the software to do it. From that, it would then be iterative in any case, based on what we think is useful. Does that match your/others memory?

arbrandes commented 3 years ago

@antoviaque,

From that, it would then be iterative in any case, based on what we think is useful. Does that match your/others memory?

That sums up what I remember, yes.

pomegranited commented 3 years ago

@e0d @arbrandes I've added some suggestions and questions to your CHAOSS Cleanup spreadsheet, and would like to create a task to address some of these issues during our next sprint (30 June - 13 July). At a glance, I think "merging organizations" will be the easiest to do first since it's manual. But the others will require some (nice) contributions to sortinghat, like "sourcing organization for non-affiliated individuals from github".

What do we need to get started on this? I could start by creating a github Project for this work, and start adding issues so we can discuss requirements with everybody.

e0d commented 3 years ago

GitHub project sounds great.

I do think it is OK to have folks in a pseudo organization, say, "individual". But we want to classify whomever we can when they are affiliated. There will be folks who are legitimately individuals.

Do you have thoughts on which interventions will have the biggest quality impacts? I think focusing on CCs and key firms will touch the majority of contributions, for example.

pomegranited commented 3 years ago

@e0d question -- what are the source github projects included in this initial Grimoire deployment? Can we add non-edx repos like Tutor and the community-supported XBlocks?

e0d commented 3 years ago

One more thought: merging orgs is a good example of the type of change that needs to be sticky. If we merge edX and edX inc. only for edX inc. to be recreated during the next identity analysis, that's an issue. I'm not yet sure where the two versions originated from. Do we need an aliases concept for orgs?
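One possible shape for the aliases idea, sketched in Python under the assumption that the alias table lives outside the identity database and is re-applied after each identity analysis, so that a merge like "edX inc." into "edX" survives re-imports. It also maps missing affiliations to the "individual" pseudo organization suggested above. `ORG_ALIASES` and `canonical_org` are hypothetical names.

```python
from typing import Dict, Optional

# Hypothetical alias table: lowercase variant -> canonical organization name.
# Kept outside the identity database so it can be re-applied after every run.
ORG_ALIASES: Dict[str, str] = {
    "edx": "edX",
    "edx inc.": "edX",
}

INDIVIDUAL = "individual"  # pseudo organization for unaffiliated contributors


def canonical_org(name: Optional[str], aliases: Dict[str, str] = ORG_ALIASES) -> str:
    """Normalize an organization name through the alias table."""
    if not name or not name.strip():
        return INDIVIDUAL
    cleaned = name.strip()
    return aliases.get(cleaned.lower(), cleaned)
```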

e0d commented 3 years ago

Currently it's every public project in the edX and Open edX GitHub orgs. We can add other repos if that makes sense. I think we need to work out that definition.

pomegranited commented 3 years ago

@e0d

Do you have thoughts on which interventions will have the biggest quality impacts? I think focusing on CCs and key firms will touch the majority of contributions for example

Can we export the number of contributions that are being counted against each non-org individual, so we can sort them and ensure the highest numbers are affiliated somewhere if that's appropriate?
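A minimal sketch of that export, assuming contributions can be flattened into `(author, organization)` pairs and that unaffiliated contributors show up with no organization or with the "Unknown" label seen in the dashboard. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

UNAFFILIATED = (None, "Unknown")  # assumed labels for missing affiliations


def unaffiliated_ranking(
    contributions: Iterable[Tuple[str, Optional[str]]]
) -> List[Tuple[str, int]]:
    """Rank contributors without an organization by number of contributions."""
    counts = Counter(author for author, org in contributions if org in UNAFFILIATED)
    return counts.most_common()  # highest counts first
```

Working down this list from the top would fix the affiliations that distort the metrics the most first.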

But yes, the CC people by definition will have the most contributions, so I've updated the "Recommended Organization" for all the core contributors I could identify.

pomegranited commented 3 years ago

FYI I've created a github project to track these issues and ideas: https://github.com/orgs/edx/projects/6

Can people confirm they can edit those cards? I haven't converted any to proper issues yet, but I think that's what we have to do to allow comments.

pomegranited commented 3 years ago

@e0d I've created https://github.com/edx/open-edx-proposals/issues/226 as the first issue to address, so we can start working on data cleanup without needing access to the edX Grimoire/SortingHat instance.

If anyone has suggestions or something specific they'd like to see out of that task, let me know?

CC @arbrandes @regisb @antoviaque

antoviaque commented 3 years ago

@pomegranited Thank you! :+1:

https://github.com/orgs/edx/projects/6 Can people confirm they can edit those cards?

I confirm that I can edit them yes.

arbrandes commented 3 years ago

I confirm that I can edit them yes.

Same here.

sarina commented 4 months ago

Hi everyone, this issue hasn't been touched since June 2021. Was there any enthusiasm/capacity to pick up on this idea, or should we close the issue?

If we want to keep it I propose moving the issue to https://github.com/openedx/wg-coordination/issues since this issue doesn't pertain to an OEP.

sarina commented 4 months ago

Closing per https://github.com/openedx/open-edx-proposals/issues/227#issuecomment-2118276888