ioos / ioos_metrics

Working on creating metrics for the IOOS by the numbers
https://ioos.github.io/ioos_metrics/
MIT License
2 stars, 4 forks

quantifying impact of github organization #59

Open MathewBiddle opened 7 months ago

MathewBiddle commented 7 months ago

Can we develop a metric to quantify the impact of the IOOS GitHub organization?

Related to #26, but expanding further into our non-packaged repositories (e.g. documentation).

For example: number of forks, stars, active contributors, etc.

ocefpaf commented 7 months ago

Maybe something like this?

import os

import pandas as pd
from github import Github

# Read a personal access token from ~/.ghoauth, if the file exists.
# Unauthenticated requests are possible but hit GitHub's rate limit quickly.
try:
    with open(os.path.expanduser("~/.ghoauth")) as f:
        access_token = f.read().strip()
except FileNotFoundError:
    access_token = None

g = Github(access_token)

user = g.get_user("ioos")

ioos_gh = {}
for repo in user.get_repos():
    print(repo.name)
    # Skip forks so that only repositories originating in the organization are counted.
    if not repo.fork:
        # Map each contributor to their number of contributions.
        # Note: `name` is None for profiles without a display name;
        # `contributor.login` is always set and unique if that matters.
        contributors_contribution = {
            contributor.name: contributor.contributions
            for contributor in repo.get_contributors()
        }

        ioos_gh[repo.name] = {
            "stars": repo.stargazers_count,
            "forks": repo.forks_count,
            "contributors": contributors_contribution,
        }

# One row per repository, most-starred first.
df = pd.DataFrame(ioos_gh).T.sort_values(by="stars", ascending=False)

You will need a GitHub token to run it, but it does not need elevated permissions; read-only access is enough. Here is what I got from the code above:

df.head(n=20)
                             stars forks                                       contributors
compliance-checker              96    51  [{'Benjamin Adams': 713}, {'Luke Campbell': 33...
erddapy                         75    29  [{'Filipe': 740}, {'Vini Salazar': 83}, {'Call...
bio_data_guide                  43    18  [{'Mathew Biddle': 222}, {'Tylar': 81}, {'Bret...
ioos_qc                         39    22  [{'Kyle Wilcox': 199}, {'Filipe': 71}, {'Luke ...
pyoos                           34    33  [{'Filipe': 52}, {'Dave Foster': 24}, {'Emilio...
conda-recipes                   20    29  [{'Filipe': 1186}, {'Rich Signell': 289}, {'IO...
notebooks_demos                 19    19  [{'Filipe': 774}, {'Jennifer Bosch Webster': 7...
gsoc                            16     9  [{'Mathew Biddle': 28}, {'Micah Wengren': 26},...
thredds_crawler                 16    22  [{'Kyle Wilcox': 63}, {'Luke Campbell': 15}, {...
Cloud-Sandbox                    9    11  [{'Patrick Tripp': 113}, {'Jonathan Joyce': 9}...
ioos-python-package-skeleton     9     9  [{'Filipe': 113}, {None: 3}, {'Alex Kerney': 2...
BioData-Training-Workshop        8     8  [{'Don Setiawan': 41}, {'Ben Best': 17}, {'Fil...
ioos_code_lab                    8     7  [{'Filipe': 1140}, {'Mathew Biddle': 96}, {'Je...
ioosngdac                        8    18  [{'John Kerfoot': 80}, {'Luke Campbell': 20}, ...
erddap-gold-standard             8    15  [{'Mathew Biddle': 16}, {'Kyle Wilcox': 6}, {'...
system-test                      7    14  [{'Bob Fratantonio': 69}, {'Filipe': 68}, {'Ri...
ckanext-ioos-theme               7    14  [{'Benjamin Adams': 202}, {'Luke Campbell': 10...
soundcoop                        6     2  [{'Clea Parcerisas': 15}, {None: 6}, {'Carlos ...
glider-dac                       6    12  [{'Benjamin Adams': 295}, {'Luke Campbell': 20...
service-monitor                  6    13  [{'Luke Campbell': 304}, {'Benjamin Adams': 16...

MathewBiddle commented 6 months ago

I like what you've done here @ocefpaf! Maybe quantifying the number of contributors too. But, that should be easy with the list you developed.

FYI, I just ran across this https://opensource.guide/metrics/

MathewBiddle commented 6 months ago

This is interesting too https://chaoss.community/software/

MathewBiddle commented 6 months ago

we can get a lot of stuff from github's advanced search:

https://github.com/search?q=org%3Aioos&type=repositories&ref=advsearch

ocefpaf commented 6 months ago

> I like what you've done here @ocefpaf! Maybe quantifying the number of contributors too. But, that should be easy with the list you developed.

Yes, we can do something like:

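# Build one Series per repository: contributor name -> number of contributions.
contributors = []
for repo, row in df.iterrows():
    s = pd.Series(row["contributors"])
    s.name = repo
    contributors.append(s)

# Rows are contributors, columns are repositories; NaN means no contributions.
contributors_per_repo = pd.concat(contributors, axis=1)

# Order contributors by their total contributions across the organization.
index = contributors_per_repo.sum(axis=1).sort_values(ascending=False).index
contributors_per_repo = contributors_per_repo.reindex(index)

# Total contributions per contributor.
contributors_per_repo.sum(axis=1)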
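
The snippet above sums contributions; the number of contributors itself can be read off the same data. Below is a minimal sketch, assuming the nested dictionary has the same shape as `ioos_gh` from the first script. The repository names, contributor names, and numbers are made up, and the helpers `contributor_counts` and `unique_contributors` are hypothetical names, not part of any library.

```python
# Toy stand-in for the `ioos_gh` dictionary; all names and numbers are invented.
ioos_gh = {
    "repo_a": {"stars": 10, "forks": 3, "contributors": {"alice": 40, "bob": 2}},
    "repo_b": {"stars": 4, "forks": 1, "contributors": {"bob": 7}},
    "repo_c": {"stars": 0, "forks": 0, "contributors": {}},
}


def contributor_counts(repos):
    """Return {repository name: number of distinct contributors}."""
    return {name: len(info["contributors"]) for name, info in repos.items()}


def unique_contributors(repos):
    """Return the set of contributors seen in any repository."""
    people = set()
    for info in repos.values():
        # Iterating a dict yields its keys, i.e. the contributor names.
        people.update(info["contributors"])
    return people


per_repo = contributor_counts(ioos_gh)
org_wide = unique_contributors(ioos_gh)
print(per_repo)
print(len(org_wide))
```

Counting organization-wide through a set avoids double counting people who contribute to several repositories. It only works if the keys are unique per person, which is one reason to prefer logins over display names.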

> FYI, I just ran across this https://opensource.guide/metrics/
> This is interesting too https://chaoss.community/software/

Those are really nice resources! I knew about CHAOSS but not opensource.guide.

> we can get a lot of stuff from github's advanced search

If you are just browsing, yes. But we can get all that info programmatically with PyGithub and create tables, etc. The repo object in the main loop has all the info and, if you are using a token with elevated permissions, you can even do fancy things like write/create, but we don't need that for the metrics.
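
As a rough illustration of pulling that information into a table, here is a sketch that reads a handful of attributes from each repository object. The attribute names (`name`, `stargazers_count`, `forks_count`, `open_issues_count`, `archived`, `fork`) are fields of PyGithub's `Repository`; the helper `repo_table`, the `FIELDS` tuple, and the stand-in objects are assumptions made so the example runs offline.

```python
from types import SimpleNamespace

# Attributes copied from each repository object into the table.
FIELDS = ("name", "stargazers_count", "forks_count", "open_issues_count", "archived")


def repo_table(repos, skip_forks=True):
    """Return one dict per repository, sorted by stars (descending)."""
    rows = [
        {field: getattr(repo, field) for field in FIELDS}
        for repo in repos
        if not (skip_forks and repo.fork)
    ]
    return sorted(rows, key=lambda row: row["stargazers_count"], reverse=True)


# Invented stand-ins for repository objects. With PyGithub one would pass
# something like `Github(token).get_user("ioos").get_repos()` instead.
fake_repos = [
    SimpleNamespace(name="docs", stargazers_count=3, forks_count=1,
                    open_issues_count=2, archived=False, fork=False),
    SimpleNamespace(name="tool", stargazers_count=9, forks_count=4,
                    open_issues_count=5, archived=False, fork=False),
    SimpleNamespace(name="mirror", stargazers_count=50, forks_count=0,
                    open_issues_count=0, archived=False, fork=True),
]

table = repo_table(fake_repos)
for row in table:
    print(row)
```

The resulting list of dicts can be handed to `pd.DataFrame(table)` if a DataFrame is preferred.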

MathewBiddle commented 5 months ago

It could also be worthwhile to look at the number of participants in issues: https://gist.github.com/ocefpaf/2ed11e4c977adfe3ffeb5eef9f576c1e

While they might not be directly contributing to a project, they are participating in the conversation.

ocefpaf commented 5 months ago

> While they might not be directly contributing to a project, they are participating in the conversation.

That indeed made a few repos pop up, like ioos-atn-data and bio_data_guide. See the last two cells in https://gist.github.com/ocefpaf/11a7c4832b23dc3978a1a3fb20783988
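
For readers who do not want to open the gists, the idea of counting issue participants can be sketched as follows. This is not the code from the gists, only an illustration under assumptions: it relies on PyGithub's `issue.user.login`, `issue.get_comments()`, and `comment.user.login`, and the helper `issue_participants` and the stand-in objects are hypothetical. With a real repository the input would be something like `repo.get_issues(state="all")`, which, as far as I know, also returns pull requests.

```python
from types import SimpleNamespace


def issue_participants(issues):
    """Return the set of logins that opened or commented on any of the issues."""
    logins = set()
    for issue in issues:
        # Guard against a missing user, e.g. a deleted account.
        if issue.user is not None:
            logins.add(issue.user.login)
        for comment in issue.get_comments():
            if comment.user is not None:
                logins.add(comment.user.login)
    return logins


def fake_issue(author, commenters):
    """Build an invented issue-like object so the sketch runs offline."""
    comments = [SimpleNamespace(user=SimpleNamespace(login=c)) for c in commenters]
    return SimpleNamespace(
        user=SimpleNamespace(login=author),
        get_comments=lambda: comments,
    )


issues = [fake_issue("alice", ["bob", "carol"]), fake_issue("bob", [])]
participants = issue_participants(issues)

# People who joined the conversation without committing code.
committers = {"alice"}
discussion_only = participants - committers
print(sorted(participants))
print(sorted(discussion_only))
```

Subtracting the set of committers from the participants highlights the people who take part in the conversation without contributing code directly.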