todogroup / gh-issues

A curated set of issues related to GitHub and running corporate scale open source
http://todogroup.org

Improvements to License data in the API #87

Status: Open. geekygirldawn opened this issue 1 month ago

geekygirldawn commented 1 month ago

Ideally, I would love to be able to easily get data out of the GitHub API that shows when a repository has changed its license, especially when it has changed from an open source license to a non-open source or more restrictive one.

As @gyehuda mentioned:

It would be wonderful for GitHub to document when a repo that was licensed as <X> for at least a week gets relicensed under <Y>, where either is OSI-approved.

Details in this discussion: https://github.com/todogroup/ospology/discussions/480
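One reading of the rule @gyehuda describes can be sketched in Python, assuming you already have dated snapshots of a repo's detected license (for example, its SPDX ID collected periodically). The `LicenseObservation` type, the `relicensing_events` helper, and the tiny `OSI_APPROVED` set are hypothetical illustrations, not a GitHub API; a real check would use the full OSI list. "Licensed for at least a week" is approximated as the gap between the first sighting of the old license and the first sighting of the new one, and "either is OSI-approved" is read as "at least one side is".

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical, deliberately tiny subset of OSI-approved SPDX IDs;
# a real check would consult the full OSI list.
OSI_APPROVED = {"MIT", "Apache-2.0", "BSD-3-Clause", "GPL-3.0-only", "MPL-2.0"}


@dataclass(frozen=True)
class LicenseObservation:
    observed_on: date  # when this license was seen on the repo
    spdx_id: str       # detected license, e.g. "MIT"


def relicensing_events(observations, min_days=7):
    """Return (old_id, new_id, changed_on) for each license change where the
    old license was in place for at least `min_days` and at least one of the
    two licenses is in OSI_APPROVED."""
    events = []
    obs = sorted(observations, key=lambda o: o.observed_on)
    if not obs:
        return events
    current = obs[0]  # first sighting of the license currently in effect
    for o in obs[1:]:
        if o.spdx_id == current.spdx_id:
            continue  # same license, keep the earliest sighting
        held_for = o.observed_on - current.observed_on
        if held_for >= timedelta(days=min_days) and (
            current.spdx_id in OSI_APPROVED or o.spdx_id in OSI_APPROVED
        ):
            events.append((current.spdx_id, o.spdx_id, o.observed_on))
        current = o  # the new license is now the one in effect
    return events
```

For example, snapshots showing `Apache-2.0` from January and `BUSL-1.1` from August would yield one event, while a license swapped after only two days would be ignored.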

Or maybe it would be cool for GitHub to surface this another way, maybe something like https://innovationgraph.github.com/? I know Innovation Graph itself focuses on metrics broken out across various economic areas, so it's not exactly the same, but maybe there are other things that companies care about (e.g., licenses, dependents info, supply chain security metrics) that could be grouped together in a way that lets people explore / analyze that data more easily?


On a related note, the way the GraphQL API handles data from the licenseInfo object seems counterintuitive to me, partly because it returns very different things from what the API returns for other auto-discovered file objects, like codeOfConduct.

Here's an example query:

query license{
  repository(owner: "chaoss", name: "wg-metrics-development"){
    licenseInfo{
      name
      url
    }
    codeOfConduct{
      url
      name
      resourcePath
    }
  }
}

And the output of the query:

{
  "data": {
    "repository": {
      "licenseInfo": {
        "name": "MIT License",
        "url": "http://choosealicense.com/licenses/mit/"
      },
      "codeOfConduct": {
        "url": "https://github.com/chaoss/.github/blob/main/CODE_OF_CONDUCT.md",
        "name": "Other",
        "resourcePath": "/chaoss/.github/blob/main/CODE_OF_CONDUCT.md"
      }
    }
  }
}

From the licenseInfo object, I can't seem to get the actual name, url, or path of the file where the license is stored in the repository. This differs from the codeOfConduct object, whose url and resourcePath fields let me programmatically determine where to find the file within the repository.
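As a possible workaround outside GraphQL, GitHub's REST API has a "Get the license for a repository" endpoint (`GET /repos/{owner}/{repo}/license`) whose response includes the license file's `path` and `html_url` alongside the detected `license` object. A minimal Python sketch using only the standard library; `fetch_repo_license` and `license_file_location` are hypothetical helper names, and unauthenticated calls are subject to lower rate limits.

```python
import json
import urllib.request


def fetch_repo_license(owner, repo, token=None):
    """Call GET /repos/{owner}/{repo}/license and return the decoded JSON."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/license",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:  # optional personal access token for higher rate limits
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def license_file_location(payload):
    """Pick out the file location fields that GraphQL licenseInfo lacks."""
    return {
        "path": payload["path"],          # path of the license file in the repo
        "html_url": payload["html_url"],  # github.com link to the file
        "spdx_id": payload["license"]["spdx_id"],
    }
```

Usage would look like `license_file_location(fetch_repo_license("chaoss", "wg-metrics-development"))`, and the returned `path` could then feed the commit history query.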

If I could derive the location / name of the license file in the repo via licenseInfo (or some other method), I could use it as the input to another query that gets details about the commits for the file. In the example below, I hardcoded the name of the license file after manually looking it up in the repo, but ideally I could get it from the GitHub API and pass it in as a variable to a query that returns the commit details.

query licenseCommits{
  repository(owner: "chaoss", name: "wg-metrics-development"){
    defaultBranchRef {
      name
      target {
        ... on Commit {
          history(path: "LICENSE", first: 100) {
            nodes {
              committedDate
              url
              additions
              deletions
            }
          }
        }
      }
    }
  }
}

And the output of the query:

{
  "data": {
    "repository": {
      "defaultBranchRef": {
        "name": "main",
        "target": {
          "history": {
            "nodes": [
              {
                "committedDate": "2022-05-09T17:45:16Z",
                "url": "https://github.com/chaoss/wg-metrics-development/commit/6c4dbe9822f430ed3c809a49adcc32d619e34a31",
                "additions": 1,
                "deletions": 1
              },
              {
                "committedDate": "2021-03-29T19:28:13Z",
                "url": "https://github.com/chaoss/wg-metrics-development/commit/f872d9caf03f55bf0e884fe1d9bbfa589b54c0b5",
                "additions": 1,
                "deletions": 1
              },
              {
                "committedDate": "2019-04-18T15:31:44Z",
                "url": "https://github.com/chaoss/wg-metrics-development/commit/d5159c33c600b041ca8530ae96c14d6b87247787",
                "additions": 21,
                "deletions": 0
              }
            ]
          }
        }
      }
    }
  }
}
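To pass the file name in as a variable rather than hardcoding it, the licenseCommits query can declare `$owner`, `$name`, and `$path` and receive them through the `variables` field of the request body. The Python sketch below only builds the JSON body for `https://api.github.com/graphql` (sending it still requires an authenticated POST); `build_request_body` is a hypothetical helper name.

```python
import json

# The licenseCommits query, with owner, repo name, and file path as variables.
LICENSE_COMMITS_QUERY = """
query licenseCommits($owner: String!, $name: String!, $path: String!) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      name
      target {
        ... on Commit {
          history(path: $path, first: 100) {
            nodes { committedDate url additions deletions }
          }
        }
      }
    }
  }
}
"""


def build_request_body(owner, name, path):
    """Return the JSON string to POST to the GraphQL endpoint."""
    return json.dumps({
        "query": LICENSE_COMMITS_QUERY,
        "variables": {"owner": owner, "name": name, "path": path},
    })
```

The `path` argument is the piece that would come from the API instead of a manual lookup.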

Maybe there is another way to do this that I just haven't found?
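One other possible route within GraphQL, assuming the license file sits in the repository root: list the root tree with `object(expression: "HEAD:")` and filter the entry names for likely license files. The file-name regex below is a heuristic assumption and will not exactly match how GitHub itself detects licenses; it also ignores files outside the root. `candidate_license_files` is a hypothetical helper.

```python
import re

# Lists the name and type of every entry in the root tree of the default branch.
ROOT_TREE_QUERY = """
query rootFiles($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    object(expression: "HEAD:") {
      ... on Tree {
        entries { name type }
      }
    }
  }
}
"""

# Heuristic pattern: LICENSE, LICENCE, COPYING, UNLICENSE, optionally followed
# by an extension or suffix such as ".md" or "-APACHE". An assumption only.
LICENSE_NAME = re.compile(r"^(licen[cs]e|copying|unlicense)([.-].*)?$", re.IGNORECASE)


def candidate_license_files(entries):
    """Return names of root-level files (blobs) that look like license files."""
    return [
        e["name"]
        for e in entries
        if e["type"] == "blob" and LICENSE_NAME.match(e["name"])
    ]
```

`entries` is the list found at `data.repository.object.entries` in the response; each match is a candidate `path` for the history query.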

cc: @ahpook

ahpook commented 1 month ago

Hey @geekygirldawn, thanks for filing this! You're right that the licenseInfo is inconsistent with the other "community standards" type of docs. We did a little digging internally and there is already a (private) API method that we could use to expose the filename, exactly like the resourcePath field does on codeOfConduct that you noted. Would adding that to the API be sufficient to get you going on this?

FWIW I don't expect the special case of tracking license content changes over time as a first-class API endpoint to happen; it seems like quite a niche that would have a high engineering cost. We don't in general do time-series/historical changes due to storage constraints, and, as you're proposing, with the file info it could be derived from the commit history.
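Deriving license changes from the commit history could start from the `history` nodes in the licenseCommits output: each commit URL ends in the commit SHA, and `object(expression: "<sha>:<path>")` returns the file as it was at that commit. The sketch below only builds one aliased GraphQL query that fetches every version's `text` (via `... on Blob`); the helper names are hypothetical, it assumes the path contains no double quotes, and deciding which license each version represents is left out.

```python
def blob_expressions(history_nodes, path):
    """Turn each history node into a '<sha>:<path>' object expression."""
    exprs = []
    for node in history_nodes:
        # The commit SHA is the last segment of the commit URL.
        sha = node["url"].rstrip("/").rsplit("/", 1)[-1]
        exprs.append(f"{sha}:{path}")
    return exprs


def build_blob_query(exprs):
    """Build one query that fetches the file text at every listed commit,
    using aliases v0, v1, ... so the object fields don't collide."""
    fields = "\n".join(
        f'    v{i}: object(expression: "{e}") {{ ... on Blob {{ text }} }}'
        for i, e in enumerate(exprs)
    )
    return (
        "query licenseVersions($owner: String!, $name: String!) {\n"
        "  repository(owner: $owner, name: $name) {\n"
        f"{fields}\n"
        "  }\n"
        "}"
    )
```

Comparing consecutive `text` values (for example with a license classifier) would then show when the license actually changed, as opposed to edits such as a copyright year update.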

ahpook commented 1 month ago

Oh, and regarding the url field: I too find it a bit strange that it returns a link to choosealicense.com rather than the github.com URL of the file, but changing that would be considered a breaking API change 😢
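Since the url field can't change, a github.com link to the file could be built client-side once its path is known, following the same `https://github.com/<owner>/<repo>/blob/<branch>/<path>` shape as the codeOfConduct url in the issue. The branch name could come from `defaultBranchRef { name }`; `github_blob_url` is a hypothetical helper.

```python
from urllib.parse import quote


def github_blob_url(owner, repo, branch, path):
    """Build a github.com file URL; quote() percent-encodes spaces and other
    unsafe characters but leaves '/' intact for nested paths."""
    return f"https://github.com/{owner}/{repo}/blob/{quote(branch)}/{quote(path)}"
```

For example, `github_blob_url("chaoss", "wg-metrics-development", "main", "LICENSE")` gives `https://github.com/chaoss/wg-metrics-development/blob/main/LICENSE`.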

geekygirldawn commented 1 month ago

> We did a little digging internally and there is already a (private) API method that we could use to expose the filename, exactly like the resourcePath field does on codeOfConduct that you noted. Would adding that to the API be sufficient to get you going on this?

That would be super helpful, thank you!

> FWIW I don't expect the special case of tracking license content changes over time as a first-class API endpoint to happen; it seems like quite a niche that would have a high engineering cost. We don't in general do time-series/historical changes due to storage constraints, and, as you're proposing, with the file info it could be derived from the commit history.

I didn't think so, but I thought it wouldn't hurt to ask :)