todogroup / gh-issues

A curated set of issues related to GitHub and running corporate scale open source
http://todogroup.org

Support multiple licenses on a repo #72

Closed ashleywolf closed 2 years ago

ashleywolf commented 2 years ago

GitHub UI to support multiple licenses on a repo. Please share your use cases or feedback here, especially any pain points you currently experience.

caniszczyk commented 2 years ago

I would love to be able to specify a "Documentation license" and a "Code license", as these are sometimes different things. This is common in LF/CNCF projects, where we default to Apache 2.0 for code and CC BY 4.0 for docs.

We sometimes have repos where there is a "LICENSE" file for code and a "LICENSE-DOCS" file for documentation. It would be nice if GitHub somehow supported this natively via the UI/API.

-- Cheers,

Chris Aniszczyk https://aniszczyk.org

hyandell commented 2 years ago

Apologies if some of this is off the mark for how it works today.

Today my assumption is that licensee spots a file in the default branch called LICENSE, COPYING, or something very similar, and attempts to identify it. GitHub then states this as the project license across the project-repo site.
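
To make the assumption above concrete, here is a minimal sketch of that kind of filename matching. It is not licensee's actual logic; the pattern and the `find_license_files` name are hypothetical and only illustrate how a file such as LICENSE-DOCS could be picked up alongside LICENSE.

```python
import re

# Hypothetical pattern: LICENSE, LICENCE, or COPYING, with an optional
# suffix such as -DOCS and an optional extension such as .md or .txt.
LICENSE_NAME = re.compile(
    r"^(licen[sc]e|copying)([-_.][\w.-]+)?$", re.IGNORECASE
)


def find_license_files(filenames):
    """Return the names that look like top-level license files."""
    return [name for name in filenames if LICENSE_NAME.match(name)]
```

Given the top-level file names of a repo, this returns every candidate rather than stopping at the first match, which is the behaviour a multi-license UI would need.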

Pains I've seen:

+1 for Chris' point above. At AWS we often have two licenses on our documentation repos.

Feature requests:

tsteenbe commented 2 years ago

+1 on Chris and Henri's comments above. I wish GitHub would support SPDX license expressions, and possibly an SPDX file in the repo or an API for uploading SPDX files for Marketplace tools. This would allow maintainers to use a tool like ORT to generate a software bill of materials that specifies exactly which licenses apply. Based on the SPDX file, GitHub could then create a "one-liner SPDX expression for code only" (as most devs care only about the code) and, upon a click, show an "in-depth licensing" UI with the licensing for build tools, docs, examples and dependencies.
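
As a rough illustration of what a "one-liner SPDX expression" carries, the sketch below pulls the license identifiers out of an expression such as `(Apache-2.0 OR MIT) AND CC-BY-4.0`. It is a simplified tokenizer, not a full SPDX parser: it ignores `LicenseRef-`/`DocumentRef-` forms and does not validate identifiers against the SPDX license list. The `license_ids` name is hypothetical.

```python
import re

OPERATORS = {"AND", "OR", "WITH"}


def license_ids(expression):
    """Return the distinct license IDs in an SPDX-style expression, in order."""
    # Parentheses and whitespace act as separators; everything else is a token.
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    ids = []
    skip_next = False  # the token after WITH is an exception, not a license
    for token in tokens:
        if skip_next:
            skip_next = False
            continue
        if token in OPERATORS:
            skip_next = token == "WITH"
            continue
        if token not in ids:
            ids.append(token)
    return ids
```

A UI could show the raw expression as the one-liner and use a list like this to link each identifier to its license text.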

OSS licensing is a complex topic, especially as build tools are made to build code and not for license compliance. In my experience, maintaining license information is not top of mind for most developers, and there are a lot of misunderstandings. As an SPDX and ORT core contributor, I am working towards making the specification and tooling as easy to use as possible while still being able to generate a high-quality SPDX file.

For Henri's case of knowing the license for a past code revision, why not provide an option to recompute licensing on demand? I know it's not the best UX, but people who care about licensing are probably more than willing to wait a minute.

juliaferraioli commented 2 years ago

+1 to everything said here already, especially what @tsteenbe has said. In addition to being able to transparently surface multiple licenses and/or licenses that apply to different parts of the project, being able to surface licenses for transitive dependencies of a project is important for people looking to make technology decisions.

A license tree that reflects all the licenses needed to make a project work would be nice to see, even if the code for the dependencies is not located in the project's repository (which hopefully it is not).
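
A minimal sketch of how such a license tree could be flattened, assuming the dependency graph and per-package licenses are already known. The `deps` and `licenses` mappings, the `"UNKNOWN"` placeholder, and the `collect_licenses` name are all hypothetical; no real package manager or GitHub API is involved.

```python
def collect_licenses(package, deps, licenses, seen=None):
    """Return the set of licenses for a package and its transitive dependencies."""
    seen = set() if seen is None else seen
    if package in seen:
        # Already visited: avoids infinite recursion on dependency cycles.
        return set()
    seen.add(package)
    found = {licenses.get(package, "UNKNOWN")}
    for dep in deps.get(package, []):
        found |= collect_licenses(dep, deps, licenses, seen)
    return found
```

For example, with `deps = {"app": ["lib-a"], "lib-a": ["lib-b"]}` and a license recorded for each package, `collect_licenses("app", deps, licenses)` returns every license needed to make `app` work, even though none of the dependency code lives in the repo.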

byjrack commented 2 years ago

Agree with all the above around enhancing the dep graph view for licenses

gyehuda commented 2 years ago

I'll point to github.com/tauri-apps/tauri as a good example of a project that looks like they are being as clear as they can about licenses, but the license detection on GitHub (as of today) is not really tuned to pick up on the subtlety.

silverhook commented 2 years ago

May I suggest taking a look and hopefully just implementing REUSE.software? (also CC @mxmehl)

A REUSE compliant repo has:

That makes the repo both human and machine readable to easily parse which licenses apply to the repo (ls LICENSES/) as well as to which specific files in the repo (grep -r SPDX-License-Identifier *).
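
The `grep` above can be approximated in a few lines of Python. This is a minimal sketch, not the REUSE tool itself: it only looks for `SPDX-License-Identifier:` tags in file contents and ignores other ways REUSE allows licensing information to be declared. The `spdx_tags` and `scan` names are hypothetical.

```python
import re
from pathlib import Path

# Matches the tag anywhere on a line; the value runs to the end of the line.
TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def spdx_tags(text):
    """Return the license expressions declared in one file's text."""
    tags = []
    for match in TAG.finditer(text):
        # Drop a trailing block-comment closer such as */ or -->.
        value = re.sub(r"\s*(\*/|-->)\s*$", "", match.group(1))
        tags.append(value.strip())
    return tags


def scan(root):
    """Map each tagged file under root to its declared expressions."""
    result = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            tags = spdx_tags(path.read_text(errors="ignore"))
            if tags:
                result[path.relative_to(root).as_posix()] = tags
    return result
```

Calling `scan(".")` at the top of a checkout gives a per-file view of the declared licenses, which is the information a multi-license UI would need to show which license applies where.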

mxmehl commented 2 years ago

May I suggest taking a look and hopefully just implementing REUSE.software?

Full agreement. The LICENSES/ directory is an elegant way to store license texts. It's not only REUSE that stipulates that, it's also used by the Linux Kernel, the Core Infrastructure Initiative (coreinfrastructure/best-practices-badge#1547) or the KDE community – and 650+ projects registered to the REUSE API.

byjrack commented 2 years ago

So I don't think GitHub changes the git repo for any feature in the product, but I could be wrong here. Even things like the .github code of conduct are just surfaced in the Issue/PR UI and not as a reference in the repo content view.

So is the suggestion that GitHub nudge the community by restricting the multi-license UI and repo metadata to those who have adopted the REUSE schema? I would guess that GitHub has increased consistency in LICENSE.md files through the repo creation flow and the way licensee surfaces them in the UI/metadata, which I think is a net benefit for the community.

For the scope of a work I think that could apply, but for the dependency tree including those terms as references in the repo probably wouldn't be the best option?

pombredanne commented 2 years ago

Using scancode-toolkit for license detection would nicely detect and report all the multiple licenses as SPDX license expressions. It is used at scale almost everywhere, including in ClearlyDefined (which GitHub reuses too?)

anthonyronda commented 2 years ago

I'd like to prevent uncertainty introduced by a new input field for an SPDX expression string, or by scanning package metadata. We've all encountered packages on npm or PyPI that declare a license string but don't include the license text, or where the SPDX-ID and LICENSE don't match. If GitHub were ever to accept an SPDX expression through a new field or metadata, it would hopefully fail to validate if all matching license texts aren't discoverable.
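
The validation described above could look roughly like the sketch below. It assumes license texts are stored as `<identifier>.txt`, as in a REUSE-style `LICENSES/` directory, and it treats exception identifiers (the part after `WITH`) as needing a text too. The `missing_license_texts` name is hypothetical and the tokenizer is deliberately simplified.

```python
import re

OPERATORS = {"AND", "OR", "WITH"}


def missing_license_texts(expression, license_files):
    """Return identifiers in the expression that have no matching text file."""
    # Assumption: texts are named <identifier>.txt.
    available = {
        name[:-4] if name.endswith(".txt") else name for name in license_files
    }
    identifiers = [
        token
        for token in re.findall(r"[A-Za-z0-9.+-]+", expression)
        if token not in OPERATORS
    ]
    missing = []
    for identifier in identifiers:
        if identifier not in available and identifier not in missing:
            missing.append(identifier)
    return missing
```

An expression would only validate when this returns an empty list, so a declared license string without its text is rejected rather than displayed.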

I think @silverhook and @pombredanne's suggestions of leveraging existing SBOM specs and scancode-toolkit are least susceptible to this uncertainty, and the various GitHub product teams won't have to negotiate over adding a new field.

CsatariGergely commented 2 years ago

I think support for several types of licenses in a repo is very much needed. Lots of projects have a separate documentation license or have different licenses in different modules. REUSE is a good way to collect the licenses and to assign them to paths, while SPDX IDs are a good method to identify the licenses. I think providing SPDX documents with the licenses of the dependencies is a bit overkill, especially when the dependencies are not in the given repo. On the other hand, this feature would be needed in GitHub Packages.

gkunz commented 2 years ago

In general, yes, I'd very much like to see support for multiple licenses in the GitHub UI. I like the approach REUSE takes to handling (multiple) licenses, so this is a good basis.

ashleywolf commented 2 years ago

Support for displaying multiple licenses has been released. See https://github.blog/changelog/2022-05-26-easily-discover-and-navigate-to-multiple-licenses-in-repositories/ for more info.

silverhook commented 2 years ago

It’s great to see GitHub is finally taking into consideration that many repos carry more than one license.

I find it very disappointing, though, that instead of leaning on existing community best practices, it continues pushing for amassing LICENSE.* files and relying on a very opinionated tool.