vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

Feature Request: Add a CI check to validate copyright/license on new files #9629

Open doeg opened 2 years ago

doeg commented 2 years ago

Feature Description

Many, many times I have either forgotten to add the copyright/license text when adding a new file, or I get the year wrong. (Sometimes I manage to do both!)

It would be nice to have a CI check that validates the presence of the license text with the correct copyright year.

This check should support either an allow list or an ignore list so that we can skip files that don't require a license; config files, documentation, and so on.
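
A minimal sketch of what such a check could look like in Go, for illustration only. The ignore list, the `CheckHeader` name, the 20-line search window, and the "The Vitess Authors" holder string are all assumptions, not an existing Vitess tool. The year is only enforced for newly added files, since that is where the wrong-year mistake described above happens.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strconv"
	"strings"
)

// ignoredExts is a hypothetical ignore list: files with these extensions
// are skipped (config files, documentation, and so on).
var ignoredExts = map[string]bool{".md": true, ".json": true, ".yaml": true, ".txt": true}

// copyrightRE matches a line such as "Copyright 2022 The Vitess Authors."
// The holder string is an assumption; adjust it to the project's header.
var copyrightRE = regexp.MustCompile(`Copyright (\d{4}) The Vitess Authors`)

// licenseMarker is the opening phrase of the short Apache 2.0 header.
const licenseMarker = "Licensed under the Apache License, Version 2.0"

// headerLines is how many leading lines are searched for the header.
const headerLines = 20

// CheckHeader returns an error if a non-ignored file lacks the copyright
// line or the license marker near the top. For new files (isNew == true)
// the copyright year must also equal currentYear.
func CheckHeader(path, content string, isNew bool, currentYear int) error {
	if ignoredExts[strings.ToLower(filepath.Ext(path))] {
		return nil
	}

	// Only look at the first headerLines lines of the file.
	lines := strings.SplitN(content, "\n", headerLines+1)
	if len(lines) > headerLines {
		lines = lines[:headerLines]
	}
	head := strings.Join(lines, "\n")

	m := copyrightRE.FindStringSubmatch(head)
	if m == nil {
		return fmt.Errorf("%s: missing copyright line", path)
	}
	if !strings.Contains(head, licenseMarker) {
		return fmt.Errorf("%s: missing Apache 2.0 license header", path)
	}

	year, _ := strconv.Atoi(m[1]) // \d{4} guarantees a valid integer
	if isNew && year != currentYear {
		return fmt.Errorf("%s: copyright year is %d, expected %d", path, year, currentYear)
	}
	return nil
}
```

A CI job could call `CheckHeader` for each file in the diff, passing `isNew = true` only for added files, and fail the build if any call returns an error.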

I'd propose the check works as follows:

Use Case(s)

Noted above. :)

mattlord commented 2 years ago

We might be able to use this existing Action from the marketplace in the CI: https://github.com/marketplace/actions/reuse-compliance-check

(REUSE is from the Free Software Foundation Europe, FSFE.)

We could also use reuse directly in a pre-commit hook: https://reuse.readthedocs.io/en/stable/readme.html#run-as-pre-commit-hook

deepthi commented 2 years ago

Warn if an updated file has an invalid copyright year

What is an invalid copyright year? Note that the copyright year should NOT be updated every time a file changes.

systay commented 2 years ago

I don't like that we have to have a license header in every file, and reading up on it, it's pretty clear that this is not needed with Apache 2. From https://infra.apache.org/apply-license.html#copy-per-file

Do I have to have a copy of the license in each source file?
You only need to add one full copy of the license per distribution. See the policy.

Can we pretty please remove the license headers and just stick to the LICENSE file in the root?

mattlord commented 2 years ago

@systay there's a disconnect here, at least in my mind. :-) You keep sharing that link, but it asks "do I need a [full] copy of the license in each source file". That's not what we have: we have a license header in each file that points to the license. And their FAQs do seem to say that this is required (or at least recommended).

The full copy of the license can be seen here: http://www.apache.org/licenses/LICENSE-2.0

We only link to that in our files. I don't think anybody disagrees with the assertion that we don't need the full license text in each file — we do not today AFAICT.

In the same doc that you link to:

Each original source document (code and documentation, but not the LICENSE and NOTICE files) should include a short license header at the top. If the distribution contains documents not covered by an ICLA, CCLA or Software Grant (such as third-party libraries), consult the policy guide.

That seems quite explicit, no?

Each original source document (code and documentation, but not the LICENSE and NOTICE files) should include a short license header at the top.

Again, they're telling us that here: https://www.apache.org/licenses/LICENSE-2.0#apply

You can see this being done in the Apache web server, e.g.: https://svn.apache.org/viewvc/httpd/httpd/trunk/include/http_config.h?view=markup

You can see this being done in k8s, e.g.: https://github.com/kubernetes/kubernetes/blob/f61ed439882e34d9dad28b602afdc852feb2337a/hack/verify-mocks.sh

That's exactly what we do today, no? But that's what you are proposing to remove?

The basic issue is that you license each file. Your project can use N licenses, applying different ones to different files. This often happens at least via dependencies, and the licenses need to be compatible (e.g. BSD and Apache). You can see this for our project today here.

What am I missing here?

P.S. CNCF recommends the same thing, a license header and copyright notice in each file: https://github.com/vitessio/vitess/pull/9684#issuecomment-1041017293