rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0

Add metadata to mark all license and copyright files to be shipped when redistributing packages #12053

Open JanBeh opened 1 year ago

JanBeh commented 1 year ago

Problem

It's idiomatic to split up functionality into many (small) crates. I can easily have hundreds or thousands of dependencies (or even more?) when using Rust. This makes Rust unsuitable for creating and shipping binaries, or packages which include binaries, because even the most liberal licenses require including copies of the license text (which is often derived from a template) and/or additional files (e.g. a NOTICE file in the case of the Apache 2.0 license).

In practice, it's very hard for a single maintainer to do this, especially when many dependencies are involved and regular updates are expected. There are some automated tools, such as

but these seem insufficient, as I explained in this post on URLO.

Moreover, it doesn't feel right to do this using a heuristic (which can fail). Corresponding metadata is missing as of today's Cargo.toml specification.

Proposed Solution

A new metadata field should be added which gives an exhaustive list of files to be shipped/bundled when redistributing a package (or part of a package and/or when making a derived work, e.g. when compiling a binary).

Of course this list could be wrong, but then such a crate could be marked as erroneous in a database (just like a vulnerability, because it does sort of mislead you in a dangerous way). Ideally such a field would become mandatory at some point in the future. See also my follow-up on URLO.
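To make the idea concrete, here is a minimal sketch of how a downstream tool might consume such a list. Everything in it is an assumption for illustration: the `PackageMeta` struct and the `files_to_ship` function are hypothetical, and `redistributed_include` mirrors the `redistributed-include` name floated later in this thread; no such field exists in Cargo today.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Hypothetical per-package metadata. `redistributed_include` is NOT an
/// existing Cargo.toml field; it stands in for the proposed exhaustive list.
#[derive(Debug, Clone)]
pub struct PackageMeta {
    pub name: String,
    pub version: String,
    /// Paths, relative to the package root, that must accompany any
    /// redistribution (license texts, NOTICE files, and so on).
    pub redistributed_include: Vec<String>,
}

/// Collects every file that has to be shipped for a set of packages.
/// Each file is placed under `<name>-<version>/` so that files with the
/// same name in different packages do not collide.
pub fn files_to_ship(packages: &[PackageMeta]) -> BTreeSet<PathBuf> {
    let mut out = BTreeSet::new();
    for pkg in packages {
        let dir = PathBuf::from(format!("{}-{}", pkg.name, pkg.version));
        for rel in &pkg.redistributed_include {
            out.insert(dir.join(rel));
        }
    }
    out
}
```

With a declared list like this, bundling becomes a plain traversal of the dependency set rather than a guess based on file names.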

Notes

Related issue: #8537, see also comment. Note, however, that license-file isn't a good choice for the proposed feature. That field is meant for custom licenses, and it also wouldn't be suitable to refer to a NOTICE file (or any other file which is required for proper attribution) to be shipped in addition to a license file.

Could the authors field be used, or could something like a copyright-lines field be introduced to be able to recreate the licenses from scratch? Likely not. Many license texts are templates where the copyright holder or product name(s) are incorporated directly into the text of the license. See my comment here on URLO on that matter. For example, the MIT-Modern-Variant license template has "THE UNIVERSITY OF CALIFORNIA" as part of the template text, which ought to be replaced if the license is used by someone other than the University of California. Thus each license can be unique and possibly must be bundled as-is (i.e. word for word) in order to fulfill the license requirements.

weihanglo commented 1 year ago

Thanks for the proposal!

I am not an expert in this topic. Just got some questions.

Perhaps the precise version of the question is: For a downstream developer, how to figure out what to bundle even when an upstream library doesn't provide sufficient info of what must be bundled.

JanBeh commented 1 year ago
  • How a new field of exhaustive list could help the situation ecosystem-wide?

It could aid the situation by encouraging crate authors to properly specify an exhaustive list of all files that need to be shipped (due to license requirements) when distributing a derived work.

In the future, specifying such a list could become mandatory. Currently the Cargo docs say here:

Before publishing, make sure you have filled out the following fields:

Moreover, in the documentation of the license field:

Note: crates.io requires either license or license-file to be set.

So authors are required to specify a license (ideally in a way that can be machine-interpreted). But currently they are not required to specify the location of the license file and copyright notes in a machine-readable way. Demanding the latter would help solve the problem outlined in the OP.

  • Assumed people do not actively fill in the field, […]

Currently, people also fill in the license field. I believe it's possible to establish good practices which involve specifying the locations of the corresponding files as well.

  • There is a mention of a database doing checks for incorrect lists, but how could a database infer what files should be there with the absence of that list?

I don't propose inferring the list automatically. My proposal is to encourage crate authors to provide this information.

Note that it's still possible to provide incorrect or incomplete information. This is also possible as of today. For example, see memalloc-0.1.0 (source). That crate specifies license = "MIT" but doesn't ship a license file. How am I supposed to include "the above copyright notice and [the] permission notice" as demanded by the MIT license if the license including the ("above") copyright notice is missing in the crate?

I don't think there is an automatic way of ensuring that redistributed-include (or maybe derived-include would be a better name?) is specified correctly. But in the same way, license can't be verified to be correctly set, i.e. a crate author might just include a different license in the package (or fail to include relevant files for the license to be usable).

However, it's possible to use heuristics to search for packages where redistributed-include might be set wrongly. It's also possible to check and document reports of people who stumbled upon packages with improper license/copyright information. That could be done in a similar way as vulnerabilities are being reported and made public.
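As an illustration of such a heuristic, the sketch below flags packages that declare a license but ship no file whose name looks like a license, copyright, or notice file. The `PackageFiles` type, the function names, and the list of file-name prefixes are assumptions made for this example, not part of Cargo or crates.io, and a name-based check like this can produce both false positives and false negatives.

```rust
/// Hypothetical view of a published package: its declared license (if any)
/// and the relative paths of the files it contains.
pub struct PackageFiles<'a> {
    pub license: Option<&'a str>,
    pub files: &'a [&'a str],
}

/// Returns true if the file name (ignoring directories and case) starts
/// with a prefix commonly used for license or attribution files.
/// The prefix list is an assumption, not an established standard.
fn looks_like_license_file(path: &str) -> bool {
    let name = path.rsplit('/').next().unwrap_or(path).to_ascii_uppercase();
    ["LICENSE", "LICENCE", "COPYING", "NOTICE", "COPYRIGHT"]
        .iter()
        .any(|prefix| name.starts_with(*prefix))
}

/// Flags packages that declare a license but contain no file that
/// looks like a license or attribution file.
pub fn missing_license_text(pkg: &PackageFiles<'_>) -> bool {
    pkg.license.is_some() && !pkg.files.iter().any(|f| looks_like_license_file(f))
}
```

A package that sets license = "MIT" but contains only Cargo.toml and src/lib.rs would be flagged for manual review; the flag is a hint for a human to check, not a verdict.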

Perhaps the precise version of the question is: For a downstream developer, how to figure out what to bundle even when an upstream library doesn't provide sufficient info of what must be bundled.

My proposal is: We should try to avoid the situation that an upstream library doesn't provide sufficient info/data in the first place.

epage commented 1 year ago

Personally, when it comes to license compliance like this, there are so many complications and nuances that I think this deserves an RFC, starting with a Pre-RFC on Internals. In preparing the Pre-RFC and RFC, I think it would be important to work with the authors of the aforementioned tools on it and see if you can get a spread of people who deal with software legal compliance. For example, I know of two people who, at prior points in their careers, were the liaisons between R&D and legal for legal compliance (a lot of my caution in this area comes from speaking with one of them). It'd be good to find multiple people like that across the community to get a breadth of experience and perspectives. I wonder if we can get the Foundation to help consult lawyers as well.

In driving this, I would recommend stepping back a bit and re-evaluating how you are approaching other people to avoid derailing this effort. While I've not caught up with everything, the parts of your posts I've skimmed come across with a harsh tone that might make this kind of collaboration more difficult.

bk2204 commented 1 year ago

As I've pointed out elsewhere, almost all licenses require that the license text be included with the software. Assuming we're not doing something like Debian's common-licenses directory, that means that every time someone specifies a license of MIT or Apache-2.0, some license text must be included. My proposal was simply not to complain about the joint use of the license and license-file keywords, since discouraging the use of both license and license-file actually encourages people not to comply with the license. (The license keyword is useful in machine-readable contexts, and license-file is useful for including the text and copyright information.) Those issues have unfortunately been closed, however.

This poses a practical problem for me as a distributor of Rust-based binaries in my corporate role because I have to personally extract this information out of the Git repository when it's not included in the crate, which is tedious with many such crates. I know that when distributors such as Debian ship a crate or other Rust-based software, they must also include this information, so I'm hardly the only person who would benefit from a change.

Providing some sort of metadata where users could specify the license itself, the copyright information, and any other legally required text would be helpful, I think, and encourage license compliance.

JanBeh commented 1 year ago

@epage

Personally, when it comes to license compliance like this, there are a lot of complications and nuance that I think this deserves an RFC, starting with a Pre-RFC on Internals.

I agree this is a rather deep issue which deserves thorough consideration instead of taking quick steps. I wanted to open this issue to highlight/track a problem and to propose a potential solution. It doesn't need to be solved quickly (but should be solved eventually, in my opinion), and I'm sorry if opening this issue was the wrong procedure for the development process. I have seen several other issues open, which weren't (in my opinion) addressing the core of the problem properly; hence this issue.

While I've not caught up with everything, the parts of your posts I've skimmed come across with a harsh tone that might make this kind of collaboration more difficult.

As you don't refer to any specific post, I'm not sure what you're talking about. If there's any communication issue, feel free to send me a direct message. Thank you.

JanBeh commented 1 year ago

I'd like to note that I currently don't have the time to formalize this proposal by writing up an RFC, contacting the authors of the aforementioned crates, or speaking to lawyers of the Foundation (I also doubt they'd be available for me, as a contributor). So if it's really necessary to make this a formal process, I'd kindly ask someone else to push this issue forward. I do think that there are more people who need this feature or a solution that solves the issue in a similar or better way.

epage commented 1 year ago

I do think that there are more people who need this feature or a solution that solves the issue in a similar or better way.

As a reminder, Rust development is done by volunteers. If someone doesn't step up to lead an effort like this, then it doesn't get done.

JanBeh commented 1 year ago

After the previous posts, I don't expect this to move forward. I wrote that notice so other people know I won't take the proposed actions of involving lawyers and/or doing community-wide research to get "a breadth of experience and perspectives." Please note that I'm a volunteer too, and I'm not well connected to the Rust developer scene and/or the Foundation.

I merely pointed out some legal issues in my posts and in this issue and wrote up a feature proposal. It's nothing more, nothing less. Feel free to do the proposed actions if you think they are good to do.

It would be nice if my efforts here or elsewhere (as much or as little as they may be) were appreciated. Meta: I don't think writing a feature request or issue report is a bad thing per se, even if you can't write a corresponding pull request and/or start further processes needed to fix an issue.