spdx / spdx-spec

The SPDX specification in MarkDown and HTML formats.
https://spdx.github.io/spdx-spec/

Add profile for multiple SPDX files with short licensing/copyright info #502

Open mxmehl opened 3 years ago

mxmehl commented 3 years ago

As suggested by @goneall, I would like to propose a new SPDX profile for the 3.0 spec. At REUSE we're looking for a more flexible and human-editable solution to deprecate our DEP5 spec to bulk-license files, currently under the working title "REUSE.yaml". However, we would love to be compatible with SPDX in this matter.

Goals

Rationale

Obviously, a full SPDX file is not readable or maintainable for the average developer. There should be a way to specify only the copyright and licensing of one or more files in a very concise manner. So that users do not have to learn a different syntax, the tag names SPDX-FileCopyrightText and SPDX-License-Identifier should stay the same. Ideally, all information would be applied relative to the SPDX file's path, without being able to declare files in paths above its own location.

Scenario: a maintainer has marked all source code files according to the REUSE best practices with in-file comment headers. For a directory with 500 icon files, however, they would prefer to bulk-declare these. The easiest way to do this would be to create a YAML/JSON file inside this repo, use * as the target, and add copyright and licensing information.

Ideas for implementation

We've collected some syntax proposals for REUSE.yaml in this thread. For example:

- src/*:
    SPDX-FileCopyrightText:
      - 2020 Me
      - © 2017 You
    SPDX-License-Identifier: MIT

or

- src/*:
    license:
      SPDX-License-Identifier: MIT
    copyright: |
      SPDX-FileCopyrightText: 2020 Me
      SPDX-FileCopyrightText: © 2017 You

We are open to the exact syntax, but it would be wise to not make it much more complex.
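To make the first proposal concrete, here is a minimal Python sketch of how a tool might look up the declared tags for a given path. The `MANIFEST` literal mirrors the YAML above as plain Python data (so no YAML parser is needed), and the `lookup` helper and its first-match-wins behaviour are assumptions for illustration only, not anything REUSE or SPDX has specified.

```python
from fnmatch import fnmatchcase

# Plain-Python mirror of the first YAML proposal (hypothetical layout).
MANIFEST = [
    {
        "src/*": {
            "SPDX-FileCopyrightText": ["2020 Me", "© 2017 You"],
            "SPDX-License-Identifier": "MIT",
        }
    },
]


def lookup(manifest, path):
    """Return the tags of the first entry whose pattern matches path, else None.

    Note: fnmatch-style '*' also matches '/', so 'src/*' covers nested files.
    Whether the real format should behave that way is an open question.
    """
    for entry in manifest:
        for pattern, tags in entry.items():
            if fnmatchcase(path, pattern):
                return tags
    return None
```

A call such as `lookup(MANIFEST, "src/logo.svg")` returns the tag dictionary for that file, while a path outside `src/` yields `None`.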

However, I am aware that some of these proposals stretch the general idea of SPDX files and perhaps also the new profiles. I am excited to learn what you folks are thinking about this very simple approach.

swinslow commented 3 years ago

Hi @mxmehl, thanks for this! Wanted to add my quick initial thoughts:

I really like the idea of something like this, as a lightweight format for developers to use to express this sort of data.

I'm hesitant to call this sort of file, itself, an "SPDX document". I think that whether we're talking about the current SPDX 2.2 spec or the current thinking for 3.0, there are aspects of an SPDX document that are important to retain for its intended use cases and that are absent here. I think the intended concept for "profiles" in 3.0 is that they would be the "base" set of fields for SPDX core elements, plus other fields. I don't think a profile that removes planned mandatory fields would align with the approaches that have been discussed.

What I'd propose would be, instead, to treat something like this as -- I don't really have a name for it, but something like a "pre-SPDX manifest". In other words, here is a manifest file format that can easily be consumed (by a script, CI/CD system, a GitHub action, etc.) to output an actual, full-fledged SPDX document.

That script / action can do all the hard work of collecting hashes, outputting the data in true SPDX format, etc., and can use this manifest file as an input to fill in the appropriate corresponding fields. The SPDX document that it generates can then be incorporated into the build artifacts that are published every time a version is released.

That way the project developers don't themselves need to manage the SPDX document details, but there is a standardized way to consume this "pre-SPDX" data to generate a true SPDX document.
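As a rough illustration of that generation step, the sketch below expands a pattern against the real file tree, hashes each file, and emits SPDX 2.x tag-value file sections. The function names and arguments are hypothetical, and the output is only a fragment: a valid SPDX document needs further mandatory fields (document creation information, SPDXID, and so on) that are deliberately left out here.

```python
import hashlib
from pathlib import Path


def sha1_of(path):
    """Hex SHA-1 digest of a file's contents."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


def file_entries(root, pattern, license_id, copyrights):
    """Build tag-value file sections for files under root matching pattern.

    root, pattern, license_id and copyrights stand in for values a tool
    would read from the manifest; the names are illustrative assumptions.
    """
    root = Path(root)
    text = "\n".join(copyrights)
    blocks = []
    for path in sorted(root.glob(pattern)):
        if not path.is_file():
            continue  # skip directories matched by the pattern
        rel = path.relative_to(root).as_posix()
        blocks.append(
            "\n".join([
                f"FileName: ./{rel}",
                f"FileChecksum: SHA1: {sha1_of(path)}",
                f"LicenseInfoInFile: {license_id}",
                f"FileCopyrightText: <text>{text}</text>",
            ])
        )
    return "\n\n".join(blocks)
```

Running something like this in CI on every release would let the manifest stay short while the generated document carries the checksums.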

Does that make sense? Not sure if I'm explaining well but that's my off-the-cuff reaction here.

goneall commented 3 years ago

@swinslow Since I suggested a profile to @mxmehl, I thought I would add an opinion. We have done a "minus" type profile with the SPDX-Lite proposal from the Asia group, so we do have some precedent. That being said, your suggestion of using a different name to describe the subset makes sense to me.

The only thing I have a strong opinion on is having the lightweight manifest information be part of the SPDX specification and consistent with its terms. This will make it easy for consumers to integrate it into their tooling (as you stated above) and will also give producers confidence that it will fit into the larger ecosystem.

swinslow commented 3 years ago

Thanks @goneall! Sounds great. Wholeheartedly agree with having the lightweight manifest defined as part of the SPDX spec and consistent with its terms.

mxmehl commented 3 years ago

Thank you for sharing your thoughts! I cannot comment much on what to call this or how it integrates into your current schemes.

What I like about the "pre-SPDX" data is that it may allow more freedom regarding location (flexible/relative) and field names (SPDX-License-Identifier/SPDX-FileCopyrightText) than a boiled-down SPDX document that may have more formal requirements. On the other hand, if the latter's formal requirements are more flexible, I'd also be fine with that.

In general, I wholeheartedly agree that having this as part of the specification makes total sense for both SPDX and REUSE, and I hope we'll find a good solution.

zvr commented 3 years ago

An enthusiastic +1 on this proposal.

A few thoughts:

  1. I would strongly suggest using only existing, valid SPDX tags and not introducing any other information. Therefore, in the example presented above, I'd object to introducing groupings like license and copyright.
  2. I would also assume that, although @mxmehl's proposal only mentions Copyright and License info, we would be fine with having any File-level attribute represented in such a form, like Contributor or potentially security information.
  3. Finally, I think the right way to integrate this into the spec would be in a manner similar to the section "How to use SPDX inside Files" (and not a profile). Something like "How to provide bulk information on Files". This info could then be processed (by reading the actual filenames and expanding the *, for example) to produce an SPDX document.

@mxmehl, any interest in trying to integrate even more REUSE stuff into the SPDX spec? I can totally see a similar section about how the ExtractedText of licenses can be stored in a directory LICENSES with filenames as the license identifiers, for example.

Thinking more about it, all this falls under the umbrella "add SPDX information" (to a repo/directory/package) instead of "produce an SPDX document". This is an additional direction I'm happy to have in the spec.

mxmehl commented 3 years ago

@zvr Thanks for your feedback. I cannot say much more about how to integrate it into SPDX but I am happy to give feedback to concrete proposals from a (REUSE) user perspective. Again, see me as the advocate of average Jane/Joe developer who is confused by all this legal stuff ;)

> @mxmehl, any interest in trying to integrate even more REUSE stuff in SPDX spec?

Sure, why not? The LICENSES directory is an invention of REUSE and is slowly being picked up by other initiatives, e.g. in coreinfrastructure/best-practices-badge#1547

goneall commented 3 years ago

From the SPDX tech call on 27 April 2021:

mxmehl commented 3 years ago

Excellent summary, thank you! I just would like to emphasise that this "metadata, pre-document file" is limited to its own directory and its subdirectories. So this file cannot bulk-define attributes of parent directories.
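A tool consuming such a file could enforce this restriction with a check along the following lines. This is only a sketch: the helper name is hypothetical, and it is deliberately conservative in that it rejects any pattern containing `..`, even one that would resolve back inside the directory.

```python
from pathlib import PurePosixPath


def is_within_manifest_dir(pattern):
    """True if pattern stays at or below the manifest's own directory.

    Rejects absolute paths and any '..' component (conservative assumption).
    """
    p = PurePosixPath(pattern)
    if p.is_absolute():
        return False
    return ".." not in p.parts
```

With this check, `src/*` is accepted, while `../other/*` and `/etc/*` are refused.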

Regarding the precedence discussion, please see issue fsfe/reuse-docs#70 in which option 3 is currently the favourite. It should not concern SPDX directly but may give some context and assurance that REUSE will take care of resolving conflicts.

zvr commented 3 years ago

So, where shall the discussion on the actual specification of this file take place? There is a need to:

mxmehl commented 3 years ago

There are two issues where we can discuss the points raised by @zvr:

I'd love to get feedback on both points so we can prepare a well-founded suggestion.

kestewart commented 2 years ago

For consideration earlier in 2.3 vs. 3.0 - to be discussed.

kestewart commented 2 years ago

Per discussion in the call, leaning towards leaving it in 3.0 as it's a profile, but need to sync up with @mxmehl

mxmehl commented 2 years ago

> Per discussion in the call, leaning towards leaving it in 3.0 as it's a profile, but need to sync up with @mxmehl

Obviously, we'd be happy to have it in 2.3, as it's a frequently requested feature, but I'm happy to discuss that with you.

goneall commented 2 years ago

@zvr @mxmehl @kestewart I'm thinking if we get a PR within the next week, we can review and potentially include it in the 2.3 release. Let me know if you agree.

mxmehl commented 2 years ago

Great! How shall we proceed? I am afraid I lack the required detailed inside knowledge of the SPDX spec to draft a pull request that won't raise too many side problems with exact wording and placement. However, I'd be very happy to provide input and feedback early on. One could take fsfe/reuse-docs#81 as inspiration.

One thought that crossed my mind is whether we want to have SPDX-License-Identifier and SPDX-FileCopyrightText in such a file, as tools might interpret these strings as the license/copyright of the file itself. On the other hand, we don't want to confuse people with a different syntax.
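The concern can be reproduced with a deliberately naive scanner. The regular expression and the `naive_license_of` helper below are illustrative assumptions and are not taken from any existing tool; they simply grab the first SPDX-License-Identifier found anywhere in a file's text.

```python
import re

# Matches the tag anywhere in the text, as a simple header scanner might.
TAG = re.compile(r"SPDX-License-Identifier:\s*(\S+)")

# Contents of a hypothetical bulk-declaration file using the proposed syntax.
MANIFEST_TEXT = """\
- src/*:
    SPDX-FileCopyrightText:
      - 2020 Me
    SPDX-License-Identifier: MIT
"""


def naive_license_of(text):
    """Return the first license identifier found anywhere in text, or None."""
    match = TAG.search(text)
    return match.group(1) if match else None
```

Applied to `MANIFEST_TEXT`, the scanner reports a license for the manifest file itself, even though the entry is meant to describe `src/*`. Tools would need to recognise the manifest by name or location to avoid this.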

goneall commented 2 years ago

@mxmehl Looking back through the thread, it looks like adding a separate "Annex" (previously called Appendix) would be the approach for adding this to the spec. The format could be similar to the SPDX Lite Annex.

I won't have much bandwidth to help drafting since I'm pretty booked with other SPDX 2.3 activities, but I can help review.

Adding @swinslow @jlovejoy to the thread since the General Meeting notes above indicated legal team review was of interest.

The timeframe may be a bit tight to get this into 2.3 - @zvr @kestewart any thoughts?

mxmehl commented 2 years ago

OK, no promise but I can try to kickstart a pull request creating an annex to get the ball rolling.

goneall commented 5 months ago

Since we're a couple weeks away from 3.0, I'm moving this to a 3.1 milestone.

silverhook commented 1 month ago

@mxmehl, is this now made irrelevant with REUSE 3.2’s reuse.yaml?

mxmehl commented 1 month ago

> @mxmehl, is this now made irrelevant with REUSE 3.2’s reuse.yaml?

Indeed. REUSE progressed on its own since the demand was so high.

It would, however, make sense if SPDX acknowledged this procedure in an Annex or so.