spdx / spdx-3-model

The model for the information captured in SPDX version 3 standard.
https://spdx.dev/use/specifications/

Implement a way to associate copyright holders to licenses #239

Open tsteenbe opened 1 year ago

tsteenbe commented 1 year ago

One of the major issues that comes up a lot as a shortcoming of SPDX 2.x is that it's not possible to associate copyright holders with licenses. A simple fix would be to change copyrightText (SPDX 2.x sections 7.1.7, 8.8, 9.8) from the current free-form text field to a map<license id, copyrights>.
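As a rough illustration of the proposal, the sketch below contrasts a single free-form copyright string with a per-license map. The variable names, holders, and the `flatten` helper are hypothetical; none of them are SPDX fields or part of any SPDX library.

```python
# Hypothetical illustration only: these names are not SPDX fields.

# Today: one free-form string per Package, File, or Snippet.
copyright_text = (
    "Copyright (c) 2020 Example Corp\n"
    "Copyright (c) 2018 Jane Doe"
)

# The proposal: a map from license id to the copyright statements under it.
copyright_by_license: dict[str, list[str]] = {
    "MIT": ["Copyright (c) 2020 Example Corp"],
    "BSD-3-Clause": ["Copyright (c) 2018 Jane Doe"],
}


def flatten(mapping: dict[str, list[str]]) -> str:
    """Collapse the map into free-form text, dropping the license association."""
    return "\n".join(stmt for stmts in mapping.values() for stmt in stmts)


flat = flatten(copyright_by_license)
```

Going from the map to the string is lossless for the text itself but discards which license each statement belongs to, which is the information the proposal wants to keep.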

tsteenbe commented 1 year ago

@zvr @kestewart We emailed about this last November, but I did not see an issue, so I thought I'd raise one. If there already is an issue, please let me know.

swinslow commented 1 year ago

cc @jlovejoy for visibility

This was extensively discussed in 2020-2021 as part of the original discussions for the licensing profile for 3.0.

In my view, the current structure is not a shortcoming. Ownership of copyright is not associated with licenses. Copyright holders are appropriately associated with the content (Package, File, Snippet) for which they are the copyright holder.

If someone wants to associate a particular copyright holder with a particular license for a sub-part of a File, I assume they could do so by modeling a Snippet for the sub-part of the File, and associating the license and copyright text with that Snippet.
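A minimal sketch of that Snippet-based approach, assuming simplified stand-in classes: the `File` and `Snippet` dataclasses and their field names below only loosely echo SPDX terms and are not the SPDX 3 model or any library API.

```python
from dataclasses import dataclass, field


@dataclass
class Snippet:
    byte_range: tuple[int, int]   # hypothetical: start and end offsets in the file
    license_concluded: str
    copyright_text: str


@dataclass
class File:
    name: str
    license_concluded: str
    copyright_text: str
    snippets: list[Snippet] = field(default_factory=list)


# A file that is MIT overall but contains a BSD-3-Clause sub-part.
source = File(
    name="src/util.c",
    license_concluded="MIT",
    copyright_text="Copyright (c) 2020 Example Corp",
    snippets=[Snippet((120, 480), "BSD-3-Clause", "Copyright (c) 2018 Jane Doe")],
)


def pairs(f: File) -> list[tuple[str, str]]:
    """List (license, copyright) pairs for the file and each of its snippets."""
    return [(f.license_concluded, f.copyright_text)] + [
        (s.license_concluded, s.copyright_text) for s in f.snippets
    ]
```

Here each copyright statement stays attached to the content it covers, and the license association follows from that content rather than from a separate map.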

I am opposed to changing copyrightText from a string to a map<license id, copyrights>. I believe it's an incorrect model and it increases complexity for everyday use cases with no meaningful benefit. This has not been needed for SPDX licensing use cases up to this point, and I disagree that it is a bug or a shortcoming.

maxhbr commented 1 year ago

> This has not been needed for SPDX licensing use cases up to this point, and I disagree that it is a bug or a shortcoming.

I think the fact that it is insufficient to represent the level of information that comes from tooling, and is afterwards used for generating notices, shows that it is a shortcoming. It limits its use as an exchange format and pushes tool implementers towards their own formats.

But on the other hand, this new proposed format might be hard to produce for many tools. I think we need to find a flexible approach that can represent both levels of data.

Further, I think that map<license id, copyrights> might be insufficient, as the two MITs in MIT AND (BSD-3-Clause OR MIT) might have different copyrights attached to them, despite having the same license id. But I'm not sure whether this is a real-world problem.
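The concern can be made concrete with a small sketch. The findings and holders below are invented for illustration; the point is only that a map keyed by license id merges both MIT operands into one entry.

```python
# Hypothetical findings for a package licensed MIT AND (BSD-3-Clause OR MIT).
findings = [
    ("MIT", "Copyright (c) 2020 Example Corp"),       # the package's own code
    ("BSD-3-Clause", "Copyright (c) 2018 Jane Doe"),  # third-party file, first choice
    ("MIT", "Copyright (c) 2018 Jane Doe"),           # third-party file, second choice
]

# Build the proposed map<license id, copyrights>.
by_license: dict[str, list[str]] = {}
for license_id, statement in findings:
    by_license.setdefault(license_id, []).append(statement)

# Both MIT operands now share one key, so the map can no longer say
# which holder belongs to which MIT in the expression.
mit_holders = by_license["MIT"]
```

After the merge, `mit_holders` contains both statements with nothing to distinguish the standalone MIT from the MIT inside the OR choice.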

tsteenbe commented 1 year ago

+1 on @maxhbr's comments on representing the information that tooling collects. I am open to suggestions for a better approach.

From my years of looking at SPDX files in the supply chains of various sectors, I mostly see SPDX files with copyright and licensing data at Package level, not File or Snippet. Organizations want to correctly show who owns which code and under which license it's used without having to share file-level details - take for example SPDX lite.

SPDX users simply want to exchange the copyright + license combo they get from their tools and communicate it in as compact a format as possible (package level) to their SBOM receiving parties (usually their customers) so that these SBOM consumers can do further processing on this data.

@swinslow How can we get this discussed again in a Licensing Profile WG meeting? Would it help to file an issue at https://github.com/spdx/change-proposal/issues/?

swinslow commented 1 year ago

edited to add "at the package level" in first main paragraph

cc @jlovejoy for visibility

@tsteenbe If you want to submit a change proposal to discuss this again, please go ahead. I would encourage you to indicate specifically which open source licenses require that copyright notices not only be reproduced, but be associated directly with the particular corresponding license text, at the package level. I am not aware of any open source license that requires this, but please help clarify if there is one.

For clarity on timing, given that we are less than a week from the target date for a release candidate for 3.0, there is zero chance of this being added for a release candidate. I would not be in favor of it in any case, for the reasons indicated above and as previously discussed in 2020-2021.

From @maxhbr's comment above:

> Further, I think that map<license id, copyrights> might be insufficient, as the two MITs in MIT AND (BSD-3-Clause OR MIT) might have different copyrights attached to them, despite having the same license id. But not sure if this is a real world problem.

This is exactly correct. I am certain there are packages with an overall license equivalent to something like MIT AND (BSD-3-Clause OR MIT) or any similar combinations. This would include, e.g., something like an MIT-licensed package that contains a file from a third party which is itself licensed under the BSD/MIT choice.

If there's a desire to associate copyrights with the specific content, the right way to do it would be to model the software at the File or Snippet level, and then associate the copyrights with the corresponding File or Snippet. Trying to import File- or Snippet-level concepts into a Package increases complexity for everyone with no meaningful gain.

jlovejoy commented 1 year ago

+1 to everything that @swinslow has already said.

Re: submitting a change proposal, this would require a joint discussion with legal and tech teams given the subject matter and scope, especially in light of it relying on a legal interpretation of licenses which SPDX generally tries to avoid. In addition and as I already stated during the discussion on this a couple years ago and as @swinslow stated above - I don't see this as a license compliance requirement. I'd also note that re-discussing the same thing we discussed and decided a couple years ago is not optimal, given the limited time we all have to work on this project!

That being said, this comment from @tsteenbe is interesting:

> I mostly see SPDX files with copyright and licensing data at Package level not File or Snippet. Organizations want to correctly show who owns which code and under which license it's used without having to share file level details - take for example SPDX lite.

I think the second sentence is really telling, if I read that right: orgs do not want to share a level of detail (e.g., file or snippet information) that SPDX supports and that is the way the SPDX spec provides for connecting a specific copyright holder with a specific set of code and its license, but they still want the latter. That seems a bit incongruent and perhaps shows a misunderstanding of what is needed (for license compliance)? I can't see what need this combination of information fulfills.

As a side/related note (while I'm typing): over the many years of being involved as a lawyer with open source licensing, I've seen a lot of examples of making open source license compliance harder by trying to make it easier (if that makes sense... I need to write an article about this some day).

tsteenbe commented 1 year ago

@maxhbr Instead of a license id, we could associate a license expression with copyrights.
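One way to read this suggestion is to key the map by license (sub-)expression instead of by license id. The sketch below assumes plain expression strings as keys; the map, the holders, and the `holders_for` helper are hypothetical and not an SPDX data structure.

```python
# Hypothetical map keyed by license expression rather than license id.
by_expression: dict[str, list[str]] = {
    "MIT": ["Copyright (c) 2020 Example Corp"],
    "BSD-3-Clause OR MIT": ["Copyright (c) 2018 Jane Doe"],
}


def holders_for(expression: str) -> list[str]:
    """Look up copyright statements by exact expression string."""
    return by_expression.get(expression, [])


own_code = holders_for("MIT")
third_party = holders_for("BSD-3-Clause OR MIT")
# Exact string matching is brittle: an equivalent expression written in a
# different operand order does not match unless keys are normalized first.
reordered = holders_for("MIT OR BSD-3-Clause")
```

This keeps the two MIT operands apart, but only as long as producers and consumers agree on how expressions are written, so a real design would likely need some normalization step.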

@swinslow I don't look at which licenses require that; that is for the lawyers. I am simply trying to implement in SPDX what is a common capability of compliance tools: show the copyright and license applicable to a package and generate a FOSS attribution file containing a , , .

It's currently not possible to use SPDX to, say, take the license findings of scanners such as ScanCode and pass them to another tool for further processing. We have to use bespoke/proprietary formats for this type of tool-to-tool exchange, and then have @kestewart asking why we don't use SPDX.

goneall commented 1 year ago

Kate and I discussed this and it probably won't get resolved for 3.0 - moving to 3.1.