zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Automate find missing SPDX tags in source files #57138

Open romkell opened 1 year ago

romkell commented 1 year ago

Introduction

Running the SPDX tool with west on a build target to generate the SPDX report revealed that not all source files were assigned a license. On closer inspection, some source files (from an external source, a HAL) had a license statement in the file but no SPDX tag. The statement in the source points to a license file in the module's root directory.

According to

https://github.com/zephyrproject-rtos/zephyr/issues/new?assignees=&labels=TSC&template=ext-source.md&title=

also external sources shall be SPDX tagged.

Problem description

We have customers demanding proof that we use only permissively licensed code (no GPL), so that they do not have to disclose the full sources when we use open source code in the products we sell them. The proof would be the SPDX report for a built firmware. If that report has major gaps due to missing SPDX tags in the source files, the SPDX report loses its value.

Proposed change

Since such source files slipped in, the CI should run a check that all sources carry the SPDX tag and that the license is of a permissive kind.
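A minimal sketch of what such a check could look like, under stated assumptions: the allow-list `PERMISSIVE`, the suffix list `SOURCE_SUFFIXES`, and the function names are hypothetical and not part of any existing Zephyr script. Compound SPDX expressions (for example `Apache-2.0 OR MIT`) are not parsed here and would be reported as non-permissive.

```python
import re
from pathlib import Path

# Assumption: an example allow-list; the real policy would be set by the project.
PERMISSIVE = {"Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause", "ISC"}
# Assumption: which suffixes count as "source files" for the check.
SOURCE_SUFFIXES = {".c", ".h", ".cpp", ".S", ".py"}
# Capture the identifier up to the end of line or a closing comment star.
TAG_RE = re.compile(r"SPDX-License-Identifier:\s*([^\r\n*]+)")


def find_spdx_id(text, max_lines=30):
    """Return the SPDX identifier found in the first lines of a file, or None."""
    head = "\n".join(text.splitlines()[:max_lines])
    match = TAG_RE.search(head)
    return match.group(1).strip() if match else None


def check_tree(root):
    """Return (missing, non_permissive) for all source files below root."""
    missing, non_permissive = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        spdx_id = find_spdx_id(path.read_text(errors="replace"))
        if spdx_id is None:
            missing.append(str(path))
        elif spdx_id not in PERMISSIVE:
            non_permissive.append((str(path), spdx_id))
    return missing, non_permissive
```

A CI job could call `check_tree()` on the checked-out tree and fail when either returned list is non-empty.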

Detailed RFC

Proposed change (Detailed)

See "Detailed RFC"

Dependencies

Concerns and Unresolved Questions

Alternatives

romkell commented 1 year ago

From the Discord discussion:

romkel (04/20/2023 12:04 PM): SPDX: we recently generated the SPDX information for a built binary to check that no non-permissive licenses are included (customer requirement) and found that for a large number of sources the license is missing from the output file. This is mainly due to the SPDX tag missing in sources from the git sub-modules (e.g. HAL_STM32). I had a look at various HALs and found that some use the SPDX license tag, some do not, and some use it inconsistently.

The SPDX output file almost completely loses its value with such huge gaps and, if presented to an external party, raises more questions than it answers.

Is it realistic to ask third parties to have their repos tagged with SPDX, or to make that a prerequisite, before becoming a Zephyr OS module? I guess the answer is no.

Would it be feasible to add an SPDX.yaml or similar to the module's zephyr folder stating the module's default license, which the SPDX tool would use to mark files from that location as such in the SPDX output? Files under the same root that carry an SPDX tag would overrule the default. SPDX.yaml could carry an exception list for files without an SPDX tag but under a different license : and : or similar. The tool could and should check whether SPDX.yaml references non-existing files and paths.

It would then be the maintainer's choice to maintain SPDX tags in the sources, use SPDX.yaml and put everything under the same license (simple), or maintain a complicated SPDX.yaml with hundreds of exceptions (cumbersome). Do you consider that a feasible suggestion?

Henrik Brix Andersen (04/20/2023 12:28 PM): Perhaps the zephyr/module.yml file could just feature an spdx section/key?

romkel (04/20/2023 1:40 PM): If it becomes a Zephyr OS specific solution it could go into './zephyr/module.yml'. If it is something that shall become part of the SPDX standard it needs to be a separate file, I guess, in the root folder it affects. Then './zephyr', as I suggested, is also not the right location anymore. @kstewart, @nicpappler, being in the SPDX group, what do you think?
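The lookup order proposed in the Discord discussion (tag in the file first, then a per-module exception, then the module default) can be sketched as follows. This is only an illustration: the `default` and `exceptions` keys, the glob-style patterns, and both function names are assumptions, since no such SPDX.yaml or `spdx` key in zephyr/module.yml exists today. The dictionary stands in for the parsed configuration file.

```python
from fnmatch import fnmatch


def resolve_license(rel_path, file_tag, module_cfg):
    """Resolve a file's license: in-file SPDX tag > exception entry > module default."""
    if file_tag:
        return file_tag
    for pattern, license_id in module_cfg.get("exceptions", {}).items():
        if fnmatch(rel_path, pattern):
            return license_id
    return module_cfg.get("default")


def stale_exceptions(module_cfg, existing_paths):
    """Return exception patterns that match none of the module's existing files."""
    return [
        pattern
        for pattern in module_cfg.get("exceptions", {})
        if not any(fnmatch(path, pattern) for path in existing_paths)
    ]


# Hypothetical module configuration, as it might be loaded from the YAML file.
cfg = {"default": "BSD-3-Clause", "exceptions": {"cmsis/*": "Apache-2.0"}}
print(resolve_license("drivers/uart.c", None, cfg))
print(resolve_license("cmsis/core.h", None, cfg))
```

`stale_exceptions()` covers the consistency check mentioned above: it reports entries in the exception list that no longer point at any file in the module.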

nashif commented 1 year ago

According to

zephyrproject-rtos/zephyr/issues/new?assignees=&labels=TSC&template=ext-source.md&title=

also external sources shall be SPDX tagged.

Where does it say that? The ask in the template is to provide the license name of the module in form of an SPDX identifier, it is not asking for the source files to be SPDX tagged.

romkell commented 1 year ago

Where does it say that? The ask in the template is to provide the license name of the module in form of an SPDX identifier, it is not asking for the source files to be SPDX tagged.

License

Please use an SPDX identifier (https://spdx.org/licenses/), such as BSD-3-Clause

As written, it does not explicitly demand tagging each file; it is rather a polite request to do so.

At least according to

https://spdx.dev/ids/

the IDs are meant to be used with the SPDX-License-Identifier: field in the actual documents/files themselves.

That is how I understand https://spdx.dev/ids/ and therefore the ask in the template.

Looking at hal_stm32 (just the micro that we happen to use in that customer project), there is a central license file naming two of those SPDX IDs somewhere in the free text. The sources refer to that central license file.

  * Copyright (c) 2022 <vendor>.
  * All rights reserved.
  *
  * This software is licensed under terms that can be found in the LICENSE file
  * in the root directory of this software component.
  * If no LICENSE file comes with this software, it is provided AS-IS.
  *
  ******************************************************************************
  */

This is just one example, and I do not know whether this is how the vendor generally does it in its own environment too, or whether it was invented for use as a zephyrproject-rtos module.

Nevertheless, generating an SPDX report for a built binary produced some 30-40% of files with empty license information, which makes the report close to useless. In Discord before the RFC, I suggested adding a mechanism with a central license info file per module (see copied text above). I see the following solutions:

  1. do nothing - then the quality of the SPDX report highly depends on which parts of the ecosystem you use
  2. have a non-standard add-on mechanism implemented in our version of the SPDX tool - as suggested in Discord and copied in this RFC; feels a bit like adding a Zephyr-specific balcony to SPDX and the SPDX tool.
  3. ask (demand) that the module source be tagged properly - can we really demand that? (Possibly for the module in zephyrproject-rtos, but rather not upstream.)
     3.1. provide a script as a supporting tool (basically find/replace) that can add a proper SPDX tag at a given location in the source files for the downstream sources in zephyrproject-rtos - not 100% automation, since that is rather error prone, only support for the module maintainer to easily distribute such tags; manual checking is still required.
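The supporting script from option 3.1 could be as small as the sketch below. It is an illustration under assumptions: the function name `add_spdx_tag`, the C-style comment form, and the `line_index` parameter are hypothetical, and the result still needs manual review, for instance so that the tag is not inserted inside an existing comment block.

```python
TAG = "SPDX-License-Identifier:"


def add_spdx_tag(source, license_id, line_index=0):
    """Insert a C-style SPDX tag line at line_index unless a tag already exists."""
    if TAG in source:
        # Already tagged: leave the file untouched so the script is idempotent.
        return source
    lines = source.splitlines(keepends=True)
    lines.insert(line_index, f"/* {TAG} {license_id} */\n")
    return "".join(lines)


print(add_spdx_tag("int main(void) { return 0; }\n", "BSD-3-Clause"))
```

Running it twice on the same text changes nothing the second time, which keeps repeated runs over a module safe.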

The RFC asks for a CI supported automated verification that for all source files a license is found by the SPDX tool, to identify gaps.
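To make the gap measurable, the generated report itself can be scanned for files without license information. The sketch below assumes the report is an SPDX 2.x tag-value document with `FileName` and `LicenseInfoInFile` fields, and treats both `NONE` and `NOASSERTION` as "no license found"; the function name and the sample document are made up for illustration.

```python
def license_gaps(spdx_text):
    """Return (total_files, files_without_license) for an SPDX tag-value document."""
    files = {}
    current = None
    for line in spdx_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "FileName":
            # A FileName line starts a new file section.
            current = value
            files[current] = set()
        elif key == "LicenseInfoInFile" and current is not None:
            files[current].add(value)
    missing = [
        name for name, ids in files.items()
        if not (ids - {"NONE", "NOASSERTION"})
    ]
    return len(files), missing


# Made-up sample document, not real tool output.
sample = """FileName: ./src/main.c
LicenseInfoInFile: Apache-2.0

FileName: ./hal/driver.c
LicenseInfoInFile: NONE
"""
total, missing = license_gaps(sample)
print(f"{len(missing)} of {total} files have no license information")
```

A CI step could fail, or just report the percentage, when `missing` is non-empty.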

We possibly need to take one step back and first clarify how to handle SPDX tagging in general, specifically for zephyrproject-rtos modules.

nashif commented 1 year ago

That is how I understand spdx.dev/ids and therefore the ask in the template.

the ask in the template is to specify a license and, to be accurate, to use the SPDX ID of the license instead of some free form; there is no requirement for modules to have every file tagged with SPDX.

romkell commented 1 year ago

the ask in the template is to specify a license and, to be accurate, to use the SPDX ID of the license instead of some free form; there is no requirement for modules to have every file tagged with SPDX.

Whatever the template is asking for, we end up, and that is the current status, with an SPDX tool, or rather its output, that is highly incomplete depending on which modules you use. The options I see are listed above. The question is whether we want to leave it in that state or improve it, because as it stands you cannot show that your application does not use non-permissively licensed code. One can do it all by hand, but then the whole SPDX tool does not make any sense.

nashif commented 1 year ago

@romkell The point I am trying to make is that modules were never required to have SPDX headers in licences. The license of the module in general had to be compliant with our license requirements and there was no demand to have SPDX header in each file.

Your proposed change:

Proposed change

Since such source files slipped in, the CI should run a check that all sources carry the SPDX tag and that the license is of a permissive kind.

can only apply to the Zephyr code. Our license checker actually requires an SPDX license to be present, so this is the first line of defense we have. If items were missed, we need to address those individually. Running SPDX tools on the whole tree is something we do, AFAIK, per release, and it can be extended to run more frequently to audit the state of the tree.

Asking module owners to have SPDX headers in their code would be a new thing that we need to agree on and document and make it part of the acceptance criteria, so I would propose this as a separate item with the caveat that we also need to address all existing modules we already have as part of the project.

romkell commented 1 year ago

@nashif

@romkell The point I am trying to make is that modules were never required to have SPDX headers in licences. The license of the module in general had to be compliant with our license requirements and there was no demand to have SPDX header in each file.

Fully understood. The consequence: an SPDX run and report for a built firmware will, depending on which modules the build used, have some 30-40% of files with no license information.

The RFC asks for a CI supported automated verification that for all source files a license is found by the SPDX tool, to identify gaps. I feel, for the zephyr repo that is good enough. No need to increase frequency or anything.

The issue here is that

  1. the modules do not necessarily carry SPDX tags in the source (since not requested/demanded)
  2. the SPDX tool only supports that one way to identify a file's license

We possibly need to take one step back and first clarify how to handle SPDX tagging in general, specifically for zephyrproject-rtos modules.

I see the following solutions (copied from above):

  1. do nothing - then the quality of the SPDX report highly depends on which parts of the ecosystem you use
  2. have a non-standard add-on mechanism implemented in our version of the SPDX tool - as suggested in Discord and copied in this RFC; feels a bit like adding a Zephyr-specific balcony to SPDX and the SPDX tool.
  3. ask (demand) that the module source be tagged properly - can we really demand that? (Possibly for the module in zephyrproject-rtos, but rather not upstream.)
     3.1. provide a script as a supporting tool (basically find/replace) that can add a proper SPDX tag at a given location in the source files for the downstream sources in zephyrproject-rtos - not 100% automation, since that is rather error prone, only support for the module maintainer to easily distribute such tags; manual checking is still required.

I see this RFC rather as bringing that circumstance to the TSC's attention, starting a discussion, finding a solution and closing it. If it is required to reword the issue because it requests to "Automate find missing SPDX tags in source files", I can do so.

nashif commented 1 year ago

removing TSC label for now, if you want to bring this back to the TSC, please relabel.

nashif commented 1 year ago

@carlescufi to look into using Nordic's manifest plugin, still TBD.

romkell commented 8 months ago

@carlescufi to look into using Nordic's manifest plugin, still TBD.

@carlescufi: Any decision on whether Nordic management agreed to open source your SPDX extension, which evaluates the license of sources that are not SPDX tagged?

nashif commented 4 months ago

@carlescufi any updates?