pharmaR / regulatory-r-repo-wg

Package consensus for regulated industries
https://pharmar.github.io/regulatory-r-repo-wg
Can we make the licence used a robust metadata item [DISCUSSION] #66

Open epijim opened 6 months ago

epijim commented 6 months ago

One topic discussed within Roche's Open Source office is that the impact of a dependency's licence depends on the context of the project that uses it. For example, if you are generating a TLG for a CSR, you may be fine with copyleft, but if you want to bundle the dependency into a 'data product' (e.g. SaMD, or an algorithm within a physical medical device), you may have different requirements for which licences are acceptable in your dependencies.

The licence data for R packages, though, is messy.

Looking only at CRAN, the table below lists the combinations present for the GPL licence alone (ignoring extensions like AGPL).

Could we have a mechanism to provide more curated licence information in the regulatory repo? One way to solve this is to require the person submitting a package to look at DESCRIPTION, plus the licence file if present, and map it to a controlled terminology of licences, so that we can accurately slice the repo for use cases with different licence requirements. We would then keep track of the field from the DESCRIPTION file, so that if it ever changes there is a warning for that package.
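To make the mapping idea concrete, here is a minimal sketch in Python (chosen only for illustration). It assumes SPDX-style identifiers as the controlled terminology, which is my assumption and not something agreed in this thread. The names `normalise_license_field`, `field_changed` and the `NEEDS-REVIEW` marker are hypothetical. It covers only the GPL spellings from the table and sends anything ambiguous to a human.

```python
import re
from typing import List

# Assumption: SPDX-style identifiers serve as the controlled terminology.
GPL_VERSIONS = {"2": "2.0", "2.0": "2.0", "3": "3.0", "3.0": "3.0"}
REVIEW = "NEEDS-REVIEW"  # hypothetical marker for "a human must look at this"

GPL_PATTERN = re.compile(
    r"^(?:GPL|GNU General Public License)"                 # short or long name
    r"(?:\s*\(\s*(?P<op>>=|==)\s*(?P<ranged>[\d.]+)\s*\)"  # "(>= 2)" form
    r"|(?:-|\s+version\s+)(?P<exact>[\d.]+))?$"            # "-2" or "version 2"
)

def normalise_component(component: str) -> str:
    """Map one alternative of a License field to a controlled term."""
    match = GPL_PATTERN.match(component.strip())
    if match is None:
        # "file LICENSE", "+ file LICENSE", other families, odd operators.
        return REVIEW
    version = match.group("ranged") or match.group("exact")
    spdx_version = GPL_VERSIONS.get(version)
    if spdx_version is None:
        # Bare "GPL", or a version such as "2.15.1" that is not a GPL version.
        return REVIEW
    suffix = "or-later" if match.group("op") == ">=" else "only"
    return f"GPL-{spdx_version}-{suffix}"

def normalise_license_field(raw: str) -> List[str]:
    """Split the field on '|' (alternatives) and normalise each part."""
    return [normalise_component(part) for part in raw.split("|")]

def field_changed(recorded_raw: str, current_raw: str) -> bool:
    """True when the DESCRIPTION field no longer matches what was curated."""
    return recorded_raw.strip() != current_raw.strip()
```

With this sketch, `normalise_license_field("GPL (>= 2) | file LICENCE")` gives one controlled term plus one review marker, and `field_changed` is the hook where a warning for the package could be raised. Treating `GPL-2` as "version 2 only" is an interpretation that would need confirming.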

A secondary topic is what to do when DESCRIPTION differs from LICENSE.md. I have seen this happen, though not in the wild (in my example it was caught as part of the release process for one of our packages). It could still be present on CRAN.

Licence Number of packages
GPL-3 4566
GPL (>= 2) 4172
GPL-2 2384
GPL (>= 3) 1664
GPL 439
GPL-2 | GPL-3 337
GPL-3 | file LICENSE 141
GPL (>= 2.0) 98
GPL-2 | file LICENSE 50
GPL (>= 3.0) 43
GPL (>= 3) | file LICENSE 31
GNU General Public License 29
GPL-3 + file LICENSE 19
file LICENSE 9
GPL (>= 2) | file LICENCE 8
GNU General Public License (>= 3) 6
GNU General Public License version 2 5
GPL (>= 3) | file LICENCE 5
BSD_3_clause + file LICENSE | GPL (>= 2) 4
GNU General Public License version 3 3
GPL (>= 2.0) | file LICENSE 3
GPL (>= 2.1) 3
GPL (>= 3.5.0) 3
GPL-2 | GPL-3 | file LICENSE 3
GPL (> 3) 2
GPL (>= 2.10) 2
GPL (>= 2.15.1) 2
GPL-2 | file LICENCE 2
GPL-2 | GPL-3 | MIT + file LICENSE 2
BSD_3_clause + file LICENSE | GPL-2 1
CC BY-SA 4.0 | GPL (>= 2) 1
CC BY-SA 4.0 | GPL-3 | file LICENSE 1
file LICENCE 1
FreeBSD | GPL-2 | file LICENSE 1
GNU General Public License (>= 2) 1
GNU Lesser General Public License 1
GPL (<= 2.0) 1
GPL (<= 2) 1
GPL (== 2) 1
GPL (== 3.0) 1
GPL (>= 2.0) | file LICENCE 1
GPL (>= 2) | BSD_2_clause + file LICENSE 1
GPL (>= 2) | FreeBSD 1
GPL (>= 2) | LGPL (>= 2) 1
GPL (>= 2) | LGPL (>= 3) 1
GPL (>= 2) | LGPL-3 1
GPL (>= 2) | MIT + file LICENSE 1
GPL (>= 3.0.0) 1
GPL (>= 3.2) 1
GPL (>= 3.3.2) 1
GPL | file LICENSE 1
GPL-2 | Artistic-2.0 1
GPL-2 | GPL (>= 2) | GPL-3 1
GPL-2 | GPL-3 | BSD_3_clause + file LICENSE 1
GPL-2 | LGPL-2.1 | MPL-1.1 1
GPL-2 | MIT + file LICENCE 1
GPL-2 | MIT + file LICENSE 1
GPL-3 | BSD_2_clause + file LICENSE 1
GPL-3 | file LICENCE 1
GPL-3 | GPL-2 1
GPL-3 | LGPL-2.1 1
MPL (>= 2) | GPL (>= 2) | file LICENSE 1
dgkf commented 6 months ago

A few thoughts:

On the governance of licenses

Whether the repo is curated by a governance body that might decide on appropriate licenses is still up in the air. My personal feeling is that it would be more productive to provide comprehensive information about the package (which would include the license information), but not to approve or reject a package.

On the selection of appropriate licenses

Especially where licensing is concerned, the appropriateness of a package varies heavily with the intended use. If you're not redistributing the package, your use case is very different from selling a regulated product whose analysis methods are derived using some R packages. Of course, I'm not a lawyer, so I'm sure I'm not doing the decision full justice, but I imagine those are real concerns for anyone using the package.

The expected contents of the LICENSE file

I don't think it's realistic to enforce that a LICENSE should only contain the copyright holder information. I'm pretty sure it's best practice when redistributing code to include the redistributed code's license in the file. See the shiny LICENSE as one example.

The GPL-3 license, for example, suggests this text as part of its warranty notice:

You should have received a copy of the GNU General Public License along with this program.

and I would imagine most people would use the LICENSE/LICENSE.md file to hold that copy.

Alternative tools-supported options

When a LICENSE is large, we could assume it contains a copy of the license contents, diff it against a standard copy of the license (probably from usethis's bundled license files), and report similarity to narrow the search for anything non-standard. Ideally this would limit the review down to just the parts of the template that are intended to be modified, or let us know when the terms in the license have been rewritten and may not be in line with our expectations.

Perhaps this style of % diff could be a metric itself and the assessment app could even provide a diff viewer to investigate flagged licenses.
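As a rough illustration of the similarity metric and diff view, here is a minimal sketch using Python's standard `difflib`. The function names `licence_similarity` and `licence_diff` are hypothetical, and the standard text is whatever the caller supplies. Where that copy comes from (usethis or elsewhere) is left open here.

```python
import difflib

def _clean_lines(text: str) -> list:
    """Drop blank lines and surrounding whitespace so layout noise is ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def licence_similarity(candidate: str, template: str) -> float:
    """Percentage similarity, compared line by line, between two licence texts."""
    matcher = difflib.SequenceMatcher(
        a=_clean_lines(template), b=_clean_lines(candidate), autojunk=False
    )
    return 100.0 * matcher.ratio()

def licence_diff(candidate: str, template: str) -> str:
    """Unified diff from the standard text to the package's text, for review."""
    return "\n".join(
        difflib.unified_diff(
            _clean_lines(template),
            _clean_lines(candidate),
            fromfile="standard",
            tofile="package",
            lineterm="",
        )
    )
```

A package whose LICENSE only fills in the copyright line should score close to 100%, with the diff showing just that line. A low score, or changes outside the placeholder lines, would be the signal to flag the licence for manual review. Any threshold would be a judgement call that this sketch does not try to settle.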