Open epijim opened 6 months ago
A few thoughts:
Whether the repo is curated by a governance body that might decide on appropriate licenses is still up in the air. My personal feeling is that it would be more productive to provide comprehensive information about the package (which would include the license information), but not to approve or reject a package.
Especially regarding a license, the appropriateness of a package varies heavily with the intended use. If you're not redistributing the package, your use case is very different from selling a regulated product whose analysis methods are derived using some R packages. Of course, I'm not a lawyer, so I'm sure I'm not doing the question full justice, but I imagine those are real concerns for anyone using the package.
I don't think it's realistic to enforce that a LICENSE should only contain the copyright holder information. I'm pretty sure it's a best practice for redistributing code to include the redistributed code's license in the file. See the shiny LICENSE as one example.
The GPL-3 license, for example, suggests this text as part of its warranty message: "You should have received a copy of the GNU General Public License along with this program." I would imagine most people would use the LICENSE/LICENSE.md file for that copy.
When a LICENSE is large, we could assume it contains a copy of the license contents, diff it against a standard copy of the license (probably from usethis's bundled license files), and report similarity to narrow the search for anything non-standard. Ideally this would limit the review to just the parts of the template that are intended to be modified, or let us know when the terms in the license have been rewritten and may not be in line with our expectations.
Perhaps this style of % diff could be a metric itself and the assessment app could even provide a diff viewer to investigate flagged licenses.
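As a rough illustration of the % diff idea, here is a minimal sketch in Python using the standard library's difflib. The function names license_similarity and non_standard_lines are hypothetical, and the template text would in practice come from whatever standard copy is agreed on (the usethis templates are only one candidate); nothing here is an existing riskmetric or usethis API.

```python
import difflib


def _clean_lines(text: str) -> list[str]:
    """Split text into stripped, non-empty lines so whitespace noise is ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def license_similarity(candidate: str, template: str) -> float:
    """Return a 0-100 line-level similarity score between two license texts."""
    matcher = difflib.SequenceMatcher(
        None, _clean_lines(candidate), _clean_lines(template), autojunk=False
    )
    return 100.0 * matcher.ratio()


def non_standard_lines(candidate: str, template: str) -> list[str]:
    """Return the candidate lines that are added or changed relative to the template."""
    diff = difflib.ndiff(_clean_lines(template), _clean_lines(candidate))
    return [entry[2:] for entry in diff if entry.startswith("+ ")]
```

A reviewer would then only need to read the lines returned by non_standard_lines: if they are limited to the copyright holder and year, the license is probably a filled-in template, while anything else is a candidate for the diff viewer.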
One topic discussed within Roche's Open Source office is that the impact of a dependency's licence on a project depends on the context of the project itself. For example, if you are generating a TLG for a CSR, you may be fine with copyleft, but if you want to bundle the dependency into a 'data product' (e.g. SaMD, or an algorithm within a physical medical device), you may have different requirements around which licences are OK to use in your dependencies.
The licence data for R packages though is messy:
DESCRIPTION
LICENSE.md, LICENSE or LICENCE.txt
Looking just at CRAN, below are the combinations present just for the GPL licence (ignoring extensions like AGPL).
Could we move to a mechanism that provides more curated licence information in the regulatory repo? For example, one way to solve this is to require the person submitting to look at DESCRIPTION plus the licence file, if present, and map it to a controlled terminology of licences, so that we can accurately slice the repo based on different use cases having different licence requirements. Then we keep track of the field from the DESCRIPTION file, so if it ever changes there is a warning for that package.
A secondary topic is what to do about DESCRIPTION being different from LICENSE.md. I have seen this happen, but not in the wild (in my example it was caught as part of the release process for one of our packages). It could be present on CRAN though.
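To make the controlled-terminology idea concrete, here is a minimal sketch in Python. It assumes the License field follows the usual DESCRIPTION conventions, with alternatives separated by "|" and a "+ file LICENSE" suffix. The CONTROLLED_TERMS table, its SPDX-style labels, and the function names are all hypothetical; a real table would be agreed by the working group, and anything unmapped is flagged for a human rather than guessed.

```python
import re

# Hypothetical controlled terminology: normalised DESCRIPTION spelling -> agreed term.
CONTROLLED_TERMS = {
    "gpl-2": "GPL-2.0-only",
    "gpl-3": "GPL-3.0-only",
    "gpl (>= 2)": "GPL-2.0-or-later",
    "gpl (>= 3)": "GPL-3.0-or-later",
    "mit": "MIT",
}


def normalise(spec: str) -> str:
    """Lower-case a licence spec and make spacing around '(>= N)' uniform."""
    spec = re.sub(r"\s*\(\s*>=\s*(\d+(?:\.\d+)?)\s*\)", r" (>= \1)", spec.lower())
    return re.sub(r"\s+", " ", spec).strip()


def parse_license_field(field: str) -> tuple[list[str], bool]:
    """Map a License field to controlled terms, noting any 'file LICENSE' pointer."""
    terms, has_file = [], False
    for alternative in field.split("|"):
        for part in alternative.split("+"):
            part = part.strip()
            if not part:
                continue
            if part.lower().startswith("file "):
                has_file = True  # licence text lives in a separate file
            else:
                key = normalise(part)
                terms.append(CONTROLLED_TERMS.get(key, "UNMAPPED:" + part))
    return terms, has_file


def field_changed(recorded: str, current: str) -> bool:
    """True when the License field differs from the one recorded at submission."""
    return normalise(recorded) != normalise(current)
```

With this shape, field_changed would drive the warning when a package's DESCRIPTION changes, and the has_file flag tells a reviewer that a LICENSE file also needs to be read and compared against the mapped term.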