Closed by karacolada 11 months ago
The domain-specific assessment criteria don't match the general ones for "important" and "useful".
What does "The source code includes licensing information for all components bundled with that software" actually mean? Should every file include licensing info, or is it really about bundling, dependencies, etc.? I.e. is this information that I check by crawling multiple files or by reading the one central license file?
Components = bundled dependencies
Started developing a GitHub-specific harvester that checks for the license. Modified FAIREvaluatorLicense to consider that information as well.
Switching between domain-specific and "reference"/general tests is done using metric YAMLs.
Do the tests need to be inclusive downwards? I.e. should general-useful only pass if general-important has passed? Asking because the maturity level for the metric is simply the max.
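To make the "inclusive downwards" question concrete, here is a minimal sketch comparing the two readings. The level ordering and the function names are hypothetical and not taken from the evaluator code; the first function mirrors "maturity is simply the max", the second stops at the first failed level.

```python
from typing import Dict

# Hypothetical ordering of test levels, lowest first.
LEVELS = ["essential", "important", "useful"]


def maturity_max(results: Dict[str, bool]) -> int:
    """Maturity as the highest passed level, ignoring failed lower levels."""
    passed = [i + 1 for i, name in enumerate(LEVELS) if results.get(name, False)]
    return max(passed, default=0)


def maturity_cumulative(results: Dict[str, bool]) -> int:
    """Maturity as the highest level reached before the first failed level."""
    level = 0
    for i, name in enumerate(LEVELS):
        if not results.get(name, False):
            break
        level = i + 1
    return level
```

The two only differ when a higher test passes while a lower one fails, which is exactly the case the question is about.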
The title of the metric includes "bundled external software", but that doesn't match the CESSDA tests.
Implemented CESSDA-specific essential test. I took it literally, as in it fails if there is no file named LICENSE.txt at the root of the repository. Do we really want this?
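For discussion, a sketch of the literal check next to a more lenient alternative. Both functions and the list of accepted names are assumptions made for illustration, not the implemented test; the input is assumed to be the list of entry names at the repository root.

```python
from typing import Iterable


def has_root_license_txt(root_entries: Iterable[str]) -> bool:
    """Strict reading: pass only if a file named exactly LICENSE.txt is at the root."""
    return "LICENSE.txt" in set(root_entries)


def has_root_license_file(root_entries: Iterable[str]) -> bool:
    """Lenient reading (assumption): also accept LICENSE, LICENCE.md, COPYING, etc."""
    accepted_stems = {"license", "licence", "copying"}
    for entry in root_entries:
        # Compare the part before the first dot, case-insensitively.
        stem = entry.split(".", 1)[0].lower()
        if stem in accepted_stems:
            return True
    return False
```

A repository with only `LICENSE` or `LICENSE.md` fails the strict variant and passes the lenient one.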
Implemented CESSDA-specific useful test. It only checks Maven POM files and fails if the build script is not configured to fail on missing headers. Do we want to be this strict, or are we ok if people just use the plugin and (I assume) get warnings if the license headers are not present?
To add other kinds of build scripts, I would need to know more about whether and how they implement license header checking - seems like Pandora's box?
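One way to soften the strictness question would be to report three outcomes instead of pass/fail. The sketch below is only that: the plugin artifact ID and the two option names are assumptions about what a POM might contain, and would need adjusting to whatever plugin the test actually targets.

```python
import xml.etree.ElementTree as ET

# Assumed names: adjust to the plugin and option the test actually targets.
PLUGIN_ARTIFACT_ID = "license-maven-plugin"
FAIL_FLAGS = {"failIfMissing", "failOnMissingHeader"}


def _local(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://maven.apache.org/POM/4.0.0}'."""
    return tag.rsplit("}", 1)[-1]


def pom_license_check(pom_xml: str) -> str:
    """Return 'strict', 'plugin-only' or 'absent' for a POM document."""
    root = ET.fromstring(pom_xml)
    status = "absent"
    for element in root.iter():
        if _local(element.tag) != "plugin":
            continue
        artifact_ids = [
            (child.text or "").strip()
            for child in element
            if _local(child.tag) == "artifactId"
        ]
        if PLUGIN_ARTIFACT_ID not in artifact_ids:
            continue
        status = "plugin-only"
        # Look anywhere below the plugin for an explicit "fail on missing header" flag.
        for descendant in element.iter():
            is_flag = _local(descendant.tag) in FAIL_FLAGS
            if is_flag and (descendant.text or "").strip().lower() == "true":
                return "strict"
    return status
```

With this shape, "plugin-only" could map to a warning rather than a failure, and other build systems could later return the same three values.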
Implemented CESSDA-specific important test. It utilises the build script test, assuming that if that passes, all source code files do have license headers.
The harvester checks for the main language and uses the GitHub Search API to look for code in that language. We then store the contents of up to 5 of the files found. The license test then checks the region where it would expect the license header (first 30 lines of code) for the word "license". Is this ideal? Should we instead look for the license name found in test 1? But that makes the tests dependent on each other...
Some of these questions/discussion points are very specific, but I think talking through them will help guide future development.
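The header check described above can be sketched as follows, with the "look for the license name instead" alternative as an optional argument. Function and parameter names are hypothetical; the harvesting step is left out and the file contents are assumed to be plain strings.

```python
from typing import Iterable, Optional

HEADER_LINES = 30  # region where a license header is expected


def has_license_header(source: str, license_name: Optional[str] = None) -> bool:
    """Check the first HEADER_LINES lines for 'license' or a given license name."""
    region = "\n".join(source.splitlines()[:HEADER_LINES]).lower()
    needle = "license" if license_name is None else license_name.lower()
    return needle in region


def all_sampled_files_pass(
    files: Iterable[str], license_name: Optional[str] = None
) -> bool:
    """Pass only if at least one file was sampled and every one has a header."""
    sampled = list(files)
    return bool(sampled) and all(has_license_header(f, license_name) for f in sampled)
```

Passing `license_name` couples this test to the result of the first one; leaving it as `None` keeps the tests independent at the cost of a weaker check.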
Put a warning log into general-2 and check that the rest still passes as expected. Then close!
D5.2 p21, p30
Detailed Description
Clear software licensing enables reuse.
Generic comments
Each community may have different licences that are popular.
It is important that software licences are included with the source code as many tools and processes look for licensing information there to determine licence compatibility.
The SPDX License List is a widely used part of the Software Project Data eXchange (SPDX) open standard. Information about the licence for a piece of software can be provided either as a file in the source code repository, or as a short identifier embedded in the source code files.
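As an illustration of the embedded short identifier, the sketch below extracts the value of an `SPDX-License-Identifier:` tag from the top of a source file. The function name and the 30-line limit are assumptions made for this example.

```python
import re
from typing import Optional

SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def spdx_expression(source: str, max_lines: int = 30) -> Optional[str]:
    """Return the SPDX expression from the first tag found, or None."""
    for line in source.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match is None:
            continue
        value = match.group(1).strip()
        # Drop a trailing comment closer if the tag sits in a block comment.
        for closer in ("*/", "-->"):
            if value.endswith(closer):
                value = value[: -len(closer)].strip()
        return value
    return None
```

For example, a file starting with `// SPDX-License-Identifier: Apache-2.0` yields `Apache-2.0`, which could then be compared with the licence named in the central licence file.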
CESSDA comments
CESSDA guidance on licence information is part of the guidelines on Standard Git Repository Contents. Further guidance is provided as part of the guidance on CMA2 - Intellectual Property.
Context
R1.1: Software is given a clear and accessible licence.
Possible Implementation
CESSDA