merklecounty / rget

download URLs and verify the contents against a publicly recorded cryptographic log
https://merklecounty.com
Apache License 2.0

Perform License/Copyright Audit #29

Closed brianredbeard closed 5 years ago

brianredbeard commented 5 years ago

Just to play "belt and suspenders", it couldn't hurt to do a spot check of the licenses, copyrights, et al. of the components used by rget.

I'm volunteering to do a one-off audit and post the results as a response to this issue. I'll include the methodology so that it can be reproduced and/or integrated into a daily CI check.

This is in relation to 182c27b

brianredbeard commented 5 years ago

About

Checks were run with two different license scanning tools: one a general-purpose toolkit (Scancode) and one Go-specific (WWHRD). The two tools make different trade-offs, reflecting what each is trying to solve.

Scancode

From the project:

ScanCode detects licenses, copyrights, package manifests & dependencies and more by scanning code ... to discover and inventory open source and third-party packages used in your code.

Indeed, Scancode did provide comprehensive output, though due to its extensive tests it took orders of magnitude more time than WWHRD (roughly 1250 s versus under 2 s). This is not an apples-to-apples comparison, though, because Scancode built an index of copyright information and checked every file in the repository.

Due to the intense level of scanning, the scan regularly took over 1200 s when using 4 threads. Using fewer threads resulted in a roughly linear increase in time. The breakdown of time spent was as follows:

Scan files for: licenses, copyrights with 4 process(es)...
[####################] 3873                                                                                
WARNING: Files are missing a SHA1 attribute. Incomplete SPDX document created.
Scanning done.
Summary:        licenses, copyrights with 4 process(es)
Errors count:   0
Scan Speed:     3.31 files/sec. 
Initial counts: 4683 resource(s): 3873 file(s) and 810 directorie(s) 
Final counts:   4683 resource(s): 3873 file(s) and 810 directorie(s) 
Timings:
  scan_start: 2019-08-18T220420.995858
  scan_end:   2019-08-18T222449.025649
  setup_scan:licenses: 2.94s
  setup: 2.94s
  inventory: 1.99s
  pre-scan:classify: 3.54s
  pre-scan: 3.54s
  scan: 1168.85s
  post-scan:summary: 35.24s
  post-scan:license-clarity-score: 10.42s
  post-scan: 45.65s
  output:html: 4.40s
  output:json: 3.04s
  output:spdx-rdf: 18.05s
  output: 25.49s
  total: 1254.68s

WWHRD

From the project:

Have Henry Rollins check vendored licenses in your Go project.

While I don't believe any true effort in the checking of license metadata could be attributed to Henry Rollins himself[1], the utility checked everything in /vendor very quickly and provided output which was very easy to quickly parse visually.

In repeated tests WWHRD was able to complete the check of all vendored packages in under 2 seconds.

In fact, WWHRD did uncover some nuances of upstream components which add additional yak shaving for downstream projects. Because of this, upstream bugs and patches have been filed to assist the maintainers.

Context

All checks run on 896e62f4644737f6070cbc8b87a149c3070d0608.

Result (TL;DR)

:+1: Everything looks good.

Methodology

All tests were run multiple times to ensure reproducibility.

Scancode

After downloading and running the (brief) scancode setup script, the tool was invoked as follows:

$ ./scancode --classify --license-clarity-score  --summary -n 4  --license \
  --copyright --spdx-rdf=/tmp/rget.spdx-rdf --html=/tmp/rget.html \
  --json=/tmp/rget.json  ~/Projects/golang/src/github.com/merklecounty/rget

The results were useful: a comprehensive, human-readable HTML file; a JSON file which could quickly be scanned with jq to extract relevant information; and an SPDX file.
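As a rough illustration of the kind of jq query the JSON output allows, the sketch below tallies detected licenses across the report. The function name license_tally is made up for this example, and it assumes each entry under .files[] has a "licenses" array whose items carry an "spdx_license_key" field. That layout is an assumption about the Scancode JSON schema, so check it against the actual rget.json before relying on it.

```shell
# Hypothetical helper: count how often each license appears in a
# Scancode JSON report. Requires jq.
# ASSUMPTION: .files[].licenses[].spdx_license_key exists in the report;
# entries without a key are counted as "unknown".
license_tally() {
    jq -r '
      [.files[] | .licenses[]? | .spdx_license_key // "unknown"]
      | group_by(.)
      | map("\(length)\t\(.[0])")
      | .[]
    ' "$1"
}

# Usage: license_tally /tmp/rget.json
```

Each output line is a count and a license key separated by a tab, which makes unexpected licenses easy to spot without opening the HTML report.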

The resulting assets are available as follows:

WWHRD

WWHRD is installed the same way as most Go-based tooling:

$ go get -u github.com/frapposelli/wwhrd

There is a Brew package available for OSX users as well.

Once installed, the tool needs a configuration file. The purpose of the configuration file is to allow the tool to return a simple exit status for use with CI. Knowing that the project is attempting to use the Apache-2.0 license, the following configuration file (.wwhrd.yml) was used:

---
whitelist:
    - Apache-2.0
    - MIT
    - BSD
    - FreeBSD
    - NewBSD
    - GPL-3.0
    - MPL-2.0

blacklist:
    - GPL-2.0
    - MPL-1.1

The tool is then run using the command:

$ wwhrd check -q
ERRO[0000] Found Non-Approved license                    license=unrecognized package="github.com/nmrshll/oauth2-noserver"
ERRO[0000] Found Non-Approved license                    license=unrecognized package="github.com/skratchdot/open-golang/open"
FATA[0000] Exiting: Non-Approved license found  

Performing a manual audit of these two packages revealed that they were both using the MIT license. As mentioned above, patches were filed with the upstream projects to assist other users who may run into the same problem. In the local configuration, the following was added to .wwhrd.yml:

exceptions:
    - "github.com/nmrshll/oauth2-noserver"
    - "github.com/skratchdot/open-golang/open"

That resulting configuration file is supplied here: wwhrd.yml.gz
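With the exceptions in place, the check could be wired into CI by relying on the exit status alone. The following is only a sketch: license_gate is a hypothetical helper, and it assumes that wwhrd check exits non-zero whenever it reports a non-approved license, as the FATA line above suggests.

```shell
# Hypothetical CI helper: run a license checker command and turn its
# exit status into a pass/fail message.
# ASSUMPTION: the checker exits 0 on success and non-zero on failure.
license_gate() {
    if "$@"; then
        echo "license check passed"
    else
        status=$?
        echo "license check failed (exit $status)" >&2
        return "$status"
    fi
}

# Usage, from the repository root containing .wwhrd.yml:
#   license_gate wwhrd check -q
```

Passing the command as arguments keeps the helper independent of the scanner, so the same gate could wrap a different tool later.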

Result

Overall (as completely expected) everything looks great. There were no show stoppers; the only concern was copyright notices that still contain the "NAME HERE" placeholder in the following files, discovered using the output from scancode:

$ jq '[.files[] | select(.copyrights[].value | contains("NAME HERE")) | {path, "copyright": .copyrights[].value} ] ' rget.json   
[
  {
    "path": "rget/rget/main.go",
    "copyright": "Copyright (c) 2019 NAME HERE"
  },
  {
    "path": "rget/rget/cmd/github.go",
    "copyright": "Copyright (c) 2019 NAME HERE"
  },
  {
    "path": "rget/rget/cmd/root.go",
    "copyright": "Copyright (c) 2019 NAME HERE"
  },
  {
    "path": "rget/rget/cmd/server.go",
    "copyright": "Copyright (c) 2019 NAME HERE"
  },
  {
    "path": "rget/rget/cmd/submit.go",
    "copyright": "Copyright (c) 2019 NAME HERE"
  },
  {
    "path": "rget/rget/cmd/verify.go",
    "copyright": "Copyright (c) 2019 NAME HERE"
  }
]
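For a daily CI check, a full Scancode pass may be heavier than needed just to catch these placeholders. Below is a minimal sketch of a cheaper check. It assumes the placeholder is always the literal string "NAME HERE" and that vendored code lives in directories named vendor; find_placeholder_copyrights is a hypothetical name.

```shell
# Hypothetical helper: list Go files that still contain the "NAME HERE"
# placeholder, skipping any directory named vendor.
# Prints matching paths and returns 1 if any are found, 0 otherwise.
find_placeholder_copyrights() {
    dir="${1:-.}"
    # -prune skips vendor directories; grep -l prints only file names,
    # -F treats the pattern as a fixed string. "|| true" ignores the
    # non-zero status find reports when grep matches nothing.
    matches=$(find "$dir" -type d -name vendor -prune -o \
        -type f -name '*.go' -exec grep -lF 'NAME HERE' {} + || true)
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
        return 1
    fi
    return 0
}

# Usage: find_placeholder_copyrights ~/Projects/golang/src/github.com/merklecounty/rget
```

Unlike the jq query above, this only detects the placeholder text; it says nothing about whether a file has a copyright header at all.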

[1]: Of course, I'm aware that this is not using any type of "Henry Rollins as a service" (HRaaS) mechanism. :nerd_face: