guacsec / guac

GUAC aggregates software security metadata into a high fidelity graph database.
https://guac.sh
Apache License 2.0

[feature] Implement collector for ClearlyDefined #1964

Closed funnelfiasco closed 1 month ago

funnelfiasco commented 2 months ago

Implementing a portion of #1014, let's implement a GUAC collector for license data from ClearlyDefined as described in https://docs.google.com/document/d/1NmLlU5wuP2X9CK7QCWZkkOciNn1QFLKQCFCW9CEI8HQ/edit#heading=h.q8v64s9nqno

This will allow GUAC users to include license-related information, which can be helpful for spotting compliance risks.

nickvidal commented 2 months ago

Thanks @funnelfiasco, I'll be sharing this initiative with the ClearlyDefined community!

pxp928 commented 2 months ago

I have started work on this issue. Thanks to @nickvidal and the community for providing guidance on how to map between purls and the coordinates used by ClearlyDefined.

The examples below illustrate the coordinates for each supported purl type. In general, the following holds true:
purl type = type coordinate
purl namespace = namespace coordinate
purl name = name coordinate
purl version = revision coordinate
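
As a rough illustration of the general rule above, here is a minimal Python sketch. The function name `purl_to_coordinates` and the `TYPE_AND_PROVIDER` table are hypothetical and are not GUAC or ClearlyDefined code. The table values are read off the examples in this comment, and the `-` placeholder for an empty namespace is inferred from those same examples.

```python
from typing import Optional

# Hypothetical lookup: purl type -> (coordinate type, provider).
# Values are taken from the examples in this thread; types with several
# providers or special rules (conda, deb, maven) are left out on purpose.
TYPE_AND_PROVIDER = {
    "cocoapods": ("pod", "cocoapods"),
    "cargo": ("crate", "cratesio"),
    "composer": ("composer", "packagist"),
    "gem": ("gem", "rubygems"),
    "github": ("git", "github"),
    "golang": ("go", "golang"),
    "npm": ("npm", "npmjs"),
    "nuget": ("nuget", "nuget"),
    "pypi": ("pypi", "pypi"),
}


def purl_to_coordinates(purl_type: str, namespace: Optional[str],
                        name: str, version: str) -> str:
    """Build a type/provider/namespace/name/revision coordinate string."""
    coord_type, provider = TYPE_AND_PROVIDER[purl_type]
    # The examples use "-" when the purl has no namespace.
    coord_namespace = namespace if namespace else "-"
    return "/".join([coord_type, provider, coord_namespace, name, version])


print(purl_to_coordinates("cargo", None, "bitflags", "1.0.4"))
print(purl_to_coordinates("composer", "symfony", "polyfill-mbstring", "1.11.0"))
```

The function takes already-parsed purl components, so parsing the purl string itself (and any percent-decoding) is left to whatever purl library the collector uses.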

There are some exceptions, however, which are noted below.

cocoapods https://cdn.cocoapods.org/.
-> pod (coordinate)
e.g. pod/cocoapods/-/SoftButton/0.1.0

cargo https://crates.io/.
-> crate (coordinate)
e.g. crate/cratesio/-/bitflags/1.0.4

composer https://packagist.org.
-> composer (coordinate)
e.g. composer/packagist/symfony/polyfill-mbstring/1.11.0

conda https://repo.anaconda.com.
-> conda (coordinate)
e.g. conda/conda-forge/linux-aarch64/numpy/1.16.6-py36hdc1b780_0
notes:
channel -> provider coordinate
    3 providers: anaconda-main, anaconda-r, conda-forge
subdir -> namespace coordinate
version-build -> revision coordinate
e.g.
pkg:conda/absl-py@0.4.1?build=py36h06a4308_0&channel=main&subdir=linux-64&type=tar.bz2
-> conda/anaconda-main/linux-64/absl-py0.4.1-py36h06a4308_0
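
The conda notes above could be sketched roughly as follows. This is only an illustration: `conda_purl_to_coordinates` and `CHANNEL_TO_PROVIDER` are hypothetical names, only the `main` -> `anaconda-main` pair comes from the example, and the other two channel entries are assumptions. The sketch puts a `/` between the name and the revision, which matches the correction posted later in this thread.

```python
from urllib.parse import parse_qs

# Hypothetical channel -> provider table. Only "main" is confirmed by the
# example in this thread; the "r" and "conda-forge" keys are assumptions.
CHANNEL_TO_PROVIDER = {
    "main": "anaconda-main",
    "r": "anaconda-r",
    "conda-forge": "conda-forge",
}


def conda_purl_to_coordinates(purl: str) -> str:
    """Map a conda purl string to a ClearlyDefined-style coordinate string."""
    prefix = "pkg:conda/"
    if not purl.startswith(prefix):
        raise ValueError(f"not a conda purl: {purl}")
    path, _, query = purl[len(prefix):].partition("?")
    name, _, version = path.partition("@")
    # Keep the first value of each qualifier (build, channel, subdir, type).
    qualifiers = {key: values[0] for key, values in parse_qs(query).items()}
    provider = CHANNEL_TO_PROVIDER[qualifiers["channel"]]  # channel -> provider
    namespace = qualifiers.get("subdir", "-")              # subdir -> namespace
    build = qualifiers.get("build")
    revision = f"{version}-{build}" if build else version  # version-build -> revision
    return f"conda/{provider}/{namespace}/{name}/{revision}"


print(conda_purl_to_coordinates(
    "pkg:conda/absl-py@0.4.1?build=py36h06a4308_0&channel=main"
    "&subdir=linux-64&type=tar.bz2"
))
```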

deb
-> deb (coordinate)
e.g. deb/debian/-/mini-httpd/1.30-0.2_arm64
notes:
1. version_architecture -> revision coordinate
2. source package: debsrc/debian/-/mini-httpd/1.30-0.2
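
A small sketch of the two deb notes, under stated assumptions: `deb_coordinates` is a hypothetical helper, and treating a missing architecture as "this is a source package" is a simplification chosen for this example, not something confirmed in the thread.

```python
from typing import Optional


def deb_coordinates(name: str, version: str, arch: Optional[str] = None) -> str:
    """Build a deb (binary) or debsrc (source) coordinate string."""
    if arch:
        # Binary package: version_architecture -> revision coordinate.
        return f"deb/debian/-/{name}/{version}_{arch}"
    # Assumption for this sketch: no architecture means a source package.
    return f"debsrc/debian/-/{name}/{version}"


print(deb_coordinates("mini-httpd", "1.30-0.2", "arm64"))
print(deb_coordinates("mini-httpd", "1.30-0.2"))
```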

gem https://rubygems.org.
-> gem (coordinate)
e.g. gem/rubygems/-/sorbet/0.5.11226

github https://github.com.
-> git/github (coordinate type/provider)
e.g. git/github/ratatui-org/ratatui/bcf43688ec4a13825307aef88f3cdcd007b32641

golang for Go packages:
-> go (coordinate)
e.g. go/golang/rsc.io/quote/v1.3.0
The name is URL-encoded.

maven https://repo.maven.apache.org/maven2.
-> maven (coordinate)
three providers: mavencentral, mavengoogle and gradleplugin
e.g. 
maven/mavencentral/org.apache.httpcomponents/httpcore/4.3
maven/mavengoogle/android.arch.lifecycle/common/1.0.1
maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.6.0
note:
source component:
sourcearchive/mavencentral/org.apache.httpcomponents/httpcore/4.3

npm 
-> npm (coordinate)
e.g. npm/npmjs/-/redis/0.1.0
namespace is used for scope

nuget https://www.nuget.org.
-> nuget (coordinate)
e.g. nuget/nuget/-/xunit.core/2.4.1

pypi https://pypi.org
-> pypi (coordinate)
e.g. pypi/pypi/-/backports.ssl_match_hostname/3.7.0.1

nickvidal commented 1 month ago

Thank you @pxp928 and @qtomlinson!

qtomlinson commented 1 month ago

e.g. pkg:conda/absl-py@0.4.1?build=py36h06a4308_0&channel=main&subdir=linux-64&type=tar.bz2 -> conda/anaconda-main/linux-64/absl-py0.4.1-py36h06a4308_0

correction: -> conda/anaconda-main/linux-64/absl-py/0.4.1-py36h06a4308_0

nickvidal commented 1 month ago

I've published the coordinates and purl type mapping to our documentation:

https://docs.clearlydefined.io/docs/resources/coordinates