SPDX Tools

Dependency Tree Exclusions for RDF/Tag parsing #145

Closed stevespringett closed 6 years ago

stevespringett commented 6 years ago

As an observation, the dependency tree for v2.1.7 looks like:

+- org.spdx:spdx-tools:jar:2.1.7:compile
|  +- org.apache.jena:apache-jena-libs:pom:3.1.1:compile
|  |  \- org.apache.jena:jena-tdb:jar:3.1.1:compile
|  |     \- org.apache.jena:jena-arq:jar:3.1.1:compile
|  |        +- org.apache.jena:jena-core:jar:3.1.1:compile
|  |        |  \- org.apache.jena:jena-base:jar:3.1.1:compile
|  |        |     \- com.github.andrewoma.dexx:collection:jar:0.6:compile
|  |        +- org.apache.jena:jena-shaded-guava:jar:3.1.1:compile
|  |        +- com.github.jsonld-java:jsonld-java:jar:0.8.3:compile
|  |        +- org.apache.httpcomponents:httpclient-cache:jar:4.5.2:compile
|  |        +- org.apache.thrift:libthrift:jar:0.9.3:compile
|  |        \- org.apache.commons:commons-csv:jar:1.3:compile
|  +- xerces:xercesImpl:jar:2.11.0.SP5:compile
|  +- org.apache.jena:jena-iri:jar:3.1.1:compile
|  +- com.yevster.net.rootdev:java-rdfa:jar:0.4.3:compile
|  |  \- net.rootdev:java-rdfa-htmlparser:jar:0.4.2-RC2:compile
|  +- xml-apis:xml-apis:jar:1.4.01:compile
|  +- org.antlr:antlr:jar:3.4:compile
|  |  +- org.antlr:antlr-runtime:jar:3.4:compile
|  |  |  +- org.antlr:stringtemplate:jar:3.2.1:compile
|  |  |  \- antlr:antlr:jar:2.7.7:compile
|  |  \- org.antlr:ST4:jar:4.0.4:compile
|  +- org.apache.poi:poi:jar:3.15:compile
|  +- org.apache.poi:poi-ooxml:jar:3.15:compile
|  |  +- org.apache.poi:poi-ooxml-schemas:jar:3.15:compile
|  |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.6.0:compile
|  |  |     \- stax:stax-api:jar:1.0.1:compile
|  |  \- com.github.virtuald:curvesapi:jar:1.04:compile
|  +- net.sf.opencsv:opencsv:jar:2.3:compile
|  +- nu.validator.htmlparser:htmlparser:jar:1.4:compile
|  +- net.sf.saxon:saxon:jar:8.7:compile
|  +- com.google.guava:guava:jar:16.0.1:compile
|  +- com.github.spullara.mustache.java:compiler:jar:0.7.9:compile
|  +- org.slf4j:slf4j-log4j12:jar:1.7.2:compile
|  +- log4j:log4j:jar:1.2.13:compile
|  +- com.googlecode.json-simple:json-simple:jar:1.1.1:compile
|  +- org.eclipse.jgit:org.eclipse.jgit:jar:4.7.1.201706071930-r:compile
|  |  +- com.jcraft:jsch:jar:0.1.54:compile
|  |  \- com.googlecode.javaewah:JavaEWAH:jar:1.1.6:compile
|  \- net.sf.saxon:saxon-dom:jar:8.7:compile

When attempting to use SPDX tools simply as a way to parse SPDX Tag and RDF documents, there are many dependencies included in the parent project that are never used.

I've been attempting to omit them from my project, as many of them are old or conflict with other dependencies in my project. The POM excerpt reads:

<dependency>
    <groupId>org.spdx</groupId>
    <artifactId>spdx-tools</artifactId>
    <version>2.1.7</version>
    <exclusions>
        <exclusion>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
        </exclusion>
        <exclusion>
            <groupId>net.sf.opencsv</groupId>
            <artifactId>opencsv</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.github.spullara.mustache.java</groupId>
            <artifactId>compiler</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.eclipse.jgit</groupId>
            <artifactId>org.eclipse.jgit</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.jcraft</groupId>
            <artifactId>jsch</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.googlecode.javaewah</groupId>
            <artifactId>JavaEWAH</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.thrift</groupId>
            <artifactId>libthrift</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-csv</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.github.jsonld-java</groupId>
            <artifactId>jsonld-java</artifactId>
        </exclusion>
    </exclusions>
</dependency>

With this configuration, I'm able to properly parse RDF and Tag 2.0 and 2.1 examples in this repo.

I don't know if this config will cause issues if other (potentially more complex) RDF or Tag documents are parsed. Thoughts?

Also, it would really be nice to have the exact exclusions documented somewhere.

goneall commented 6 years ago

opencsv, Mustache, and jgit (which adds the dependencies on jsch and JavaEWAH) are used by the LicenseRDFaGenerator, a tool that generates the license metadata for the website spdx.org/licenses. There is also a tool that converts SPDX files to an HTML format, which uses Mustache.

I tried removing the XML APIs, and the only compile-time issue was with LicenseXmlDocument, which is only used by the LicenseRDFaGenerator.

Some of your exclusions relate to Jena which is used to manage the RDF representation. My guess is that the exclusions you are using would only affect certain formats which are not currently used by any of the SPDX tools (e.g. JSON-LD).

The one exclusion I'm not sure about is libthrift. It is used by Jena, but I am not sure for what purpose.

I have been thinking about refactoring the SPDX tools into 2 separate repositories - one containing the library and one with separate tools.

Based on the information collected above on the dependencies, it may be worthwhile splitting the LicenseRDFaGenerator into a separate repo. As far as I know, this tool is only used by the SPDX legal team.

goneall commented 6 years ago

Update - I'm working on de-tangling the LicenseRDFaGenerator from the rest of the library and I was able to remove jgit and xml-apis.

It turns out Mustache is used by some HTML tools (which should not impact the license conversion) and opencsv is used by the spreadsheet tools (again, should not impact the license conversion).

goneall commented 6 years ago

@stevespringett - I just removed the dependency on the RDFa library in version 2.1.12. Does this resolve this issue or is there more we could do?

stevespringett commented 6 years ago

Big thanks @goneall. The removal of the unnecessary dependencies and generation code is greatly appreciated.

As of 2.1.12, the dependency tree now looks like:

+- org.spdx:spdx-tools:jar:2.1.12:compile
|  +- org.apache.jena:apache-jena-libs:pom:3.1.1:compile
|  |  \- org.apache.jena:jena-tdb:jar:3.1.1:compile
|  |     \- org.apache.jena:jena-arq:jar:3.1.1:compile
|  |        +- org.apache.jena:jena-core:jar:3.1.1:compile
|  |        |  \- org.apache.jena:jena-base:jar:3.1.1:compile
|  |        |     \- com.github.andrewoma.dexx:collection:jar:0.6:compile
|  |        +- org.apache.jena:jena-shaded-guava:jar:3.1.1:compile
|  |        +- com.github.jsonld-java:jsonld-java:jar:0.8.3:compile
|  |        +- org.apache.httpcomponents:httpclient-cache:jar:4.5.2:compile
|  |        +- org.apache.thrift:libthrift:jar:0.9.3:compile
|  |        \- org.apache.commons:commons-csv:jar:1.3:compile
|  +- xerces:xercesImpl:jar:2.11.0.SP5:compile
|  +- org.apache.jena:jena-iri:jar:3.1.1:compile
|  +- org.antlr:antlr:jar:3.4:compile
|  |  +- org.antlr:antlr-runtime:jar:3.4:compile
|  |  |  +- org.antlr:stringtemplate:jar:3.2.1:compile
|  |  |  \- antlr:antlr:jar:2.7.7:compile
|  |  \- org.antlr:ST4:jar:4.0.4:compile
|  +- org.apache.poi:poi:jar:3.15:compile
|  +- org.apache.poi:poi-ooxml:jar:3.15:compile
|  |  +- org.apache.poi:poi-ooxml-schemas:jar:3.15:compile
|  |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.6.0:compile
|  |  |     \- stax:stax-api:jar:1.0.1:compile
|  |  \- com.github.virtuald:curvesapi:jar:1.04:compile
|  +- net.sf.opencsv:opencsv:jar:2.3:compile
|  +- nu.validator.htmlparser:htmlparser:jar:1.4:compile
|  +- net.sf.saxon:saxon:jar:8.7:compile
|  +- com.google.guava:guava:jar:16.0.1:compile
|  +- com.github.spullara.mustache.java:compiler:jar:0.7.9:compile
|  +- org.apache.logging.log4j:log4j-api:jar:2.10.0:compile
|  +- org.apache.logging.log4j:log4j-core:jar:2.10.0:compile
|  +- org.apache.logging.log4j:log4j-slf4j-impl:jar:2.10.0:compile
|  +- com.googlecode.json-simple:json-simple:jar:1.1.1:compile
|  \- net.sf.saxon:saxon-dom:jar:8.7:compile
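
Comparing this tree with the 2.1.7 one, xml-apis, jgit, jsch, and JavaEWAH no longer appear, so the matching exclusions from the earlier POM excerpt should no longer be needed. A trimmed sketch of that excerpt for 2.1.12 follows. It assumes the same parse-only use of RDF and Tag documents and has not been tested against 2.1.12. Per the discussion above, libthrift remains the uncertain one.

```xml
<dependency>
    <groupId>org.spdx</groupId>
    <artifactId>spdx-tools</artifactId>
    <version>2.1.12</version>
    <exclusions>
        <!-- Used by the spreadsheet tools; assumed unnecessary for RDF/Tag parsing -->
        <exclusion>
            <groupId>net.sf.opencsv</groupId>
            <artifactId>opencsv</artifactId>
        </exclusion>
        <!-- Used by the HTML tools; assumed unnecessary for RDF/Tag parsing -->
        <exclusion>
            <groupId>com.github.spullara.mustache.java</groupId>
            <artifactId>compiler</artifactId>
        </exclusion>
        <!-- Pulled in by Jena; purpose unconfirmed, so drop this exclusion if parsing fails -->
        <exclusion>
            <groupId>org.apache.thrift</groupId>
            <artifactId>libthrift</artifactId>
        </exclusion>
        <!-- Pulled in by Jena (jena-arq) -->
        <exclusion>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-csv</artifactId>
        </exclusion>
        <!-- Pulled in by Jena for JSON-LD, a format not currently used by the SPDX tools -->
        <exclusion>
            <groupId>com.github.jsonld-java</groupId>
            <artifactId>jsonld-java</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

Exclusions for artifacts that are not in the tree are harmless, so the original nine-entry list would also still work; the shorter list only reflects what 2.1.12 actually pulls in.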

BTW, OWASP Dependency-Track incorporates this library (2.1.7 in the current release and 2.1.12 in the current development branch) for its SPDX support.

Closing issue.