spice-labs-inc / goatrodeo

Apache License 2.0
1 stars 2 forks source link

Goat Rodeo

Build Status

Using Inherent Identifiers (e.g., cryptographic hashes) to describe nodes in Artifact Dependency Graphs is the same technique Git uses to identify branches, tags, etc.

Goat Rodeo builds an Artifact Dependency Graph of "things that contain other things". For example, Apache Directory is a tar.gz... a compressed TAR file that contains a series of JVM JAR file that contains a series of .class files. Goat Rodeo recursively builds an artifact dependency graph that describes all encountered artifacts.

The recursive building of the Artifact Dependency Graph is called "Deep Inspection."

The artifact dependency graphs for all inspected artifacts is emitted and can be queried and inspected using Big Tent.

This Scala 3 code is built with the nominal Simple Build Tool sbt and can be run on Java 17 or newer.

The code is licensed under and Apache 2.0 license and is intended to be used, shared, contributed to, etc.

Hidden Reapers

There are hundreds of thousands of Hidden Reapers in the JVM ecosystem. Sonatype has identied 336,000 across Maven Central. Goat Rodeo can be used to "unmask" the Hidden Reapers in any JAR files.

Building

Goat Rodeo requires Git LFS to build

Install large file support for Git.

Install sbt to build and run Goat Rodeo.

To create an "assembly" (a stand-alone executable JAR file): sbt assembly

The resulting JAR file can be executed: java -jar target/scala-3.3.3/goatrodeo.jar

To build an artifact dependency graph from local JAR files: java -jar target/scala-3.3.3/goatrodeo.jar -b ~/.m2 -o /tmp/gitoidcorpus -t 24

The above command tells the system to "build" (-b) the corpus from the JAR files in ~/.m2 and output the corpus to the /tmp/gitoidcorpus directory using 24 threads.

The resulting directory:

-rw------- 1 dpp dpp        76 Oct 10 13:25 2024_10_10_17_25_36_7cb8b580887ff6df.grc
-rw-rw-r-- 1 dpp dpp  14846423 Oct 10 13:25 6866766381176e9c.gri
-rw------- 1 dpp dpp 156324566 Oct 10 13:25 7c0d3cda0eff07c9.grd
-rw-rw-r-- 1 dpp dpp     20080 Oct 10 13:25 purls.txt

The purls.txt file contains the Package URLs of the discovered packages.

Development

Download a small set of JAR files to use as tests from https://goatrodeo.org/repo_ea.tgz

In ~/tmp untar the file: tar -xzvf repo_ea.tgz

This will be a set of files that Goat Rodeo will index as part of it's normal tests.

In the same directory (dpp uses ~/proj/) that you cloned Goat Rodeo, also clone Big Tent.

When you run the Goat Rodeo tests, a res_for_big_tent directory is created that contains generated index files. When you run Big Tent tests, the tests look for ../goatrodeo/res_for_big_tent and the generated files.

To create a test data set from within SBT: run "-b" "<path_to>/tmp/repo_ea" "-o" "/tmp/ff" "-t" "15"

Then use Cargo to build Big Tent: cargo build

From the Big Tent directory, you can run Big Tent against the data set: ./target/debug/bigtent -r /tmp/ff/<generated>.grc

With Big Tent running, you can curl: curl -v http://localhost:3000/omnibor/sha256:7c7b1dee41ae847f0d8aa66faf656f5f2fe777e4656db9fe4370d2972070c62b

That looks up the SHA256 value in the Corpus:

{

  "alt_identifiers": [],
  "connections": [
    [ "AliasTo",
      "gitoid:blob:sha256:4a176f25ca66f78f902082466d2e64bbb3ce5db3a327f006d48dc17a6fb58784"
    ]
  ],
  "identifier": "sha256:7c7b1dee41ae847f0d8aa66faf656f5f2fe777e4656db9fe4370d2972070c62b",
  "merged_from": [],
  "metadata": null,
  "previous_reference": null,
  "reference": [
    8847588522402379377,
    938
  ]
}

The AliasTo points to the actual GitOID which can also be fetched: curl -v http://localhost:3000/omnibor/gitoid:blob:sha256:4a176f25ca66f78f902082466d2e64bbb3ce5db3a327f006d48dc17a6fb58784

Have fun!

Participating

We welcome participation and have a Code of Conduct.