pex-tool / pex

A tool for generating .pex (Python EXecutable) files, lock files and venvs.
https://docs.pex-tool.org/
Apache License 2.0

Feature request: export lockfile in SPDX format #2102

Open jwarwick-delfi opened 1 year ago

jwarwick-delfi commented 1 year ago

As a consumer of Pex lockfiles via the Pants build tool, I would like to export a lockfile in an open format that I can use to generate a software bill of materials (SBOM). SPDX seems to be a widely used open standard for these files.

SPDX can be expressed in a variety of formats; personally, I would prefer text, JSON, or YAML.

jsirois commented 1 year ago

Since Pex vendors only a very limited set of 3rd party libraries, sticking to the stdlib is best, so text or JSON is preferred from the Pex point of view.
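To illustrate the stdlib-only JSON route, here is a minimal sketch that builds an SPDX-2.3-style document as a plain dict and serializes it with the json module. The function name, the tool name in creators and the (project_name, version, url) input shape are hypothetical, chosen just for this example; nothing here is existing Pex code, and the output has not been checked against an SPDX validator.

```python
import json
from datetime import datetime, timezone


def minimal_spdx_document(name, namespace, packages):
    """Build a minimal SPDX-2.3-style JSON document as a plain dict.

    `packages` is an iterable of (project_name, version, download_url)
    tuples; this input shape is an assumption made for the sketch.
    """
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": name,
        "documentNamespace": namespace,
        "creationInfo": {
            "created": created,
            # Hypothetical tool name, not a real Pex component.
            "creators": ["Tool: hypothetical-pex-sbom-sketch"],
        },
        "packages": [
            {
                "SPDXID": "SPDXRef-Package-{}".format(index),
                "name": project_name,
                "versionInfo": version,
                "downloadLocation": url,
            }
            for index, (project_name, version, url) in enumerate(packages)
        ],
    }


def to_json(document):
    """Serialize the document deterministically using only the stdlib."""
    return json.dumps(document, indent=2, sort_keys=True)
```

Calling to_json(minimal_spdx_document(...)) yields text that can be written straight to a file, with no third-party dependency involved.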

jsirois commented 1 year ago

Hrm. A quick read of the spec seems to suggest each file must have 1 sha1 checksum and then 0 or more other checksums: https://spdx.github.io/spdx-spec/v2.3/file-information/#84-file-checksum-field

A lockfile only contains sha256 checksums, so generating a valid SPDX document will require downloading every artifact in the lockfile and re-fingerprinting it down to sha1. This is not awesome.
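For a sense of what the re-fingerprinting step involves once an artifact is on disk, here is a minimal sketch using only hashlib. It streams the file once and computes sha1 and sha256 together, so the sha256 can still be compared against the lockfile. The function name and the chunk size are assumptions for illustration; downloading is left out entirely.

```python
import hashlib


def fingerprint(path, algorithms=("sha1", "sha256"), chunk_size=1024 * 1024):
    """Return {algorithm: hex digest} for the file at `path`.

    The file is read once in chunks, and every hasher is fed the same
    chunk, so large artifacts are not loaded into memory whole.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

The hashing itself is cheap; the cost noted above is in fetching every artifact first.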

jsirois commented 1 year ago

Ok, the code that implements pex3 lock export ... is here: https://github.com/pantsbuild/pex/blob/fd9a07f3cc4e8a3f64eb2c9850f7936c67453315/pex/cli/commands/lock.py#L493-L516

That currently exports for just one distribution target, where a distribution target in Pex-speak is either a particular local Python interpreter or a foreign platform's interpreter. If your SBOM will be attached to a single platform in this way (say, one SBOM for each of Python 3.7, 3.8 and 3.9 on each of Linux and Mac, for a total of 6 SBOMs), then all is well: you just run export six times, configuring a different target for each run. If your SBOMs are intended to be singular and need to incorporate data for all distribution targets, a new sub-command is probably warranted: pex3 lock sbom ....

Either way, the key data structure is contained in lock_file on line 500. That is a XXX and is defined here: https://github.com/pantsbuild/pex/blob/32a0789ee4d431f0d84b3f1e924bb91b78cde1cd/pex/resolve/lockfile/model.py#L29-L51
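For the singular-SBOM case, a rough sketch of gathering artifacts across every locked resolve might look like the following. It works on the lock file's JSON text rather than on Pex's in-memory model, and the key names it reads (locked_resolves, locked_requirements, artifacts, project_name, version, url, algorithm, hash) are assumptions about the on-disk layout that should be checked against a real lock file. The function name is hypothetical.

```python
import json


def collect_artifacts(lock_json_text):
    """Collect unique artifacts across all locked resolves in a lock file.

    The JSON key names used here are assumptions about the lock layout,
    not a documented contract.
    """
    lock = json.loads(lock_json_text)
    seen = {}
    for resolve in lock.get("locked_resolves", []):
        for requirement in resolve.get("locked_requirements", []):
            for artifact in requirement.get("artifacts", []):
                # The same wheel or sdist can appear under several targets;
                # key on the fingerprint so it is only reported once.
                key = (artifact["algorithm"], artifact["hash"])
                seen.setdefault(
                    key,
                    {
                        "project_name": requirement["project_name"],
                        "version": requirement["version"],
                        "url": artifact["url"],
                        "algorithm": artifact["algorithm"],
                        "hash": artifact["hash"],
                    },
                )
    return sorted(
        seen.values(),
        key=lambda a: (a["project_name"], a["version"], a["url"]),
    )
```

Deduplicating on the fingerprint keeps one entry per artifact even when several distribution targets share it.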

jsirois commented 1 year ago

If, instead of exporting an entire lockfile as an SBOM, individual built PEX files could export (or even include) an SBOM, things become a lot simpler, since the software actually used is all present along with licenses and other metadata. Re-hashing becomes ecosystem-friendly, etc.

There is already a suite of tools that can either be included in a PEX file by using --include-tools when building the PEX or else by using the pex-tools console script installed alongside pex.

These live here: https://github.com/pantsbuild/pex/tree/main/pex/tools/commands

The repository, graph and venv commands all do portions of the work that will be needed here; in particular, they resolve the PEX's distributions.

Perhaps the best place to start is graph, which generates a graphviz SVG graph of a PEX's internal software and so is part way to an SBOM.

The run main entrypoint of the tool is here: https://github.com/pantsbuild/pex/blob/e0efca098404a6093f514839103ea7843920e4fa/pex/tools/commands/graph.py#L154-L156

The PEX resolve is done here: https://github.com/pantsbuild/pex/blob/e0efca098404a6093f514839103ea7843920e4fa/pex/tools/commands/graph.py#L33-L49

And the resolved things are Distributions defined here: https://github.com/pantsbuild/pex/blob/e0efca098404a6093f514839103ea7843920e4fa/pex/dist_metadata.py#L549-L550
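As a rough illustration of why having the installed distributions on hand helps, here is a sketch that reads name, version and license from installed metadata. It uses the stdlib importlib.metadata as a stand-in, not Pex's own Distribution class, and the function name and the search_paths parameter are hypothetical.

```python
from importlib import metadata


def describe_distributions(search_paths):
    """List name, version and license for distributions under `search_paths`.

    Each path is a directory containing *.dist-info (or *.egg-info) metadata,
    for example a venv's site-packages directory.
    """
    rows = []
    for dist in metadata.distributions(path=list(search_paths)):
        meta = dist.metadata
        rows.append(
            {
                "name": meta.get("Name"),
                "version": meta.get("Version"),
                # The License field is optional, so this may be None.
                "license": meta.get("License"),
            }
        )
    return sorted(rows, key=lambda row: (row["name"] or "").lower())
```

Pointing this at the site-packages of a venv created from a PEX would give the raw material for per-package SBOM entries without consulting the lockfile at all.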