Open tetsuo-cpp opened 3 years ago
I think syft could be really useful for us. It has quite a bit of functionality for both container images and filesystems and supports a bunch of different language ecosystems. The relevant bits for us are:
syft a container image.
Interesting files are:
Some potential issues:
syft just looks for package metadata on the file system. So if I have a container that has a wheel on the filesystem that hasn't been installed to any Python, it's still going to end up in the package list. I initially thought this was weird, but after thinking about it more, auditing a container is a bit of a fuzzy idea since it can have multiple Python environments in it. So just auditing anything on the file system that looks like a package isn't that unreasonable.
syft via subprocess. We should probably talk to the devs and figure out whether we can rely on any of the output formats to remain stable since we'll have to parse it in pip-audit and get a list of dependencies out of it.
syft is in your PATH" and just leave it to the user.
pip.
Thinking about how this compares to the alternatives:
I'll keep an eye out but I wasn't able to find anything that fits the bill. Tern is interesting but it seems more focused on packages installed via the distro package manager.
pip-audit
The Python-specific code in syft looks ok but I think the most painful thing about reimplementing this functionality in pip-audit will be parsing the Docker image, traversing over each layer, etc. syft does this by using stereoscope.
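To give a feel for the layer traversal part, here is a minimal sketch using only the standard library. It is not how syft or stereoscope do it. It assumes the image was exported with `docker save`, that the archive has a top-level `manifest.json` whose first entry lists layer tarballs under `Layers`, and that installed packages show up as `*.dist-info/METADATA` files. The function name `find_python_packages` is made up for illustration, and the sketch ignores whiteout files, so packages deleted in a later layer would still be reported.

```python
import email
import io
import json
import tarfile


def find_python_packages(image_tar_path):
    """Yield (layer, name, version) for dist-info METADATA files in an image archive.

    Assumes a `docker save`-style archive; whiteouts are not handled.
    """
    with tarfile.open(image_tar_path) as image:
        manifest = json.load(image.extractfile("manifest.json"))
        # Assumption: only the first image in the archive is of interest.
        for layer_name in manifest[0]["Layers"]:
            layer_bytes = image.extractfile(layer_name).read()
            # Each layer is itself a tarball of filesystem changes.
            with tarfile.open(fileobj=io.BytesIO(layer_bytes)) as layer:
                for member in layer.getmembers():
                    if not member.isfile():
                        continue
                    if not member.name.endswith(".dist-info/METADATA"):
                        continue
                    raw = layer.extractfile(member).read()
                    # METADATA uses RFC 822-style headers.
                    meta = email.message_from_string(raw.decode("utf-8", "replace"))
                    yield layer_name, meta["Name"], meta["Version"]
```

Reading each layer fully into memory keeps the sketch short; a real implementation would probably want to stream layers and resolve deletions across them.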
I had an idea that it might be possible to leverage some of Tern's image parsing modules (also in Python) for this purpose and write the Python-specific parts on top of them. I'm not sure whether Tern was designed to be used as a library in the way that I'm thinking, and there seem to be some platform support issues which might affect us.
A further note in terms of reimplementing: Docker's Python SDK is pretty well-featured, and includes a low-level API that might be able to do the kind of image introspection we need.
Edit: It looks like Tern uses the Docker Python SDK:
I think the ideal tool would:
AFAICT the Docker daemon seems to be a requirement for Tern but not for stereoscope. I think we want something like stereoscope, but written in Python.
Yeah, I believe Docker's Python SDK can't really do anything without connecting to a Docker daemon. So if we don't want to assume the presence of Docker, we probably can't directly dupe or reuse their approach.
I'll do some additional searching for something that looks like stereoscope, but in Python. It might also be possible to write a native Python extension that adapts stereoscope directly, although I'm not familiar with what that looks like with Go (I've done it for Rust and C/C++ and I've used Go extensions, but never written the latter).
The syft tool supports generating a SBOM for a container image and has support for Python packages. We should check to see if we can leverage this to support container images in pip-audit.
cc: @di
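A rough sketch of what shelling out to syft from pip-audit could look like. The `-o json` flag and the `artifacts`, `name`, `version` and `type` field names are assumptions about syft's JSON output and would need to be checked against whichever version is installed, which is exactly the stability concern raised in this thread. The function names are hypothetical and not part of pip-audit.

```python
import json
import shutil
import subprocess


def parse_syft_json(document):
    """Return sorted (name, version) pairs for Python packages in syft JSON output.

    Assumes a top-level "artifacts" list whose entries carry "name",
    "version" and "type" keys, with "python" marking Python packages.
    """
    data = json.loads(document)
    return sorted(
        (artifact["name"], artifact["version"])
        for artifact in data.get("artifacts", [])
        if artifact.get("type") == "python"
    )


def python_packages_in_image(image):
    """Run syft against a container image and return its Python packages."""
    # Leave installation to the user and just check that syft is on the PATH.
    if shutil.which("syft") is None:
        raise RuntimeError("syft was not found in your PATH")
    result = subprocess.run(
        ["syft", image, "-o", "json"],  # assumed invocation
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_syft_json(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be tested against saved syft output without syft installed.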