slsa-framework / slsa-github-generator

Language-agnostic SLSA provenance generation for GitHub Actions
Apache License 2.0

Workflow for Python packages #55

Open ianlewis opened 2 years ago

ianlewis commented 2 years ago

Add a workflow for building Python packages and generating SLSA provenance for them. This would be the Python counterpart to https://github.com/slsa-framework/slsa-github-generator-go, which does the same for Go projects.

This is to help achieve the milestones laid out in the SLSA roadmap: https://github.com/slsa-framework/slsa-proposals/tree/main/0002#milestone-slsa-4-builds-are-possible-for-specific-packaging-ecosystems

ianlewis commented 2 years ago

Here is a summary of the relevant file formats and tools to consider when building Python packages. The Python Packaging User Guide provides a good summary of the available tools and methods.

Project definition files:

Tools to build packages

These tools and packages support building packages based on either pyproject.toml or setup.cfg. Whichever file a project uses, there is enough tool-specific configuration that we will likely need to detect the build tool used.

For example, many projects specify dependencies of only requires = [ "poetry-core>=1.0.0",] in the tool-agnostic build-system section and specify the real dependencies under tool.poetry.dependencies.
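As a rough illustration of the point above, the sketch below parses a Poetry-style pyproject.toml and shows that the tool-agnostic [build-system] table only names the backend, while the project's real dependencies sit under a tool-specific table. The file contents, the package names, and the summarize helper are hypothetical. tomllib needs Python 3.11 or later; the tomli fallback is assumed to be installed on older interpreters.

```python
try:
    import tomllib  # standard library from Python 3.11 onwards
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed

# Hypothetical pyproject.toml for a Poetry-managed project.
EXAMPLE = """
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "example"
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.28"
"""


def summarize(pyproject_text):
    """Return the build-system settings and the tool-specific tables found."""
    data = tomllib.loads(pyproject_text)
    build_system = data.get("build-system", {})
    tool = data.get("tool", {})
    return {
        # What a generic PEP 517 frontend needs in order to build.
        "build_requires": build_system.get("requires", []),
        "build_backend": build_system.get("build-backend"),
        # Tool-specific configuration that hints at the build tool in use.
        "tool_sections": sorted(tool),
        "poetry_dependencies": sorted(
            tool.get("poetry", {}).get("dependencies", {})
        ),
    }


summary = summarize(EXAMPLE)
print(summary)
```

Here the presence of a tool.poetry table is the only hint that Poetry is the intended build tool, which is why some detection step seems hard to avoid.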

build seems to be the de facto "blessed" build tool; however, actual usage seems heavily divided among many build tools, with no clear leader.

Tools to publish packages

Support for publishing provenance could maybe be added to these tools once PyPI supports it?

Tools used to install packages

Scientific community

The scientific community seems to use other package managers that are incompatible with PyPI/pip fairly frequently, because it needs access to specialized libraries and tools.

ianlewis commented 2 years ago

@di I summarized some info on Python package builders above, with the goal of better understanding how we could put together a builder that can build Python packages and generate SLSA provenance for them.

As I understand it, if folks wrote pypackage.yaml in such a way as to set requires, build-backend, and build-path properly then theoretically any package build using pypackage.yaml could be bootstrapped by the build tool.

However, I don't think most packages are set up this way and, without modification, will require us to know the build tool up front in order to build the package properly. That or we force them to update their package to allow building with the build tool. And many folks still use setuptools.

Any thoughts or comments? Anything I got wrong or am misunderstanding?

di commented 2 years ago

> Any thoughts or comments? Anything I got wrong or am misunderstanding?

I think this is fairly accurate. I'd say you should think of build as the canonical generic build tool, and everything else (flit, setuptools, etc) as PEP-517 build backends (even though some can also be used as build tools).

> As I understand it, if folks wrote pypackage.yaml in such a way as to set requires, build-backend, and build-path properly then theoretically any package build using pypackage.yaml could be bootstrapped by the build tool.

This is correct (although it's pyproject.toml 😉).

> However, I don't think most packages are set up this way and, without modification, will require us to know the build tool up front in order to build the package properly. That or we force them to update their package to allow building with the build tool. And many folks still use setuptools.

I think it's really our only option. We can do some things to guess the build backend that needs to be present, but the guess is not guaranteed to be correct or to capture all build-time dependencies. Instead, pyproject.toml is the standardized way to do this. I think it would be reasonable to say we only support building projects that specify pyproject.toml.

From my perspective, we're seeing a lot of projects moving to this (and a lot of tools moving their configuration into this file).

ianlewis commented 2 years ago

@di Thanks. I think build is definitely the thing we should do first. I think you're right that a decent number of folks use pyproject.toml but I didn't see many in my searches that set the backends properly. They looked like they expected you to install and/or run the build backend directly.

That said, it should be pretty easy for folks to update their pyproject.toml to allow us to just run python -m build on their project and have it work, so maybe that will cover ~80% of projects. Folks still using only setuptools would have issues, but maybe we can support that at some point if it's an issue.

Scientific community projects also worry me because lots use conda, but I'm ok with not supporting them, at least initially.
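A minimal sketch of what "just run python -m build" could look like inside a builder, followed by hashing the resulting artifacts so they could later be listed as provenance subjects. The function names are hypothetical, the build package is assumed to be installed in the current interpreter, and nothing here reflects how the actual workflow would be implemented.

```python
import hashlib
import subprocess
import sys
from pathlib import Path


def build_command(project_dir, out_dir):
    """Command line for the build frontend: python -m build --outdir OUT SRC."""
    return [sys.executable, "-m", "build", "--outdir", str(out_dir), str(project_dir)]


def artifact_digests(out_dir):
    """Map each file in out_dir to its SHA-256 hex digest."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(out_dir).iterdir())
        if path.is_file()
    }


def build_and_hash(project_dir, out_dir):
    """Build the sdist and wheel, then hash whatever was produced.

    Assumes the project's pyproject.toml lets build bootstrap the backend.
    """
    subprocess.run(build_command(project_dir, out_dir), check=True)
    return artifact_digests(out_dir)
```

Calling build_and_hash(".", "dist") on a project with a correct [build-system] table would return a mapping from artifact file name to digest. Projects that expect their build tool to be invoked directly would fail at the subprocess step.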

di commented 2 years ago

> I think build is definitely the thing we should do first. I think you're right that a decent number of folks use pyproject.toml but I didn't see many in my searches that set the backends properly. They looked like they expected you to install and/or run the build backend directly.

Note that build uses a fallback backend if none is specified, so that might be the case: https://python-build.readthedocs.io/en/stable/#fallback-backend

ianlewis commented 2 years ago

> Note that build uses a fallback backend if none is specified, so that might be the case: https://python-build.readthedocs.io/en/stable/#fallback-backend

Ah, ok. Noted. Though I did see a lot of pyproject.toml files with [tool.foo] sections but without any requires or backend set so I expect those not to work.

di commented 2 years ago

Yeah, these would be the projects that build would use the fallback backend for. The entire [build-system] section would be missing.
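The fallback behaviour described above can be sketched roughly as follows. The backend string is the legacy setuptools backend used when a project does not opt in to PEP 517; the exact default requirements may differ between build versions, so treat LEGACY_FALLBACK as an assumption. The effective_build_system helper is hypothetical and only handles the case where the whole [build-system] table is missing.

```python
# Assumed fallback, modelled on the legacy setuptools behaviour; the precise
# default requirements may vary between versions of build.
LEGACY_FALLBACK = {
    "requires": ["setuptools >= 40.8.0"],
    "build-backend": "setuptools.build_meta:__legacy__",
}


def effective_build_system(pyproject):
    """Return (build_system, used_fallback) for a parsed pyproject.toml.

    pyproject is the parsed TOML document as a dict, or None if the project
    has no pyproject.toml at all.
    """
    build_system = (pyproject or {}).get("build-system")
    if build_system is None:
        # No [build-system] table: the frontend falls back to setuptools,
        # regardless of any [tool.*] tables that may be present.
        return dict(LEGACY_FALLBACK), True
    return build_system, False


# A file with only tool-specific configuration still gets the fallback.
tool_only = {"tool": {"foo": {"setting": "value"}}}
print(effective_build_system(tool_only))
```

Under this sketch, a pyproject.toml that only contains [tool.foo] tables resolves to setuptools, so the build can only succeed if a usable setup.py or setup.cfg exists alongside it.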

ianlewis commented 2 years ago

> Yeah, these would be the projects that build would use the fallback backend for. The entire [build-system] section would be missing.

If, for example, it's exclusively [tool.poetry.x] sections in pyproject.toml, I wouldn't expect python -m build to work because build would fall back to setuptools. For example: https://github.com/razy69/poetry-tox-template/blob/208cbc3f7d972a0c7862c2f54c9dfe3fe5b74e54/pyproject.toml

If there were a separate, analogous setup.cfg or setup.py, it would work though, I guess.

Am I understanding it right? Or would build figure it out somehow?

ianlewis commented 2 years ago

We may also want to consider not having a builder in this repo, but rather one that is better endorsed by the Python community, e.g. via https://github.com/pypa

di commented 2 years ago

That example specifies:

[build-system]
requires = ["poetry-core>=1.1.0a6"]
build-backend = "poetry.core.masonry.api"

So in theory, the build should work, but it looks like it doesn't due to some issue with that project, not with Poetry or build:

$ git clone git@github.com:razy69/poetry-tox-template.git
Cloning into 'poetry-tox-template'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 14 (delta 2), reused 14 (delta 2), pack-reused 0
Receiving objects: 100% (14/14), 22.00 KiB | 7.33 MiB/s, done.
Resolving deltas: 100% (2/2), done.

$ cd poetry-tox-template/

$ python -m build
* Creating venv isolated environment...
* Installing packages in isolated environment... (poetry-core>=1.1.0a6)
* Getting dependencies for sdist...
* Building sdist...
Traceback (most recent call last):
  File "/home/di/.local/lib/python3.9/site-packages/pep517/in_process/_in_process.py", line 363, in <module>
    main()
  File "/home/di/.local/lib/python3.9/site-packages/pep517/in_process/_in_process.py", line 345, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/home/di/.local/lib/python3.9/site-packages/pep517/in_process/_in_process.py", line 314, in build_sdist
    return backend.build_sdist(sdist_directory, config_settings)
  File "/tmp/build-env-oxrpsqpl/lib/python3.9/site-packages/poetry/core/masonry/api.py", line 77, in build_sdist
    path = SdistBuilder(poetry).build(Path(sdist_directory))
  File "/tmp/build-env-oxrpsqpl/lib/python3.9/site-packages/poetry/core/masonry/builders/builder.py", line 86, in __init__
    self._module = Module(
  File "/tmp/build-env-oxrpsqpl/lib/python3.9/site-packages/poetry/core/masonry/utils/module.py", line 71, in __init__
    raise ModuleOrPackageNotFound(
poetry.core.masonry.utils.module.ModuleOrPackageNotFound: No file/folder found for package template

ERROR Backend subproccess exited when trying to invoke build_sdist