tschm / token-mint-action

Creates an API token for trusted publishing on PyPI
Apache License 2.0
5 stars · 1 fork

Avoid suggesting bad practice of building in the same GHA job as publishing #14

Open webknjaz opened 10 months ago

webknjaz commented 10 months ago

Having access to OIDC opens up a can of worms around privilege elevation during the build. And the compromised targets might be not just PyPI but any other OIDC integrations people have set up (and sometimes misconfigured). It also doesn't allow for adequate workflows that publish platform-specific wheels in a synchronized fashion. Here's a note regarding the security considerations that we have in pypi-publish, for example: https://github.com/marketplace/actions/pypi-publish#trusted-publishing.

Based on the above, I suggest modifying the README example to use two jobs and pass GHA artifacts between them. This also makes it possible to smoke-test the build even when publishing is skipped.

We're currently upgrading the PyPUG guide with similar considerations: https://github.com/pypa/packaging.python.org/pull/1261.

tschm commented 10 months ago

Thank you, good idea. I will split the job into two parts.

tschm commented 10 months ago

poetry is striking back https://github.com/orgs/python-poetry/discussions/8357

tschm commented 10 months ago

I shall use twine in the 2nd job. No poetry needed

tschm commented 10 months ago

@webknjaz Do we like this version more?

name: Upload Python Package

on:
  release:
    types: [published]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # Set up Python, install poetry and create the
      # virtual environment described in poetry.lock
      - uses: cvxgrp/.github/actions/setup-environment@main

      # change the version in pyproject.toml and build the package
      - name: Change version in pyproject.toml
        shell: bash
        run: |
          poetry version ${{ github.ref_name }}
          poetry build

      # Archive the dist folder
      - name: Archive the dist folder
        uses: actions/upload-artifact@v3
        with:
          name: dist
          path: dist
          retention-days: 1

  deploy:
    runs-on: ubuntu-latest
    needs: build
    environment: release

    permissions:
      # This permission is required for trusted publishing.
      id-token: write

    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3

      # Download the dist folder from the build job
      - uses: actions/download-artifact@v3
        with:
          name: dist
          path: dist

      # Generate the token
      - name: Mint token
        id: mint
        uses: tschm/token-mint-action@v1.0.2

      # Install twine and upload the dist folder using the token
      - name: Publish the package
        shell: bash
        run: |
           pip install twine
           twine upload --verbose -u __token__ -p '${{ steps.mint.outputs.api-token }}' dist/*
webknjaz commented 10 months ago

I shall use twine in the 2nd job. No poetry needed

Well, at this point, why not just use my action that wraps twine and acquires tokens internally?
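For illustration, a publish job built on pypa/gh-action-pypi-publish with Trusted Publishing could be as small as this sketch (the job and environment names are illustrative; the action uploads `dist/` by default):

```yaml
  publish:
    runs-on: ubuntu-latest
    needs: build
    environment: pypi
    permissions:
      id-token: write  # required for trusted publishing
    steps:
      # Fetch the artifacts produced by the build job
      - uses: actions/download-artifact@v3
        with:
          name: dist
          path: dist
      # No token handling needed: the action exchanges the job's
      # OIDC identity for a short-lived PyPI token internally
      - uses: pypa/gh-action-pypi-publish@release/v1
```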

webknjaz commented 10 months ago

Do we like this version more?

The publish job doesn't need the checkout step or the contents permission. Plus, see my comment above: this is reinventing the wheel.

tschm commented 10 months ago

Do we like this version more?

The publish job doesn't need the checkout step or the contents permission. Plus, see my comment above: this is reinventing the wheel.

I am well aware of gh-action-pypi-publish. The initial goal was not to give people a tool that can upload to PyPI in arbitrary ways. We wanted to restrict their options and enforce trusted publishing. For that, it was enough to generate the token and do a poetry publish. This token mint was a byproduct of that process. Following the GitHub mess with deployment to protected tags, I need to revisit our deployment process anyway.

webknjaz commented 10 months ago

I am well aware of gh-action-pypi-publish. The initial goal was not to give people a tool that can upload to PyPI in arbitrary ways. We wanted to restrict their options and enforce trusted publishing. For that, it was enough to generate the token and do a poetry publish. This token mint was a byproduct of that process.

pypi-publish primarily advertises the Trusted Publishing flow, only mentioning other options at the very bottom of the document. This is also intended to steer people towards secretless publishing. The updated PyPUG guide will do so as well, once complete. We also chose to annoy users with educational warnings if they attempt publishing to PyPI w/o OIDC: https://github.com/pypa/gh-action-pypi-publish/issues/164 / https://github.com/pypa/gh-action-pypi-publish/pull/167.

Following the GitHub mess with deployment to protected tags I need to revisit our deployment process anyway.

By the way, the example workflow should also recommend using GitHub Environments since that allows for extra protection, like requiring a manual button click to allow the job to even start.

P.S. If you're building a scalable process to enforce OIDC, I'd recommend using jsonschema-based linting. I like https://check-jsonschema.rtfd.io/en/latest/precommit_usage.html#example-usages that lets one check GHA workflow files and action repos for basic problems. There's an example of enforcing that timeout-minutes is set on jobs: https://check-jsonschema.readthedocs.io/en/latest/precommit_usage.html#check-a-builtin-schema. You can also make your own schema to enforce certain arguments to be present/absent from the action invocation inputs. It integrates well with pre-commit.com / pre-commit.ci too.
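A minimal `.pre-commit-config.yaml` using those hooks might look like this sketch (the `rev` value is illustrative; pin whatever release is current):

```yaml
repos:
  - repo: https://github.com/python-jsonschema/check-jsonschema
    rev: 0.28.0  # illustrative pin, not a recommendation
    hooks:
      # Validate .github/workflows/*.yml against the workflow schema
      - id: check-github-workflows
      # Validate action.yml in action repositories
      - id: check-github-actions
```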

tschm commented 10 months ago

By the way, the example workflow should also recommend using GitHub Environments since that allows for extra protection, like requiring a manual button click to allow the job to even start.

We are using a release environment in the example? Do you mean we should mention it more explicitly, or are we overlooking something here?

P.S. If you're building a scalable process to enforce OIDC, I'd recommend using jsonschema-based linting. I like https://check-jsonschema.rtfd.io/en/latest/precommit_usage.html#example-usages that lets one check GHA workflow files and action repos for basic problems. There's an example of enforcing that timeout-minutes is set on jobs: https://check-jsonschema.readthedocs.io/en/latest/precommit_usage.html#check-a-builtin-schema. You can also make your own schema to enforce certain arguments to be present/absent from the action invocation inputs. It integrates well with pre-commit.com / pre-commit.ci too.

OK, we run the checks on the workflows. I have not specified any schema, though. I assume it will check against the standard schema for workflows?

webknjaz commented 10 months ago

We are using a release environment in the example? Do you mean we should mention it more explicitly, or are we overlooking something here?

Well, in the current readme it's not used, only in the example above. But envs are auto-created, and the problem is that they are created without any protection enabled. So it'd be important to call out that it's recommended to create the env manually and, for example, set up required reviewers there, so the workflow is paused before starting the job. Otherwise, it's not doing much. Setting this up means only trusted humans would be able to resume the workflow execution, letting the publishing actually happen.

webknjaz commented 10 months ago

OK, we run the checks on the workflows. I have not specified any schema, though. I assume it will check against the standard schema for workflows?

Yes, it runs the checks against some standard schemas published on the schema store.

You can add another check pointing at an in-repo schema file that you would make yourself.
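For example, the generic `check-jsonschema` hook accepts a `--schemafile` argument, so a hypothetical in-repo policy schema (the file name below is made up for illustration) could be wired in like this:

```yaml
      # Enforce house rules (e.g. required/forbidden inputs) on workflows
      - id: check-jsonschema
        name: Check workflow policy
        files: ^\.github/workflows/
        types: [yaml]
        args: ["--schemafile", ".github/workflow-policy.schema.json"]
```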

tschm commented 10 months ago

OK, we run the checks on the workflows. I have not specified any schema, though. I assume it will check against the standard schema for workflows?

Yes, it runs the checks against some standard schemas published on the schema store.

You can add another check pointing at an in-repo schema file that you would make yourself.

OK, understood. Might come in handy in various other projects where the YAML file is, say, the configuration for a professional engine trading international stock markets :-)

tschm commented 10 months ago

We are using a release environment in the example? Do you mean we should mention it more explicitly, or are we overlooking something here?

Well, in the current readme it's not used, only in the example above. But envs are auto-created, and the problem is that they are created without any protection enabled. So it'd be important to call out that it's recommended to create the env manually and, for example, set up required reviewers there, so the workflow is paused before starting the job. Otherwise, it's not doing much. Setting this up means only trusted humans would be able to resume the workflow execution, letting the publishing actually happen.

I see. I guess we have done this almost correctly then. I tend to use the release environment, and I have protection rules in place for it. So I guess we just ramp up the protection for this environment.

webknjaz commented 10 months ago

I usually call the environment pypi because:

  1. There might also be testpypi, which requires different rules and secrets.
  2. It's a clearly separate publishing target.
  3. There might be other targets like GH releases; they will likely do semantically different things.
  4. The name release (which I also used in the past) is too broad and would match any of these separate jobs, so it doesn't help to accurately identify its purpose.

tschm commented 10 months ago

I usually call the environment pypi because:

  1. There might also be testpypi, which requires different rules and secrets.
  2. It's a clearly separate publishing target.
  3. There might be other targets like GH releases; they will likely do semantically different things.
  4. The name release (which I also used in the past) is too broad and would match any of these separate jobs, so it doesn't help to accurately identify its purpose.

I like that idea.