FYI reusable workflows are still not properly supported on the PyPI side. Some things happen to work almost by accident. There's a tracking issue somewhere in the Warehouse repo discussing this.
cc @woodruffw
This worked for me when publishing to Test PyPI, but I got the same error when the GHA tries to publish to the real PyPI.
Test PyPI:
Checking dist/svgcheck-0.9.0.tar.gz: PASSED
Notice: Generating and uploading digital attestations
Fulcio client using URL: https://fulcio.sigstore.dev/
TUF metadata: /root/.local/share/sigstore-python/tuf/https%3A%2F%2Ftuf-repo-cdn.sigstore.dev
TUF targets cache: /root/.cache/sigstore-python/tuf/https%3A%2F%2Ftuf-repo-cdn.sigstore.dev
Found and verified trusted root
Generating ephemeral keys...
Requesting ephemeral certificate...
Retrieving signed certificate...
Found <Name(O=sigstore.dev,CN=sigstore-intermediate)> as issuer, verifying if it is a ca
attempting to verify SCT with key ID dd3d306ac6c7113263191e1c99673702a24a5eb8de3cadff878a72802f29ee8e
Successfully verified SCT...
PyPI:
Checking dist/svgcheck-0.9.0.tar.gz: PASSED
Checking dist/svgcheck-0.9.0.tar.gz.publish.attestation: ERROR InvalidDistribution: Unknown distribution format:
'svgcheck-0.9.0.tar.gz.publish.attestation'
Thanks for the report. It would be useful to have a link to the log, and even more useful if the job were restarted in debug mode. Meanwhile, we'll have to wait for @woodruffw to see this and take a look from his side.
This is the link to the failed job: https://github.com/ietf-tools/svgcheck/actions/runs/11604651979/job/32313751640
Thanks! I wonder where this is coming from..
Error in sitecustomize; set PYTHONVERBOSE for traceback:
ModuleNotFoundError: No module named 'coverage'
Error in sitecustomize; set PYTHONVERBOSE for traceback:
ModuleNotFoundError: No module named 'coverage'
Extracting the artifacts from that job for history (they will be garbage-collected in a while) and further inspection:
Thanks! I wonder where this is coming from..
Error in sitecustomize; set PYTHONVERBOSE for traceback: ModuleNotFoundError: No module named 'coverage' Error in sitecustomize; set PYTHONVERBOSE for traceback: ModuleNotFoundError: No module named 'coverage'
Looks like that's from a leftover old test file. It's unrelated to this issue.
Thanks for the details @kesara! This one is weird: that error is coming from `twine` itself, not from PyPI.
From a quick look, I think what happened here is that I forgot to add a skip for attestations in `twine check`. I can root-cause that a bit more in a moment.
Edit: Yep, that's what it looks like. I'll work on a fix now.
https://github.com/pypa/twine/pull/1172 should fix the behavior observed by @kesara!
For @efriis: @webknjaz is right that reusable workflow support is currently, unfortunately, limited on PyPI. My colleague @facutuesca and I are currently working on improving that, but in the meantime most users should stick to non-reusable workflows for Trusted Publishing.
Got it. Would the recommendation be to combine the release and test release workflows just into a single one? I believe we separated them to make sure the test release step could never publish to pypi.org. When we registered our publishers, we have the test.pypi entries locked to _test_release.yml, and pypi entries locked to _release.yml.
appreciate the quick look @woodruffw !
Got it. Would the recommendation be to combine the release and test release workflows just into a single one? I believe we separated them to make sure the test release step could never publish to pypi.org. When we registered our publishers, we have the test.pypi entries locked to _test_release.yml, and pypi entries locked to _release.yml.
@efriis I think you can keep them in separate files, and make `_release.yml` depend on `_test_release.yml` by using the `workflow_run` event. That way, `_test_release` is no longer called by `_release`, but rather `_release` runs after `_test_release` is successful, which means `_test_release` is no longer used as a reusable workflow.
That way you should be able to keep the scope of the PyPI Trusted Publisher to just `_release.yml`, and still be able to run the test workflow before it.
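For illustration, a rough sketch of what that could look like in `_release.yml`. This assumes the test-release workflow declares `name: Test Release` and uploads the built dists as an artifact named `python-package-distributions` (both names are illustrative, not anyone's exact setup):

```yaml
name: Release

on:
  workflow_run:
    workflows: ["Test Release"]   # assumed `name:` of _test_release.yml
    types: [completed]

jobs:
  publish-to-pypi:
    # Only run when the triggering test-release workflow succeeded.
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    environment: pypi
    permissions:
      id-token: write  # required for Trusted Publishing
      actions: read    # needed to download artifacts from the triggering run
    steps:
      # Fetch the dists built by the run that triggered this workflow.
      - uses: actions/download-artifact@v4
        with:
          name: python-package-distributions
          path: dist/
          run-id: ${{ github.event.workflow_run.id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
      - uses: pypa/gh-action-pypi-publish@release/v1
```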
There's a tracking issue somewhere in the Warehouse repo discussing this
For visibility: https://github.com/pypi/warehouse/issues/11096
Got it. Would the recommendation be to combine the release and test release workflows just into a single one? I believe we separated them to make sure the test release step could never publish to pypi.org. When we registered our publishers, we have the test.pypi entries locked to _test_release.yml, and pypi entries locked to _release.yml.
@efriis you can achieve this by naming the environments `pypi` and `testpypi`, which is something I strongly recommend.
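A minimal sketch of two publish jobs with those environment names (job names and runners are illustrative, and the step that downloads the built dists is omitted here; a fuller sketch appears further down the thread):

```yaml
jobs:
  publish-to-testpypi:
    runs-on: ubuntu-latest
    environment: testpypi
    permissions:
      id-token: write
    steps:
      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          repository-url: https://test.pypi.org/legacy/

  publish-to-pypi:
    runs-on: ubuntu-latest
    environment: pypi
    permissions:
      id-token: write
    steps:
      - uses: pypa/gh-action-pypi-publish@release/v1
```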
Thanks! I wonder where this is coming from..
Error in sitecustomize; set PYTHONVERBOSE for traceback: ModuleNotFoundError: No module named 'coverage' Error in sitecustomize; set PYTHONVERBOSE for traceback: ModuleNotFoundError: No module named 'coverage'
Looks that's from a left over old test file, It's unrelated to this issue.
@kesara I'm confused since it appears under the step running this action. And it's running in a container. Are you saying that you're leaving something on disk that ends up in the container as well?
Oh, wait… You're running the build within the same job as publishing. Giving OIDC privileges to your build tool chain is insecure and may lead to various impersonation / privilege elevation scenarios. Using trusted publishing this way is discouraged.
FWIW, this suggests that you may be leaving various incompatible artifacts in the `dist/` folder, which has been known to break twine in the past. So it's kinda related.
Try following the guide closely, keeping only the download step and the upload step in the publishing job: https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/#publishing-the-distribution-to-pypi. Perhaps using a clean environment would help you avoid the issue of illegal files in `dist/` breaking things.
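A rough sketch of the structure the linked guide describes, assuming the dists are passed between jobs as an artifact named `python-package-distributions` (all names here are illustrative):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: python -m pip install build && python -m build
      - uses: actions/upload-artifact@v4
        with:
          name: python-package-distributions
          path: dist/

  publish-to-pypi:
    needs: build
    runs-on: ubuntu-latest
    environment: pypi
    permissions:
      id-token: write  # the only job that gets OIDC credentials
    steps:
      # No checkout and no build toolchain here — just download and upload.
      - uses: actions/download-artifact@v4
        with:
          name: python-package-distributions
          path: dist/
      - uses: pypa/gh-action-pypi-publish@release/v1
```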
@kesara if what William suggests is correct, https://github.com/marketplace/actions/pypi-publish#disabling-metadata-verification might be a workaround too.
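If that route is taken, it would be a step-level option on the publish step, roughly like this sketch (the surrounding job is assumed):

```yaml
      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          verify-metadata: false  # skip the twine check, per the linked README section
```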
@woodruffw so is this report only caused by reusable workflows? Can you confirm that the logs are consistent with this explanation, so I could go ahead and close the issue?
@webknjaz Thanks. I'll review publishing workflow.
@woodruffw so is this report only caused by reusable workflows? Can you confirm that the logs are consistent with this explanation, so I could go ahead and close the issue?
Yep, that's correct -- what's happening here is that attestations are generated against the "build config" identity, which is the top-level initiating workflow. In non-reusable cases that might be something like release-event -> `release.yml`, but in the reusable-workflow case it might be something like release-event -> `release.yml` -> `pypi-publish.yml`, where the attestation expects `release.yml` but the user has `pypi-publish.yml` configured as their trusted publisher.
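For concreteness, a sketch of the reusable-workflow shape being described, using the file names from the example above (the trigger and job name are assumptions):

```yaml
# release.yml — the top-level workflow; the attestation identity points here.
name: Release

on:
  release:
    types: [published]

jobs:
  publish:
    # Calls a reusable workflow. If the Trusted Publisher on PyPI is configured
    # for pypi-publish.yml rather than release.yml, the identities won't match.
    uses: ./.github/workflows/pypi-publish.yml
    permissions:
      id-token: write
```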
@kesara so I looked into your workflow once again (from laptop this time, mobile view can be limiting…) and realized what was happening, combined with what's been said about Twine earlier:
1) One invocation of the action runs, does the metadata check and then produces the attestation files on disk, followed by uploading everything.
2) The following invocation is pointed to the same directory on disk, which now contains the pre-existing attestations, and running `twine check` crashes on those files because of a bug in twine (the bugfix is yet to be released), never getting to signing the dists and uploading them. We pass `dist/*` by default, which is how Twine locates said files.
@woodruffw actually added a check that makes the action fail if attestations already exist on disk prior to invoking the action: https://github.com/pypa/gh-action-pypi-publish/blob/fb13cb3/attestations.py#L76. But that check happens after `twine check`, so it's a bit too late. This is why you're seeing an underlying twine error instead of a clearer error with the explanation.
@kesara if you were to disable the metadata check (https://github.com/marketplace/actions/pypi-publish#disabling-metadata-verification), you would see the error that we intended to display.
As I mentioned earlier (https://github.com/pypa/gh-action-pypi-publish/issues/283#issuecomment-2451044988), it's highly discouraged to build and publish within the same workflow when using Trusted Publishing (due to making yourself vulnerable through OIDC-related privilege elevation). For this reason, no examples demonstrate having these in the same job. And in my projects, I always have separate jobs for different deployment targets, with different `environment:` values — `pypi` and `testpypi` respectively.
To summarize, you're running the action in a way that was never considered supported (and no CI tests exist to verify it). So it happened to work somehow, but that was never intended to be acceptable. I saw that you attempted to work around this with more conditionals (https://github.com/ietf-tools/svgcheck/pull/66). While this will get you through the immediate problem, I still recommend refactoring to move uploading into separate jobs.
@woodruffw should we consider moving the attestations check earlier in the process? Also, perhaps we could produce a `dist/.gh-publish-invoked-once` file and fail/warn on any invocations that follow the first one? WDYT?
@efriis since we established that the underlying issue is due to the reusable workflows being invoked, I'm going to close this as there's nothing for us to do here. There is another issue, #166, that already tracks this in this repo (not that it's needed, since the necessary updates are on the Warehouse side, not in this action).
Still, I'm pinning the issue for visibility. Thanks again for the report!
no problem - appreciate the response @webknjaz !
Not sure if there's anything you can do on the PEP 541 support side, but if there's anyone available to escalate this to that would be much appreciated 🙏
Not really — there's a long backlog and everybody wants to be escalated. It may take a long time to get your request reviewed. In general, we're trying to emphasize that the PSF / PyPA needs funding to hire more people to go through that queue. I think one position has been funded for a year and that person is speeding up the process, but that's probably not enough really. Ask your employer to fund another PyPI support person :)
I'm also getting the same `InvalidDistribution: Unknown distribution format` error trying to upload the attestation to PyPI (but no such error with TestPyPI). https://github.com/data-apis/array-api-strict/actions/runs/11713859145/job/32627476623
I don't think I'm using a reusable workflow (that means it has `on: workflow_call` in it, right?). https://github.com/data-apis/array-api-strict/blob/main/.github/workflows/publish-package.yml
@asmeurer The workflow is running the `gh-action-pypi-publish` action twice in the same job, so the second run is failing due to finding the `.publish.attestation` files generated by the first run:
Checking dist/array_api_strict-2.1.2-py3-none-any.whl.publish.attestation: ERROR InvalidDistribution: Unknown distribution format:
'array_api_strict-2.1.2-py3-none-any.whl.publish.attestation'
Is this something that can be fixed upstream in this action? I'm not really changing the default way this action works, just uploading to TestPyPI first, which I imagine should be a rather common thing to do.
@asmeurer the error you're seeing is in Twine. You'll keep hitting the exception for as long as you have verification enabled in the action. Twine accepted the PR but we don't know when it'll go live.
If you disable this check, however, you'll be hitting our own check that disallows having attestation files on disk prior to running it. This one is by design as having illegal/untrusted files like that indicates misconfiguration.
Additionally, I've always emphasized that having separate jobs is best. You could then have properly named `environment:` entries (since they are per-job) — `pypi` and `testpypi`. This is more accurate than putting a verb there, since these represent deployment targets and not processes.
FWIW, sequential action runs were never tested, nor considered a supported use case. I usually see this where a discouraged job structure is used, so I'd rather document that this is not something we support.
I'm a little reluctant to go around trying to refactor my publishing GitHub Actions file. What I have works (or at least used to). I'm not an expert on GitHub Actions (someone else wrote this file initially), and this one in particular is annoying to modify because you can't really test that it works without doing a release, which itself is tough because PyPI doesn't let you undo a release. Plus so many steps have to be done manually by typing stuff into a form on pypi.org or clicking around in the GitHub settings. If you want to make a PR to my project doing whatever fix you are suggesting, go for it.
Otherwise, I'm most likely going to just permanently disable either TestPyPI uploading or attestations in the job, since those seem to be the cause of the issue and that should fix the issue going forward, as I understand it.
@asmeurer yes, I can relate to the sentiment about this being difficult to test. This was the reason it took me half a year to merge that big refactoring PR. And still, we ended up finding all these corner cases somebody on the internet apparently relied on. I did my best to address what I could, but we can't go back now. As for disabling attestations, I think if you disable them in one of the invocations, that'll also work for you.
Just had this fail in Blinker (and if I had upgraded a bit sooner, it would have failed in Werkzeug and Flask too). It's not directly related to the title of this issue, but further down it was indicated that uploading to test then pypi in the same workflow results in the error as well. Here's the workflow run: https://github.com/pallets-eco/blinker/actions/runs/11746299115/job/32725793081?pr=176 It looks like it's been fixed in twine and we're just waiting for a release, and I'm going to try removing the test upload and re-running in the meantime. Nothing urgent, just wanted to add some extra data.
Thanks @davidism! Yeah, this should be fixed in `twine` on `main`, and just needs a new release. I can ask to see if they'd be willing to make one.
@woodruffw that wouldn't be enough since you added an explicit check for this in our action to also fail on pre-existing attestations on disk.
@davidism in general, I recommend having separate jobs. But an immediate workaround would be to add a step between the two action invocations that removes the attestation files the first one produced.
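A sketch of that workaround, assuming both uploads run in the same job and the dists sit in `dist/` (step names are illustrative):

```yaml
      - name: Publish to TestPyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          repository-url: https://test.pypi.org/legacy/
      # In-between step: remove the attestation files the first invocation
      # wrote next to the dists, so the second invocation starts clean.
      - name: Remove attestations left over from the TestPyPI upload
        run: rm -f dist/*.publish.attestation
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
```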
in general, I recommend having separate jobs. But an immediate workaround would be to add a step between the two action invocations that removes the attestation files the first one produced.
In the case of using the action twice in the same job – will this workaround still be needed once there is a new release of twine? (>5.1.0)
According to the previous comment, yes, because there's also a check in the action itself.
It would be great if you could fix the workflow so that this just works here, even if it isn't recommended. I have a lot of release workflows that upload to testpypi then upload to real pypi that used to work just fine and it would be nice if they could continue to work.
I agree with @asmeurer. If an update is going to break workflows that used to work, it should have been a major version update.
Maybe if we'd known before. But that ship has sailed. This wasn't a public API change, so SemVer expectations don't apply. You can clean the artifacts between the runs if you want, but that's the extent of it. The validation exists to catch a security condition of accidentally leaving incorrect artifacts on disk produced by something outside the action invocation. And it did its job — it found misconfiguration. So it works as one would expect. The only action item I see is documenting this, as it's probably not very obvious.
I'm not following how the ship has sailed, especially if it's not a public API change.
I will say that if your goal here is to push attestations as a "good thing" for the ecosystem, then this problem goes against that goal, because from a high level, the appearance here is that things worked before attestations and they broke after attestations. It's not a great feeling or a great look. Telling people that they now need to rewrite their complex workflows that used to work "just because" isn't exactly meeting people where they are.
I freely admit I don't understand how attestations work, and your description doesn't even make sense to me (how does something get "left on disk" in a GitHub action? Aren't github actions runs self contained?). But that's the point. I'm sure most people are like me in that their eyes glaze over when faced with technical descriptions of why their GitHub Actions started breaking for no reason. It just feels like things used to work and now they don't, and it's clearly because of a change made in this repo, because they stopped working when dependabot updated this action.
I freely admit I don't understand how attestations work, and your description doesn't even make sense to me (how does something get "left on disk" in a GitHub action?
FWIW, this is because the steps within a single job on GitHub actions all share the same filesystem (and all other machine state). The filesystem only disappears in between jobs, because each job gets scheduled on its own VM.
I think @webknjaz has a fair point about public API surface here (and I'll note that the change here isn't intrinsically related to attestations; it's just the first feature to surface it). At the same time, I think it's unfortunate that we caused any degree of public workflow breakage here.
In the short term, you can work around this by setting `attestations: false` on the TestPyPI step -- attestations don't really matter for TestPyPI anyways, and that'll prevent the file conflict. I'll think some more about a longer-term solution tonight (I have ideas around enforcing domain separation between PyPI and TestPyPI artifacts that would potentially allow us to handle these kinds of cases).
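For example, a TestPyPI publish step with attestations disabled might look roughly like this (a sketch; the surrounding job is assumed):

```yaml
      - name: Publish to TestPyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          repository-url: https://test.pypi.org/legacy/
          attestations: false  # no attestations for the TestPyPI upload
```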
(timeline race condition note: I started typing this in before William's comment but sent later)
@asmeurer I hear you! Yes, it's frustrating. Ironically, the post you linked was written by the very same person who added this check...
I think it's fair to want things to be explained more simply and clearly. Although, from the position of a maintainer, I feel like there's a lot of pressure to “get things right”, which turns into an unreasonably long list of feature ~requests~ demands; especially for stuff that should've never worked in the first place / worked by an unhappy accident. This is why I'm so protective of my world-writable TODO-list.
The official guide has existed from the very first day of this action's release: https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/. It changed with the action's requirements and reflected what needed to be done as new ways appeared. William also made an effort to update GitHub's docs (https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-pypi#updating-your-github-actions-workflow) and starter workflows (https://github.com/actions/starter-workflows/blob/main/ci/python-publish.yml), which is a cumbersome process, so that people wouldn't get incorrect recommendations from GH itself.
I understand that maybe the guide didn't explain the need for job separation at first, but that improved over time. There have been discussions / issues in-repo about this for so long that I was under the impression that the guide mentioned this too...
I guess my point is that many people chose not to follow the documented usage, and I don't feel like they should be in a position to request that things work a certain way if they opted out of the supported method.
As for the possible improvement, perhaps we could turn it into a big fat warning for a while and delete the attestations from disk instead, but give some deadline for people to migrate. That is, if @woodruffw would say it's secure enough.
I freely admit I don't understand how attestations work, and your description doesn't even make sense to me (how does something get "left on disk" in a GitHub action?
Addressing this separately: we check that there are no attestations in the dist folder, then we call `pypi_attestations` to do the signing magic, which puts attestation files on disk right next to the dists, into the same folder. And then we upload all of that.
So when you call the action for the second time, the attestations are present from the previous run and that first sanity check fails. I think William thought that this would indicate that the end-users attempted to create these files by hand somehow, in ways that wouldn't work.
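Using the filenames from the svgcheck logs earlier in this thread for illustration, the dist folder ends up looking roughly like this after the first invocation:

```
dist/
├── svgcheck-0.9.0.tar.gz                      # the built sdist passed to the action
└── svgcheck-0.9.0.tar.gz.publish.attestation  # written by the action next to the dist
```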
Aren't github actions runs self contained?
To an extent. Most of the actions have a ton of side effects. I initially chose the Docker-based action type to reduce the number of side effects. So it wouldn't normally affect your Python install / virtualenv. But still, the workdir is mounted into the container, allowing the action to read the dists you're feeding into it and put more “output” files in there.
As for the possible improvement, perhaps we could turn it into a big fat warning for a while and delete the attestations from disk instead, but give some deadline for people to migrate. That is, if @woodruffw would say it's secure enough.
I was discussing this with @facutuesca, who reached a similar conclusion as a workaround. I think this is fine from a security perspective (since the action itself is responsible for producing attestations), but it could be a source of user confusion in terms of why files are disappearing/being clobbered.
So, I'd be okay with this as a warning + a fixed disablement timeline, as long as there's consensus that this will minimize any user confusion and doesn't make it harder than necessary for people to consume attestations 🙂. I think that latter part should be fine, since we're planning on stuffing the attestations inside of output variables anyways.
I think William thought that this would indicate that the end-users attempted to create these files by hand somehow, in ways that wouldn't work.
Yep, that was the majority of the rationale! The other rationale is that even if it would work, having a conflict between attestations indicates a confused state somewhere (as it does in this case!), and that we'd be better off failing and trying to understand it rather than silently letting it through.
That's what ended up happening here and now we understand why it's happening, so I'm OK with tweaking the behavior now that it's a "known known" instead of a "known unknown."
@woodruffw do we want to keep producing the artifacts as outputs in the dist directory, provided we don't currently set any outputs? If not, we could even go as far as writing them into a temporary directory instead of the workdir and then passing that to `twine upload`, so they wouldn't leak into the host filesystem. That's what I mentioned to @facutuesca like 15 minutes ago on IRC... I just didn't remember if we discussed files being "promised outputs" explicitly.
I think we should make said warning very verbose and outline very specific steps to take to “migrate”. This would tie nicely with https://github.com/pypa/gh-action-pypi-publish/pull/305#issuecomment-2499315017, I imagine.
So I just got bitten in the ass by this bug as well, and to quote myself (even after having read about attestations a while ago), my rant to my coworker was literally "what was this attestation BS again that just broke my release".
Would have been nice if twine had had a release containing the fix for this...
TL;DR: Attestations are probably a useful feature, but bugs like this just make things annoying for everyone. Especially since fixing this requires cleaning up and force-pushing the tag, and likely even enabling skipping of existing releases for the Test PyPI publish job, since that one succeeded, of course...
This project doesn't control `twine`'s release schedule.
TL;DR: Attestations are probably a useful feature, but bugs like this just make things annoying for everyone.
I understand your frustration, and I'm sorry that this bug bit you. At the same time, I don't think it's true that this makes things "annoying for everyone": public upload statistics indicate that >99% of workflows were upgraded seamlessly.
Of the workflows that are experiencing problems, the majority are of a shape that was already known to be problematic (reusable workflows), and a minority are of a new shape (stacking publish steps on top of each other). It's worth noting that the top-level issue here is for the majority case, not the case you're experiencing.
We're going to improve the failure mode here as well (I'm going to be working on it in the coming days), but I don't think there's much value in overstating the case.
Howdy! We just had to turn off release attestations to get our releases running in langchain-ai/langchain. Haven't had a chance to dig deeper into attestation configuration in order to see what we need to fix, and thought I'd file an issue in case others run into the same thing!
https://github.com/langchain-ai/langchain/pull/27765
Release Workflow
We run releases from the two workflow files edited in ^ that PR
Error
We were seeing errors in our releases, e.g. in this workflow run: https://github.com/langchain-ai/langchain/actions/runs/11602468120/job/32307568692
Configuration of test release - the 2 main things that look weird are `/legacy/` and `repository_url` (we configure `repository-url` per the docs)
Logs - partially redacted
Temporary Fix
https://github.com/langchain-ai/langchain/pull/27765
We turned off attestations with `attestations: false`.