Open thomashoneyman opened 1 year ago
The fix for this issue should include a test that verifies a failure in one part of the pipeline rolls back other changes such that we can re-issue the publish command and succeed. The tests for that are located here:
I am not convinced about rolling back storage uploads: we support multiple storage backends, so would we try to roll all of them back if the pipeline fails? That feels messy.
I wonder if we should instead ensure that the pipeline can never fail after we push to the storage backend(s).
I don’t see how we could ensure the pipeline never fails — we can do our best to avoid failure, but we’re dealing with a remote git server, so there’s always some possibility of failure.
Yeah I see. I guess my gut feel on this is that once we have "committed" to a package publish, then the only option is to move forward with it, and deletion should not be an option. I am not quite sure what kind of infrastructure we'd need to ensure this, a few ideas:
A recent package upload failed to upload metadata to the registry after the tarball was uploaded to storage. The author retried the publishing workflow, but publishing failed because there was already a tarball in storage. See:
https://github.com/purescript/registry/issues/336#issuecomment-1660974501
https://github.com/purescript/registry/issues/336#issuecomment-1660983820
Specifically, in this section of `publishRegistry`, the `Storage.upload` and `Registry.writeMetadata` functions can throw exceptions, and the process is aborted if so: https://github.com/purescript/registry-dev/blob/8177cd30d7b7cf3379b8b728564def0ed438298c/app/src/App/API.purs#L644-L647
We should probably catch the `Registry.writeMetadata` exception and roll back the `Storage.upload` (i.e. issue a `Storage.delete`) before fully exiting the pipeline. Alternatively, we could build a sort of bracket functionality into the publish pipeline, where we record which resources have been modified and, on publish failure, roll back all of those modifications.
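A minimal sketch of that bracket idea, in Haskell rather than PureScript: each pipeline step registers an undo action alongside its effect, and a failure anywhere runs the undos accumulated so far in reverse order. The names here (`runStep`, `demo`, the simulated upload and metadata steps) are illustrative assumptions, not the registry's actual API.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception (ErrorCall (..), SomeException, throwIO, try)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

type Undo = IO ()

-- Run one pipeline step. On success, record its undo action; on
-- failure, run every undo registered so far (most recent first)
-- and rethrow the exception.
runStep :: IORef [Undo] -> IO a -> Undo -> IO a
runStep undos action undo = do
  result <- try action
  case result of
    Left (e :: SomeException) -> do
      readIORef undos >>= sequence_  -- roll back prior steps
      throwIO e
    Right a -> do
      modifyIORef undos (undo :)     -- record how to undo this step
      pure a

-- Simulate a publish where the upload succeeds but the metadata
-- write fails, returning the ordered log of observed effects.
demo :: IO [String]
demo = do
  effects <- newIORef []
  undos <- newIORef []
  let record s = modifyIORef effects (s :)
  -- Analogue of Storage.upload: succeeds, registers a delete as undo.
  _ <- runStep undos (record "Storage.upload") (record "Storage.delete")
  -- Analogue of Registry.writeMetadata: throws, triggering rollback.
  _ <- (try (runStep undos (throwIO (ErrorCall "writeMetadata failed")) (pure ()))
          :: IO (Either SomeException ()))
  reverse <$> readIORef effects

main :: IO ()
main = demo >>= print  -- ["Storage.upload","Storage.delete"]
```

A real implementation would also need to decide what "undo" means per resource, e.g. issuing a `Storage.delete` against each configured storage backend, which is exactly the messiness raised above about multiple backends.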