Open thomashoneyman opened 1 year ago
The fix for this issue should include a test that verifies a failure in one part of the pipeline rolls back other changes such that we can re-issue the publish command and succeed. The tests for that are located here:
I am not convinced about rolling back storage uploads: we support multiple storage backends, so would we try to roll all of them back if the pipeline fails? That feels messy.
I wonder if we should instead ensure that the pipeline can never fail after we push to the storage backend(s).
I don’t see how we could ensure the pipeline never fails — we can do our best to avoid failure, but we’re dealing with a remote git server, so there’s always some possibility of failure.
Yeah I see. I guess my gut feel on this is that once we have "committed" to a package publish, then the only option is to move forward with it, and deletion should not be an option. I am not quite sure what kind of infrastructure we'd need to ensure this, a few ideas:
A recent package upload failed to upload metadata to the registry after the tarball was uploaded to storage. The author retried the publishing workflow, but publishing failed because there was already a tarball in storage. See:
https://github.com/purescript/registry/issues/336#issuecomment-1660974501
https://github.com/purescript/registry/issues/336#issuecomment-1660983820
Specifically, in this section of `publishRegistry`, the `Storage.upload` and `Registry.writeMetadata` functions can throw exceptions, and the process is aborted if so: https://github.com/purescript/registry-dev/blob/8177cd30d7b7cf3379b8b728564def0ed438298c/app/src/App/API.purs#L644-L647
We should probably catch the `Registry.writeMetadata` exception and roll back the `Storage.upload` (i.e. issue a `Storage.delete`) before fully exiting the pipeline. Alternatively, we could build a sort of bracket functionality into the publish pipeline, where we record which resources have been modified and, on publish failure, roll back all of those modifications.
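A minimal sketch of that bracket idea, in Haskell rather than PureScript: each pipeline step registers an undo action alongside its effect, and a failure anywhere runs the undos accumulated so far in reverse order. The names here (`runStep`, `demo`, the simulated upload and metadata steps) are illustrative assumptions, not the registry's actual API.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception (ErrorCall (..), SomeException, throwIO, try)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

type Undo = IO ()

-- Run one pipeline step. On success, record its undo action; on
-- failure, run every undo registered so far (most recent first)
-- and rethrow the exception.
runStep :: IORef [Undo] -> IO a -> Undo -> IO a
runStep undos action undo = do
  result <- try action
  case result of
    Left (e :: SomeException) -> do
      readIORef undos >>= sequence_  -- roll back prior steps
      throwIO e
    Right a -> do
      modifyIORef undos (undo :)     -- record how to undo this step
      pure a

-- Simulate a publish where the upload succeeds but the metadata
-- write fails, returning the ordered log of observed effects.
demo :: IO [String]
demo = do
  effects <- newIORef []
  undos <- newIORef []
  let record s = modifyIORef effects (s :)
  -- Analogue of Storage.upload: succeeds, registers a delete as undo.
  _ <- runStep undos (record "Storage.upload") (record "Storage.delete")
  -- Analogue of Registry.writeMetadata: throws, triggering rollback.
  _ <- (try (runStep undos (throwIO (ErrorCall "writeMetadata failed")) (pure ()))
          :: IO (Either SomeException ()))
  reverse <$> readIORef effects

main :: IO ()
main = demo >>= print  -- ["Storage.upload","Storage.delete"]
```

A real implementation would also need to decide what "undo" means per resource, e.g. issuing a `Storage.delete` against each configured storage backend, which is exactly the messiness raised above about multiple backends.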