purescript / registry-dev

Development work related to the PureScript Registry
https://github.com/purescript/registry

Don't fail publishing when registry repos are behind the upstream #646

Open thomashoneyman opened 1 year ago

thomashoneyman commented 1 year ago

In https://github.com/purescript/registry/issues/336#issuecomment-1660974501 a package failed to commit and push metadata because the local checkout of the registry repo was behind the upstream by one commit. I don't think this should happen — we should be able to pull to re-sync and then try to push again.

The error in this case arose from the implementation of WriteMetadata:

https://github.com/purescript/registry-dev/blob/8177cd30d7b7cf3379b8b728564def0ed438298c/app/src/App/Effect/Registry.purs#L250-L271

...but it applies everywhere we use writeCommitPush, which includes writing metadata, manifests, package sets, and other registry data to the various registry repositories. The window for getting behind the upstream is narrow (you'd have to fall behind between the pull and the push in the implementation linked below), but it's certainly possible:

https://github.com/purescript/registry-dev/blob/8177cd30d7b7cf3379b8b728564def0ed438298c/app/src/App/Effect/Registry.purs#L678-L693
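For illustration, the pull-then-push sequence looks roughly like this. This is a hypothetical Python sketch of the flow, not the actual PureScript implementation; `write_files`, the commit message, and the function names are placeholders:

```python
import subprocess

def run_git(args, cwd):
    """Run a git command in `cwd`, returning its exit code."""
    return subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True
    ).returncode

def write_commit_push(write_files, message, cwd):
    """Sketch of the pull/write/commit/push sequence described above.

    `write_files` is a placeholder callback that mutates the checkout
    (e.g. writes metadata or a manifest to disk).
    """
    run_git(["pull", "--autostash"], cwd)  # sync with the upstream first
    write_files()                          # mutate registry files on disk
    run_git(["add", "--all"], cwd)
    run_git(["commit", "--message", message], cwd)
    # The race window: if the upstream advances between the pull above
    # and this push, the push is rejected and the pipeline fails.
    return run_git(["push"], cwd) == 0
```

Any other job that lands a commit on the upstream between the `pull` and the `push` makes the final push fail, which is exactly the failure mode described above.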

To drill down even further, the specific place where this error arises is in the call to Git.gitCommit. If we are behind the upstream then we bail out of the commit.

https://github.com/purescript/registry-dev/blob/8177cd30d7b7cf3379b8b728564def0ed438298c/app/src/App/CLI/Git.purs#L181-L200
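The "are we behind?" check can be expressed with `git rev-list --count` against the upstream tracking branch. This is an illustrative Python sketch of that check, not the actual `Git.gitCommit` code; the function names are hypothetical:

```python
import subprocess

def commits_behind_upstream(cwd):
    """Count how many commits HEAD is behind its upstream branch.

    Returns 0 when no upstream is configured. Illustrative only; the
    real implementation lives in Git.purs.
    """
    # Refresh the remote-tracking ref before comparing.
    subprocess.run(["git", "fetch"], cwd=cwd, capture_output=True)
    result = subprocess.run(
        ["git", "rev-list", "--count", "HEAD..@{upstream}"],
        cwd=cwd, capture_output=True, text=True,
    )
    return int(result.stdout.strip()) if result.returncode == 0 else 0

def safe_to_commit(cwd):
    # Bail out of the commit (as Git.gitCommit does) when we are behind.
    return commits_behind_upstream(cwd) == 0
```

The range `HEAD..@{upstream}` counts exactly the upstream commits the local checkout is missing, so a nonzero count is the "behind by one commit" situation from the linked issue.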

It's OK that we bail out of the commit, but we shouldn't bail out of the entire pipeline. Instead, we could bail out of the commit, pull again (with --autostash), and then try the commit again. Only after some number n of consecutive failures to re-sync with the upstream would it be reasonable to terminate the process entirely.
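The suggested retry could look something like this sketch. The names and default retry count are hypothetical; `try_commit` would wrap the existing commit logic (returning failure when behind the upstream) and `resync` would run `git pull --autostash`:

```python
def commit_with_resync(try_commit, resync, max_attempts=3):
    """Retry the commit instead of terminating the whole pipeline.

    `try_commit` returns True on success and False when the checkout is
    behind the upstream; `resync` re-syncs with the upstream (e.g. via
    `git pull --autostash`). Only after `max_attempts` failures in a row
    do we give up entirely.
    """
    for _ in range(max_attempts):
        if try_commit():
            return True
        resync()  # pull upstream changes, stashing local edits if needed
    return False
```

Taking the commit and re-sync steps as injected callables keeps the retry policy separate from the git plumbing, so the same loop would apply to any of the writeCommitPush call sites.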

f-f commented 1 year ago

Is this really a problem? I don't think we gain anything by allowing multiple jobs to run concurrently; we should just limit the number of running jobs to one.

thomashoneyman commented 1 year ago

It feels odd to arbitrarily limit jobs such that if someone else is uploading a package then I cannot publish mine. Isn’t it preferable to allow concurrent package uploads?

Concurrent jobs are certainly possible so long as the GitHub pipelines exist. If we want control over that, we'll have to rewrite the GitHub module to send requests to the server instead of executing anything in the action itself. (I know we eventually want to disable the GitHub API altogether.)

If we don’t want any concurrent operations (at least among jobs that touch a shared git resource), then the build matrix, the daily importer, etc. could not run while a package is being published, and vice versa. Isn’t that problematic?

thomashoneyman commented 1 year ago

Here's another example of this happening:

https://github.com/purescript/registry/issues/344 was terminated because https://github.com/purescript/registry/issues/345 modified the metadata during its run.