Closed koivunej closed 6 months ago
Before https://github.com/neondatabase/neon/pull/6139, the uploads were awaited, guaranteeing that the original operation would have been completed.
That part didn't change. If we receive a creation request and the timeline already exists, we still wait for its remote uploads.
The only change is in the case that the uninitialized guard exists: we used to return 500 (which was a bug), and now we return 409.
In general, I don't consider it a bug that we don't coalesce identical repeated requests. As long as repeated requests eventually succeed, that's okay. We can tweak the response code if necessary, although I don't want to use 503 for this: 503 is used for "I can't handle this request right now", but this situation is more like "You asked me to do two things at the same time that can't happen at the same time"
I agree, but I failed to see this last week; the behavior is unchanged for a very short period of locking.
We can make this a bit nicer for clients by distinguishing true conflicts from creations in progress, by reporting the latter as 429:
Before #6139, two identical timeline creation requests started in a retrying-until-complete situation:
Before #6139, the uploads were awaited, guaranteeing that the original operation would have been completed.
Currently, the 2nd request is responded to with 409 immediately instead of waiting for the operation to be completed.
The fix would be to wait for the operation's completion before returning at all.
Context: https://neondb.slack.com/archives/C06K38EB05D/p1711114590911899?thread_ts=1711093964.514849&cid=C06K38EB05D
Returning of AlreadyCreating: https://github.com/neondatabase/neon/blob/62b318c928f365827039022e900bd6c80928792e/pageserver/src/tenant.rs#L1443-L1446
AlreadyCreating is converted to 409: https://github.com/neondatabase/neon/blob/77f3a30440aba4845da3a5203a2764fed4d96648/pageserver/src/http/routes.rs#L538-L541