openSUSE / open-build-service

Build and distribute Linux packages from sources in an automatic, consistent and reproducible way #obs
https://openbuildservice.org
GNU General Public License v2.0
936 stars 440 forks source link

[production][project#destroy] Mysql2::Error: Duplicate entry '571998-328490' for key 'parent_repository_index': UPDATE `path_el... #5733

Open dmarcoux opened 6 years ago

dmarcoux commented 6 years ago

project#destroy threw this 0 times

Mysql2::Error: Duplicate entry '571998-328490' for key 'parent_repository_index': UPDATE `path_elements` SET `repository_id` = 328490 WHERE `path_elements`.`id` = 2938731

https://build.opensuse.org/project/destroy"

~See this exception on Errbit~

~See this exception on Errbit~

See this exception on Errbit

dmarcoux commented 6 years ago

There is already a validation for this in the PathElement model... I tried to reproduce the error and I couldn't. I'm leaving this issue here if someone has an idea on how to reproduce this.

coolo commented 6 years ago

Just as your other extra validations will help only so much. You need to lock the objects affected - OBS is underdoing this by far due to the fear of deadlocks (which can be a real pain).

http://thelazylog.com/understanding-locking-in-rails-activerecord/ is a rather brief introduction (just the 3rd google link I found, I'm sure there are better). In brief what's happening here:

Destroying a project is a lengthy process, which rewrites repositories of linked projects, destroys packages, does all this on the backend, and ... people get impatient and retrigger the delete action. As the first destroy already changed some bits, the second destroy finds the database in limbo.

Using DB transactions is underused in OBS just as much as DB locks - so you get the 2nd destroy throwing exceptions left and right you won't be able to reproduce without harming your database purposefully.

As DB transactions will only move the problem to the backend being in limbo, I'd suggest using more DB locks for such long running actions/methods. This at least you can verify to work in regular test cases.

hennevogel commented 6 years ago

... people get impatient and retrigger the delete action

That's why you disable the button, print some 'Deleting. Please wait..." flash, schedule a background job and 'hide' the object by some other mechanism. Yadda, yadda, yadda. This is a UX problem, not a technical one.

Long running database transactions are a solution to this problem that in the end will do more harm than good as they are not meant to run indefinitely. And they are especially problematic in situations where you don't really have an overview of what can produce deadlocks etc.

coolo commented 6 years ago

Well, I'm pretty sure you can trigger similiar mess by deleting the project through the API - getting a timeout and starting the osc command again.