Agent Merging - OperationalError

carlosmbe commented 1 year ago

Describe the bug If you try to merge Agents with a remarkable amount of References.

It throws OperationalError at /api/specify/agent/replace/68/ (1205, 'Lock wait timeout exceeded; try restarting transaction')

To Reproduce Steps to reproduce the behavior:

Go to NHMD -> Amber -> Queries
Create a formatted query for Agents
Merge the four Agents that show up in the screenshot below
See error

The Agent names are:

Bendix-Almgreen, Soltau Bang, Svend Erik, Bente
Lange, Christian
Pape, Thomas
Selvantharan, Sree

Specify 7 Crash Report - 2023-07-18T21 11 50.025Z.txt

Screenshots Screen Shot that was here has been moved to SCC Vault for privacy reasons https://drive.google.com/file/d/1wONptMpw8xrt3_07Iw47sEIoi_aUyEyG/view?usp=share_link

Desktop:

OS: Mac Browser: Chrome Specify 7 Version: 7.9

grantfitzsimmons commented 1 year ago

The error Lock wait timeout exceeded; try restarting transaction means that a transaction has been waiting for a lock on a resource for longer than the configured timeout value.

This error commonly occurs in situations where multiple transactions are attempting to modify the same data concurrently.

One transaction may acquire a lock on the resource, and if another transaction tries to access or modify the same resource within the specified timeout period, it will wait for the lock to be released. If the waiting transaction exceeds the timeout value, the Lock wait timeout exceeded error is thrown. This timeout is due to the single large request being made, as in that instance Carlos is merging agents with over 150,000+ references. Since we changed agent merging to no longer work one-by-one but instead do it all at once, this is inevitable with a request that large. We need to see if there's a way to either split the request up or use the worker that the WorkBench uses.

This looks like something that Jason or Alec would need to resolve, otherwise we would need to increase the lock wait timeout limit in our MariaDB settings for hosted instances and write instructions for self-hosted ones to do the same

realVinayak commented 1 year ago

We do everything in one parent transaction so concurrent access is not the problem here. EDIT: while we do have child transactions (because of recursion) different rows are locked so concurrent access still shouldn't be a problem.

Using worker will not tie up the main django thread but locks will be issued by MySQL and InnoDB. So, using worker will not significantly "speed up" this process - because locks happen at database level. The current agent merging code spends majority of the time doing actual db transactions - which are limited by the speed of database itself.

It might still be a good idea to do that since it would make sure server is still accessible to other requests.

I'm looking into if there is a serious cause of this - other than just a lot of records being merged.

grantfitzsimmons commented 1 year ago

I encountered the same issue today.

https://github.com/specify/specify7/assets/37256050/00eff380-5de6-472b-b13a-9de959df6f0b

https://herbrbge71423-issue-3816.test.specifysystems.org/specify/overlay/merge/Agent/?records=23459%2C5

OperationalError at /api/specify/agent/replace/23459/ (1205, 'Lock wait timeout exceeded; try restarting transaction')

Specify 7 Crash Report - 2023-07-21T17 11 10.869Z.txt

realVinayak commented 1 year ago

I downloaded copy of the denmark database, and tried merging the same agents again. Couldn't reproduce this error. This most likely happened because someone was probably doing work at the same time you did the merge.

EDIT: It took 33 minutes to finish but didn't have a mysql timeout error

grantfitzsimmons commented 1 year ago

My instance isn't Denmark, and this has happened like 3 times without activity.

carlosmbe commented 1 year ago

@realVinayak You could probably recreate this error by merging the Agents in NHMD in batches of 30. Or I'll create and export a query just for you XoXo.

I downloaded copy of the denmark database, and tried merging the same agents again. Couldn't reproduce this error. This most likely happened because someone was probably doing work at the same time you did the merge.

EDIT: It took 33 minutes to finish but didn't have a mysql timeout error

grantfitzsimmons commented 1 year ago

https://github.com/specify/specify7/assets/37256050/6a7f7956-8b06-4b78-b36f-f01019ca44ba

https://herbrbge71423-issue-3816.test.specifysystems.org/specify/overlay/merge/Agent/?records=18018%2C113

Maybe I'm just running in circles. This just took place @realVinayak

Specify 7 Crash Report - 2023-07-21T19 43 43.928Z.txt

grantfitzsimmons commented 1 year ago

Moved to the worker. This error should be reported in the response if it happens.

specify / specify7

Agent Merging - OperationalError #3803