Open mkllnk opened 8 months ago
This issue has been linked to an error in Bugsnag ActiveRecord::Deadlocked in api/v0/variants#create
This is the other half of the deadlock:
# app/models/enterprise.rb:586
def touch_distributors
Enterprise.distributing_products(supplied_products.select(:id)).
where.not(enterprises: { id: }).
update_all(updated_at: Time.zone.now)
end
Hello @mkllnk ,
I would like to help, even though I don't know if I can reproduce and work locally (I have no n8n & bugsnag account). Is it possible ?
Yes, you can help. The error appears when two API requests try to create a variant for the same product at the same time.
If you want to reproduce it properly, you can write a spec that sets a breakpoint and starts two threads. Here's an example: https://github.com/openfoodfoundation/openfoodnetwork/blob/af9f07f496052ff77b033bd8e6e49e8803e89f0f/spec/jobs/subscription_placement_job_spec.rb#L183-L227
Hello @mkllnk ,
I have tried to reproduce it but did not succeed.
If I launch 1500 threads of a POST on v0/variant#create, I do have an error, but not this one :)
If I launch 1200 threads, the create succeeds, but I do have some missing variants though.
And if I have in_batches in app/models/enterprise.rb
in the touch_distributors
method, I do get all my 1200 variants (though I don't know why some are missing if I don't batche, no error raised).
But I cannot replicate the deadlock. I don't understand how I can use a mutex to have 2 lines trying to get written at the exact same time in the DB and raise a Deadlock. (If I remove the mutexes from your example, test still passes).
From what I have read on stackoverflow, with an error like that, in_batches could protect from the deadlock.
Did you set the flag concurrency: true
on the spec? Without that flag, RSpec runs every test example in a transaction and since the transaction is rolled back at the end of the spec, the data is never committed to the database and conflicts never appear. With that flag, we deactivate the transactions and the threads can run into race conditions. The trick is for both threads to pause within their own transaction and then release the lock.
In my example, modifying the spec doesn't necessarily make it fail. You need to change the code to make it fail:
diff --git a/app/services/place_proxy_order.rb b/app/services/place_proxy_order.rb
index aaa88f620d..fabcefa83b 100644
--- a/app/services/place_proxy_order.rb
+++ b/app/services/place_proxy_order.rb
@@ -35,11 +35,11 @@ class PlaceProxyOrder
attr_accessor :order
def place_order
- proxy_order.with_lock do
+ #proxy_order.with_lock do
return if proxy_order.placed_at.present?
initialise_order
- end
+ #end
end
def initialise_order
This spec just checks that only one order is placed though. It doesn't test for a deadlock. That would appear if you have many proxy orders and if they got loaded in a random order. The current code contains order(:id)
though to guarantee deterministic execution and prevent deadlocks.
Hello @mkllnk , thanks for the tip, I understand the why of your code.
But here, I have no clue of how to get to a race condition. A creation of a variant touches a product which touches an Enterprise. All this with callbacks. I guess these operations must be synchronized and at one moment it is not the case in production. But I don't know what to do with the final update_all in Enterprise. I don't know how to spot an inaccuracy. Would there be some more info in logs ?
Sorry for the long delay on this. We don't have any more information in the logs. I do know that an update_all
can be problematic though. If you don't apply any ordering to the records and you update, for example, records 1, 2 and 3 then one update call may tray to update 1, 3, 2 and another may update 3, 1, 2. If that happens at the same time then one process locks row 1 while the other locks row 3. Then the first process wants to lock row 3 but needs to wait for the other process, already locked. The other process wants to lock row 1 but that's locked by the first process and we get a deadlock.
This can be resolved by first locking the rows in order. I don't know if you can apply order by
in an update call, I guess not. So you first need to do a select
with a lock for writing and then you can update.
Description
Trying to create multiple variants simultaneously can fail. When importing and updating 800 products through n8n, it always failed at some point.
Expected Behavior
All requests succeed without failure.
Actual Behaviour
Deadlock error described below.
Steps to Reproduce
Animated Gif/Screenshot
Workaround
Send requests in sequential order and wait for the result before sending the next request. This workaround is limited because you can't control other people's requests. You have to be the only one uploading for one enterprise at a time.
Severity
Your Environment
Possible Fix
We can lock required records before the execution. Or we could rework our caching logic to not need as many touch events.
Error in OpenFoodNetwork Aus
ActiveRecord::Deadlocked in api/v0/variants#create PG::TRDeadlockDetected: ERROR: deadlock detected DETAIL: Process 6375 waits for ShareLock on transaction 35670285; blocked by process 6378. Process 6378 waits for ShareLock on transaction 35670288; blocked by process 890. Process 890 waits for ExclusiveLock on tuple (474,7) of relation 18092 of database 18001; blocked by process 6375. HINT: See server log for query details. CONTEXT: while updating tuple (474,7) in relation "enterprises"
View on Bugsnag
Stacktrace
View full stacktrace
Created by Maikel via Bugsnag