Vitess recently got UPDATE with JOIN support, but there may be some inefficiency/deficiency. It seems to resort to OperatorType DMLWithInput (which rewrites to SELECT ... FOR UPDATE; UPDATE ...) when it doesn't have to. This is not only inefficient, but also, when does that rewrite, it causes deadlocks if you have multiple connections doing it an the same time (in transactions). I ran into this.
Expected result: when using fields from joined tables from the same shard/vindex in the SET expression, it can still use OperatorType Update.
Reproduction Steps
As an example that is treated well: I use hard-coded values in SET, it uses a normal update:
vexplain UPDATE
alarmLog AS al
INNER JOIN lastLogData as lld
ON al.idSite = lld.idSite
SET al.valueEnum = 2
WHERE lld.valueEnum != 1
AND al.idSite = 98478 \G
{
"OperatorType": "Update",
"Variant": "EqualUnique",
"Keyspace": {
"Name": "sites2024",
"Sharded": true
},
"TargetTabletType": "PRIMARY",
"Query": "update alarmLog as al, lastLogData as lld set al.valueEnum = 2 where al.idSite = 98478 and lld.valueEnum != 1 and al.idSite = lld.idSite",
"Table": "alarmLog",
"Values": [
"98478"
],
"Vindex": "a_standard_hash"
}
But when I take the value to set from the JOIN, it uses DMLWithInput:
vexplain UPDATE
alarmLog AS al
INNER JOIN lastLogData as lld
ON al.idSite = lld.idSite
SET al.valueEnum = lld.valueEnum
WHERE lld.valueEnum != 1
AND al.idSite = 98478
\G
{
"OperatorType": "DMLWithInput",
"TargetTabletType": "PRIMARY",
"BindVars": [
"0:[lld_valueEnum:1]"
],
"Offset": [
"0:[0]"
],
"Inputs": [
{
"OperatorType": "Route",
"Variant": "EqualUnique",
"Keyspace": {
"Name": "sites2024",
"Sharded": true
},
"FieldQuery": "select al.idAlarm, lld.valueEnum from alarmLog as al, lastLogData as lld where 1 != 1",
"Query": "select al.idAlarm, lld.valueEnum from alarmLog as al, lastLogData as lld where al.idSite = 98478 and lld.valueEnum != 1 and al.idSite = lld.idSite for update",
"Table": "alarmLog, lastLogData",
"Values": [
"98478"
],
"Vindex": "a_standard_hash"
},
{
"OperatorType": "Update",
"Variant": "Scatter",
"Keyspace": {
"Name": "sites2024",
"Sharded": true
},
"TargetTabletType": "PRIMARY",
"Query": "update alarmLog as al set al.valueEnum = :lld_valueEnum where al.idAlarm in ::dml_vals",
"Table": "alarmLog"
}
]
}
vtgate version Version: 20.0.2 (Git revision 2592c5932b3036647868299b6df76f8ef28dfbc8 branch 'HEAD') built on Wed Sep 11 08:15:20 UTC 2024 by runner@fv-az1152-369 using go1.22.7 linux/amd64
Overview of the Issue
Vitess recently got UPDATE with JOIN support, but there may be some inefficiency/deficiency. It seems to resort to OperatorType
DMLWithInput
(which rewrites toSELECT ... FOR UPDATE; UPDATE ...
) when it doesn't have to. This is not only inefficient, but also, when does that rewrite, it causes deadlocks if you have multiple connections doing it an the same time (in transactions). I ran into this.Expected result: when using fields from joined tables from the same shard/vindex in the
SET
expression, it can still use OperatorTypeUpdate
.Reproduction Steps
As an example that is treated well: I use hard-coded values in
SET
, it uses a normal update:But when I take the value to set from the
JOIN
, it usesDMLWithInput
:The relevant table entries in the VSchema:
And:
Binary Version
Operating System and Environment details
Log Fragments
No response