vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.7k stars 2.1k forks source link

Feature Request: UPDATE with JOINs with all the same vindexes does not need OperatorType DMLWithInput #16876

Open wiebeytec opened 1 month ago

wiebeytec commented 1 month ago

Overview of the Issue

Vitess recently got UPDATE with JOIN support, but there may be some inefficiency/deficiency. It seems to resort to OperatorType DMLWithInput (which rewrites to SELECT ... FOR UPDATE; UPDATE ...) when it doesn't have to. This is not only inefficient, but also, when does that rewrite, it causes deadlocks if you have multiple connections doing it an the same time (in transactions). I ran into this.

Expected result: when using fields from joined tables from the same shard/vindex in the SET expression, it can still use OperatorType Update.

Reproduction Steps

As an example that is treated well: I use hard-coded values in SET, it uses a normal update:

vexplain UPDATE
alarmLog AS al
INNER JOIN lastLogData as lld
ON al.idSite = lld.idSite
SET al.valueEnum = 2
WHERE lld.valueEnum != 1
AND al.idSite = 98478 \G
{                                                                                       
        "OperatorType": "Update",                                                             
        "Variant": "EqualUnique",                                                             
        "Keyspace": {                                                                         
                "Name": "sites2024",                                                                                  
                "Sharded": true                                                                                                                                                             
        },                                                                                                                                                                                  
        "TargetTabletType": "PRIMARY",                                                                                                                                                                                                      
        "Query": "update alarmLog as al, lastLogData as lld set al.valueEnum = 2 where al.idSite = 98478 and lld.valueEnum != 1 and al.idSite = lld.idSite",                                                                                
        "Table": "alarmLog",                                                                                          
        "Values": [                    
                "98478"                
        ],                                                                                    
        "Vindex": "a_standard_hash"                                                           
}

But when I take the value to set from the JOIN, it uses DMLWithInput:

vexplain UPDATE
alarmLog AS al
INNER JOIN lastLogData as lld
ON al.idSite = lld.idSite
SET al.valueEnum = lld.valueEnum
WHERE lld.valueEnum != 1
AND al.idSite = 98478 
\G
{
        "OperatorType": "DMLWithInput",
        "TargetTabletType": "PRIMARY",
        "BindVars": [
                "0:[lld_valueEnum:1]"
        ],
        "Offset": [
                "0:[0]"
        ],
        "Inputs": [
                {
                        "OperatorType": "Route",
                        "Variant": "EqualUnique",
                        "Keyspace": {
                                "Name": "sites2024",
                                "Sharded": true
                        },
                        "FieldQuery": "select al.idAlarm, lld.valueEnum from alarmLog as al, lastLogData as lld where 1 != 1",
                        "Query": "select al.idAlarm, lld.valueEnum from alarmLog as al, lastLogData as lld where al.idSite = 98478 and lld.valueEnum != 1 and al.idSite = lld.idSite for update",
                        "Table": "alarmLog, lastLogData",
                        "Values": [
                                "98478"
                        ],
                        "Vindex": "a_standard_hash"
                },
                {
                        "OperatorType": "Update",
                        "Variant": "Scatter",
                        "Keyspace": {
                                "Name": "sites2024",
                                "Sharded": true
                        },
                        "TargetTabletType": "PRIMARY",
                        "Query": "update alarmLog as al set al.valueEnum = :lld_valueEnum where al.idAlarm in ::dml_vals",
                        "Table": "alarmLog"
                }
        ]
}

The relevant table entries in the VSchema:

"alarmLog": {         
      "column_vindexes": [
        {                    
          "column": "idSite",      
          "name": "a_standard_hash"
        }
      ],
      "auto_increment" : {
        "column": "idAlarm",
        "sequence": "alarmLog_seq"                                                                                    
      }
    },

And:

"lastLogData": {              
      "column_vindexes": [
        {
          "column": "idSite",
          "name": "a_standard_hash"
        }
      ]                      
    },

Binary Version

vtgate version Version: 20.0.2 (Git revision 2592c5932b3036647868299b6df76f8ef28dfbc8 branch 'HEAD') built on Wed Sep 11 08:15:20 UTC 2024 by runner@fv-az1152-369 using go1.22.7 linux/amd64

Operating System and Environment details

cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=24.04
DISTRIB_CODENAME=noble
DISTRIB_DESCRIPTION="Ubuntu 24.04.1 LTS"

Log Fragments

No response

harshit-gangal commented 1 month ago

This looks like a reasonable improvement. It does not seem like a bug but an enhancement.

GrahamCampbell commented 1 month ago

Is this a V20 issue only, or does this impact V19, too?