vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

Bug Report: vtgate panic after online migration #10642

Open L3o-pold opened 2 years ago

L3o-pold commented 2 years ago

Overview of the Issue

We encountered a vtgate panic after running an online migration (adding a column). Even after a vtgate reboot or a downgrade (on 14 RC), the issue persisted. Reverting the migration threw an error (I'm not sure, but I think it was something similar to: ... unexpected plan type (CallerID: user)). The only way to recover those vtgates was to manually roll back the migration (DROP the column).

Reproduction Steps

I tried to reproduce the error but could not. Even on the impacted infrastructure, the migration eventually succeeded after some time (even though it had failed on multiple attempts).

What I did was:

Ran the migration on one vtgate:

SET @@ddl_strategy='vitess';
ALTER TABLE files ADD COLUMN `extension` char(64) COLLATE utf8mb4_unicode_ci DEFAULT NULL AFTER file_path;

After this, multiple (all) vtgates were crashing with the panic below. The live application was querying those vtgates, so it was not clear which query caused vtgate to crash, but the first error we saw was:

SQLSTATE[HY000] [2002] Connection refused (SQL: select * from files where name = chrisb limit 1)
panic: runtime error: index out of range [9] with length 9

goroutine 451 [running]:
vitess.io/vitess/go/sqltypes.MakeRowTrusted(...)
    vitess.io/vitess/go/sqltypes/result.go:255
vitess.io/vitess/go/sqltypes.proto3ToRows({0xc000dcbf80, 0x9, 0x0?}, {0xc00067d1b0, 0x1, 0x1d142a0?})
    vitess.io/vitess/go/sqltypes/proto3.go:92 +0x21b
vitess.io/vitess/go/sqltypes.Proto3ToResult(0xc000dcbf00)
    vitess.io/vitess/go/sqltypes/proto3.go:121 +0x4e
vitess.io/vitess/go/vt/vttablet/grpctabletconn.(*gRPCQueryClient).Execute(0xc00090ae80, {0x23d1370, 0xc0011edc80}, 0xc001382240, {0xc0011a2e40, 0x3d}, 0xc0009424e0, 0x0, 0x0, 0xc0008c7130)
    vitess.io/vitess/go/vt/vttablet/grpctabletconn/conn.go:120 +0x392
vitess.io/vitess/go/vt/vttablet/queryservice.(*wrappedService).Execute.func1({0x23d1370, 0xc0011edc80}, 0xc0011cb840?, {0x23e2de8?, 0xc00090ae80?})
    vitess.io/vitess/go/vt/vttablet/queryservice/wrapped.go:185 +0x89
vitess.io/vitess/go/vt/vtgate.(*TabletGateway).withRetry(0xc000396310, {0x23d1370, 0xc0011edc80}, 0xc001382240, {0xc000f30ad8?, 0x40d827?}, {0x48?, 0x1d320c0?}, 0x0, 0xc00093ecd0)
    vitess.io/vitess/go/vt/vtgate/tabletgateway.go:329 +0x464
vitess.io/vitess/go/vt/vttablet/queryservice.(*wrappedService).Execute(0xc0005c62a0, {0x23d1370, 0xc0011edc80}, 0xf37c12?, {0xc0011a2e40, 0x3d}, 0xc0009424e0, 0x0, 0x0, 0xc0008c7130)
    vitess.io/vitess/go/vt/vttablet/queryservice/wrapped.go:183 +0x191
vitess.io/vitess/go/vt/vtgate.(*ScatterConn).ExecuteMultiShard.func1(0xc0011fec60, 0x1, 0xc0011fb620)
    vitess.io/vitess/go/vt/vtgate/scatter_conn.go:214 +0x343
vitess.io/vitess/go/vt/vtgate.(*ScatterConn).multiGoTransaction.func1(0xc0011fec60, 0x6420fb66?)
    vitess.io/vitess/go/vt/vtgate/scatter_conn.go:602 +0x194
vitess.io/vitess/go/vt/vtgate.(*ScatterConn).multiGoTransaction.func2(0xc0007ad0e0?, 0x0?)
    vitess.io/vitess/go/vt/vtgate/scatter_conn.go:630 +0x5b
created by vitess.io/vitess/go/vt/vtgate.(*ScatterConn).multiGoTransaction
    vitess.io/vitess/go/vt/vtgate/scatter_conn.go:628 +0x215
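
One possible reading of the trace above: the panic message `index out of range [9] with length 9` is what Go reports when a slice of 9 elements is indexed at position 9, for example when a row carrying 10 values is decoded against metadata for only 9 columns. The sketch below is a simplified illustration of that failure mode only. `Field`, `Row`, `Value`, `makeRowUnchecked`, and `makeRowChecked` are hypothetical stand-ins, not the actual Vitess types or functions, and the length check is just one way such a mismatch could be turned into an error instead of a crash.

```go
package main

import "fmt"

// Field, Row, and Value are hypothetical, simplified stand-ins for
// column metadata, a wire-format row, and a decoded value.
type Field struct{ Name string }

type Row struct {
	Lengths []int64 // one entry per value; a negative length means NULL
	Values  []byte  // all non-NULL values concatenated
}

type Value struct {
	Column string
	Raw    []byte
}

// makeRowUnchecked indexes fields by the position of each length.
// It panics if the row carries more values than there are fields.
func makeRowUnchecked(fields []Field, row Row) []Value {
	out := make([]Value, len(row.Lengths))
	var offset int64
	for i, length := range row.Lengths {
		if length < 0 {
			continue // NULL value, nothing to slice
		}
		out[i] = Value{Column: fields[i].Name, Raw: row.Values[offset : offset+length]}
		offset += length
	}
	return out
}

// makeRowChecked rejects a row/field count mismatch with an error
// before any indexing happens.
func makeRowChecked(fields []Field, row Row) ([]Value, error) {
	if len(row.Lengths) > len(fields) {
		return nil, fmt.Errorf("row has %d values but only %d fields are known",
			len(row.Lengths), len(fields))
	}
	return makeRowUnchecked(fields, row), nil
}

func main() {
	fields := make([]Field, 9)             // metadata for 9 columns
	row := Row{Lengths: make([]int64, 10)} // row with 10 zero-length values

	if _, err := makeRowChecked(fields, row); err != nil {
		fmt.Println("rejected:", err)
	}

	// The unchecked path panics on the 10th value; recover to show the message.
	defer func() { fmt.Println("recovered:", recover()) }()
	makeRowUnchecked(fields, row)
}
```

Whether the field list and the row contents actually disagreed here is an assumption; the trace alone only shows that index 9 was requested from something of length 9.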

Schema tracking was disabled, and running with or without the columns updated in the vschema did not resolve the issue.

Binary Version

Version: 14.0.0 (Git revision 4e6d95c08abaa5e2e47fc9243a52884e6ae829a6 branch 'heads/v14.0.0') built on Tue Jun 28 12:55:43 UTC 2022 by vitess@buildkitsandbox using go1.18.3 linux/amd64

Operating System and Environment details

docker

Log Fragments

No response

deepthi commented 2 years ago

Was this on a sharded cluster?

L3o-pold commented 2 years ago

Yes, a small table of 2k records, sharded on 2 ranges (-80, 80-), but the data was only on one shard.
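
For readers unfamiliar with the range notation: `-80` covers keyspace IDs below the byte `0x80`, and `80-` covers those at or above it. A minimal sketch of that boundary check follows. `shardFor` is a hypothetical helper written for illustration, not a Vitess API, and it says nothing about how the vindex on this table produced the keyspace IDs.

```go
package main

import (
	"bytes"
	"fmt"
)

// shardFor is a hypothetical helper: it returns which of the two shards
// named in this thread ("-80" or "80-") contains the given keyspace ID,
// by comparing the ID bytewise against the 0x80 boundary.
func shardFor(keyspaceID []byte) string {
	boundary := []byte{0x80}
	if bytes.Compare(keyspaceID, boundary) < 0 {
		return "-80"
	}
	return "80-"
}

func main() {
	for _, id := range [][]byte{{0x16, 0x6b}, {0x7f, 0xff}, {0x80}, {0xd4, 0x01}} {
		fmt.Printf("%x -> %s\n", id, shardFor(id))
	}
}
```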