Closed. inolddays closed this 6 months ago.
A better solution may be to use a vschema that includes the table column lists with authoritative set to true, then run the DDL to add the column against the shards. When the DDL on all the shards is complete, update the vschema to add the new column. This way, a select * will not expand to additional columns until after the vschema update is done.
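To illustrate the idea above, here is a minimal Go sketch of star expansion driven by an authoritative column list. It is not Vitess code: `tableColumns`, `expandStar`, and the column names `id`, `k`, `c`, `pad` are assumptions made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// tableColumns is a hypothetical stand-in for the column list a vschema
// carries for one table, plus its "authoritative" setting.
type tableColumns struct {
	Authoritative bool
	Columns       []string
}

// expandStar rewrites "select *" into an explicit column list when the
// column list is authoritative; otherwise it leaves the star untouched.
func expandStar(table string, tc tableColumns) string {
	if !tc.Authoritative || len(tc.Columns) == 0 {
		return "select * from " + table
	}
	return "select " + strings.Join(tc.Columns, ", ") + " from " + table
}

func main() {
	// Assumed column names, for illustration only.
	tc := tableColumns{Authoritative: true, Columns: []string{"id", "k", "c", "pad"}}

	// While the DDL runs on the shards, the vschema still lists the old
	// columns, so every shard is asked for the same set of columns.
	fmt.Println(expandStar("sbtest", tc))

	// Only after every shard has the new column is the vschema updated.
	tc.Columns = append(tc.Columns, "grade1")
	fmt.Println(expandStar("sbtest", tc))
}
```

With this ordering, a shard that already has the new column is never asked for it until the vschema says so.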
This might solve the problem. One concern, though: would implementing it this way require too many changes compared with the existing code structure?
Reproduced it on my laptop, @sougou.
I added a new column directly on the first shard's master MySQL,
then executed "select * from sbtest limit 1\G".
Debugging the vttablet:
The number of fields and the row length are not equal, because query.plan.fields still holds the old field list.
At first I added two pieces of code to vtgate, like this:
But that does not work with a DDL like:
alter table sbtest1 add grade1 int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'grade1' after _hd_update_region;
Because the new column is added with AFTER instead of being appended at the end, the order of the fields is then not parsed correctly.
So I guess PR https://github.com/vitessio/vitess/pull/5572 fixes this problem, but it introduces another flag, and it may not be easy for people to know when it is the right time to set the flag to false.
@aquarapid gives a perfect solution, which I also agree with. Currently, my temporary solution is to bring the "watch_replication_stream" flag back on vttablet. With this flag, vttablet sees DDL changes to the schema, reloads the schema immediately, and clears the query plans. This greatly reduces the likelihood of such a panic (this applies only when the DDL is run through vtgate).
It's important to note that the tests above are all based on 4.0, but I've looked at and compared the latest master branch code and the same problem exists.
Chiming in. We should still fix the panic. If the field length is longer than the number of columns returned, maybe we can pad with nulls, or return an error. Maybe returning an error is better.
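The two options mentioned above (return an error, or pad with nulls) could look roughly like the Go sketch below. It is not the actual vtgate code: `checkRow`, `padRow`, and `errColumnMismatch` are hypothetical names, and rows are simplified to `[]any`.

```go
package main

import (
	"errors"
	"fmt"
)

// errColumnMismatch is a hypothetical sentinel error for this sketch.
var errColumnMismatch = errors.New("column count mismatch between fields and row")

// checkRow reports an error when the row does not have one value per field,
// so the caller can fail the query instead of indexing out of range.
func checkRow(fields []string, row []any) error {
	if len(fields) != len(row) {
		return fmt.Errorf("%w: %d fields, %d values", errColumnMismatch, len(fields), len(row))
	}
	return nil
}

// padRow is the alternative: append nil (standing in for NULL) until the
// row has as many values as there are fields. Longer rows are left as-is.
func padRow(fields []string, row []any) []any {
	for len(row) < len(fields) {
		row = append(row, nil)
	}
	return row
}

func main() {
	fields := []string{"id", "k", "c", "pad", "grade1"}
	row := []any{1, 2, "c-value", "pad-value"} // one value short

	if err := checkRow(fields, row); err != nil {
		fmt.Println("query failed:", err)
	}
	fmt.Println(len(padRow(fields, row)))
}
```

Padding hides the mismatch and only lines up when the missing column is the last one, which is one reason returning an error may be the safer choice.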
vtgate is a gateway cluster that supports multi-tenancy. Some of our gateways are used by hundreds of applications. If one application hits a problem like this and causes the gateway cluster to crash, hundreds of applications become unavailable. That would be a very serious incident, so I think the priority should be P0.
This should only be an internal panic. I don't think vtgate actually crashes at this point.
vtgate does crash. How long it lasts depends on how long vttablet takes to reload the schema on its own; the default is 30 minutes. In other words, if all applications use the same vtgate cluster, vtgate will not be available until 30 minutes later. During this time, whenever there is a query like "select *", the vtgate cluster will keep crashing.
We have schema tracking at vtgate, which should solve this, as it does star expansion for sharded queries.
We executed an ALTER TABLE DDL statement (adding two columns) through vtgate on a keyspace that has 16 shards, and found that the vtgate clusters started crashing. After crashing for some minutes (nearly 30 minutes), all vtgates became available again. Here is the call stack:
This has been discussed with @sougou.
The problem may occur because we apply the DDL on multiple shards while the business application is running selects at the same time. vtgate sends the query to the shards; the first shard says it has 5 columns but another shard returns 4, while vtgate expects a uniform number of columns. One solution is to persuade users not to use select * and to rewrite the SQL with an explicit column list, such as select a, b, c; then it won't fail. But sometimes it is not that easy to make users change, so we need to add some protection for this situation. This does not fundamentally solve the problem, but at least vtgate will not crash. Here is another PR that may be related or worth referring to; I paste it here for tracking: https://github.com/vitessio/vitess/issues/5572
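As a rough illustration of the kind of protection meant here, the Go sketch below merges results from several shards and returns an error when one shard reports a different number of columns than the first. It is a simplified model, not vtgate's real scatter code: `shardResult` and `mergeShardResults` are hypothetical names, and the shard names are only examples.

```go
package main

import "fmt"

// shardResult is a hypothetical, simplified stand-in for one shard's reply.
type shardResult struct {
	Shard  string
	Fields []string
	Rows   [][]string
}

// mergeShardResults concatenates the rows of all shards. It fails with an
// error, rather than panicking later, when a shard reports a different
// number of columns than the first shard did.
func mergeShardResults(results []shardResult) ([][]string, error) {
	if len(results) == 0 {
		return nil, nil
	}
	want := len(results[0].Fields)
	var merged [][]string
	for _, r := range results {
		if len(r.Fields) != want {
			return nil, fmt.Errorf("shard %s returned %d columns, expected %d; a schema change may be in progress",
				r.Shard, len(r.Fields), want)
		}
		merged = append(merged, r.Rows...)
	}
	return merged, nil
}

func main() {
	results := []shardResult{
		// The first shard already has the new column: 5 fields.
		{Shard: "-80", Fields: []string{"a", "b", "c", "d", "e"}, Rows: [][]string{{"1", "2", "3", "4", "5"}}},
		// The second shard has not applied the DDL yet: 4 fields.
		{Shard: "80-", Fields: []string{"a", "b", "c", "d"}, Rows: [][]string{{"1", "2", "3", "4"}}},
	}
	if _, err := mergeShardResults(results); err != nil {
		fmt.Println("query failed:", err)
	}
}
```

The application still sees a failed query during the DDL window, but the failure stays local to that query.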