vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.63k stars 2.1k forks source link

Allow maxrow validation to be a soft-limit #3017

Closed bbeaudreault closed 7 years ago

bbeaudreault commented 7 years ago

We're working on migrating more and more databases onto Vitess and one thing we run into often is the Row count exceeded exceptions. The only way for us to really test this is to flip vitess on, and then quickly flip it off if there's a problem. This is fine because we're still in QA. However I have some concerns for prod:

We'd like to add a queryserver option as a partner to queryserver-config-max-result-size: queryserver-config-enforce-max-result-size. The default would be true to keep vitess's usual operating state. When it's set to false, a warning would be logged similar to today's error log. This, along with https://github.com/youtube/vitess/issues/3016 would allow us to alert, diagnose, and fix any queries that are above the row limit.

Our goal is to enable enforcement of the maximum, but we need a way to get there safely considering the landscape of deployables and queries we are working with.

bbeaudreault commented 7 years ago

We'd be happy to do the work for this, mostly looking for comments from the youtube folks

alainjobart commented 7 years ago

That sounds useful to me. Sending to Sugu as he's more familiar with this.

sougou commented 7 years ago

Sounds good to me also. Some high level comments:

  1. It's better to increment a counter and export it through /debug/vars. Log messages are for humans when looking for more info on issues.
  2. The errors counter is tied to the canonical codes. So, we can't add anything to that. We'll need a brand new one, or find a better existing category.
  3. The max rows limit is a little spread out in the code base: plan building, query limit and row count checks after results are returned. So, the implementation may not be trivial. We can consider restructuring things a bit to keep all the related code in one place.

Another option is to build this as an independent feature: create a new flag: queryserver-config-rows-warning-level. This will keep the feature agnostic of issue 3. Also, you can use it to deploy newer constraints. For example, if you want to reduce the from 10k to 5k, you can set the warning at 5k. Once it's all clear, you can reduce the limit to 5k.

I'm beginning to like the second option more :).

bbeaudreault commented 7 years ago

I had thought about the second option, and agree it is alluring. I thought the first would be easier as a first pass. We can look into the second though, if that's preferred.

In terms of /debug/vars vs logs, I agree. I want the logs so that I can actually see the full query, and who made it. So a human would be reading that. I was planning on using the existing Results histogram in the /debug/vars to alert us to the need to check logs.

In terms of max rows code spread, even if the limit is used in multiple places, there is only 1 place where "Row count exceeded" is thrown. I also don't see many references to config.MaxResultSize or QueryEngine.maxResultSize.

harshit-gangal commented 7 years ago

There is already one flag to change max result size. But enforce flag will be useful.

sougou commented 7 years ago

The max rows limit is currently enforced as follows:

So, to change the max rows behavior to a warning, you need to write conditional code in the above three places. You'll also need to invent a 'very high, but safe constant' to pass to conn.Exec. But if you used the second route:

PS: I found a bug while inspecting the code. The UPDATE and DELETE subqueries don't enforce this limit, which should be fixed :).

bbeaudreault commented 7 years ago

Alright, sold. I did not think about needing to fix the LIMIT as well (duh). It will also be nice to be able to use it to transition limits. We'll go for that option.