Closed bbeaudreault closed 7 years ago
We'd be happy to do the work for this, mostly looking for comments from the youtube folks
That sounds useful to me. Sending to Sugu as he's more familiar with this.
Sounds good to me also. Some high level comments:
/debug/vars. Log messages are for humans when looking for more info on issues.
Another option is to build this as an independent feature: create a new flag, queryserver-config-rows-warning-level. This will keep the feature agnostic of issue 3. Also, you can use it to deploy newer constraints. For example, if you want to reduce the limit from 10k to 5k, you can set the warning at 5k. Once it's all clear, you can reduce the limit to 5k.
I'm beginning to like the second option more :).
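To make the second option concrete, here is a minimal sketch of how a warning level could sit below the hard limit. It is not Vitess code: the `rowLimits` type and its `check` method are hypothetical names invented for illustration, and the only things taken from this thread are the proposed flag name and the 10k/5k example.

```go
package main

import "fmt"

// rowLimits is a hypothetical holder for the two thresholds discussed above.
type rowLimits struct {
	warnLevel int // proposed queryserver-config-rows-warning-level; 0 disables the warning
	maxRows   int // existing queryserver-config-max-result-size
}

// check reports whether a warning should be logged for rowCount and
// returns an error when the hard limit is exceeded.
func (l rowLimits) check(rowCount int) (warn bool, err error) {
	if rowCount > l.maxRows {
		return false, fmt.Errorf("Row count exceeded %d", l.maxRows)
	}
	if l.warnLevel > 0 && rowCount > l.warnLevel {
		return true, nil
	}
	return false, nil
}

func main() {
	// Transition example from the thread: limit stays at 10k, warning starts at 5k.
	limits := rowLimits{warnLevel: 5000, maxRows: 10000}
	for _, n := range []int{100, 7500, 12000} {
		warn, err := limits.check(n)
		fmt.Println(n, warn, err)
	}
}
```

Once no query trips the warning any more, the hard limit can be lowered to the warning level without surprises.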
I had thought about the second option, and agree it is alluring. I thought the first would be easier as a first pass. We can look into the second though, if that's preferred.
In terms of /debug/vars vs logs, I agree. I want the logs so that I can actually see the full query, and who made it. So a human would be reading that. I was planning on using the existing Results histogram in the /debug/vars to alert us to the need to check logs.
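As an illustration of the histogram-then-logs idea, here is a rough sketch that uses Go's standard expvar package, which is what serves /debug/vars on the default HTTP mux. Vitess has its own stats code, so this only shows the shape of the data: the variable name `ResultsSketch`, the bucket bounds, and both helper functions are assumptions made up for this example.

```go
package main

import (
	"expvar"
	"fmt"
)

// resultBuckets are hypothetical upper bounds for the row-count histogram.
var resultBuckets = []int{100, 1000, 5000, 10000}

// resultsHist appears under /debug/vars once an HTTP server is serving
// the default mux. The name "ResultsSketch" is made up for this example.
var resultsHist = expvar.NewMap("ResultsSketch")

// bucketName returns the label of the first bucket that can hold rows,
// or "inf" when rows is larger than every bound.
func bucketName(rows int) string {
	for _, b := range resultBuckets {
		if rows <= b {
			return fmt.Sprintf("%d", b)
		}
	}
	return "inf"
}

// recordResult increments the counter of the bucket that rows falls into.
func recordResult(rows int) {
	resultsHist.Add(bucketName(rows), 1)
}

func main() {
	for _, n := range []int{50, 7000, 7000, 20000} {
		recordResult(n)
	}
	// Prints the same JSON object that /debug/vars would expose for this variable.
	fmt.Println(resultsHist.String())
}
```

An alert on the upper buckets says that something is wrong; the log line with the full query and caller says what.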
In terms of max rows code spread, even if the limit is used in multiple places, there is only 1 place where "Row count exceeded" is thrown. I also don't see many references to config.MaxResultSize or QueryEngine.maxResultSize.
There is already one flag to change the max result size, but an enforce flag would be useful.
The max rows limit is currently enforced as follows:
LIMIT :#maxLimit is added to all selects (including subqueries for DMLs): you can look for GenerateLimitQuery here: https://github.com/youtube/vitess/blob/master/go/vt/vttablet/tabletserver/planbuilder/dml.go.
max as the number, requesting an error if the rows exceed that amount: https://github.com/youtube/vitess/blob/master/go/vt/vttablet/tabletserver/query_executor.go#L751
So, to change the max rows behavior to a warning, you need to write conditional code in the above three places. You'll also need to invent a 'very high, but safe constant' to pass to conn.Exec. But if you used the second route:
PS: I found a bug while inspecting the code. The UPDATE and DELETE subqueries don't enforce this limit, which should be fixed :).
Alright, sold. I did not think about needing to fix the LIMIT as well (duh). It will also be nice to be able to use it to transition limits. We'll go for that option.
We're working on migrating more and more databases onto Vitess, and one thing we run into often is the Row count exceeded exceptions. The only way for us to really test this is to flip vitess on, and then quickly flip it off if there's a problem. This is fine because we're still in QA. However, I have some concerns for prod:
Row count exceeded exceptions in prod. This is because at HubSpot we don't have the resources to make QA an identical replica of prod. Therefore databases are smaller and usage is lower in QA than in prod, so it's possible that a query is under 10k rows in QA but over 10k in prod.
We'd like to add a queryserver option as a partner to queryserver-config-max-result-size: queryserver-config-enforce-max-result-size. The default would be true to keep vitess's usual operating state. When it's set to false, a warning would be logged similar to today's error log. This, along with https://github.com/youtube/vitess/issues/3016, would allow us to alert, diagnose, and fix any queries that are above the row limit.
Our goal is to enable enforcement of the maximum, but we need a way to get there safely considering the landscape of deployables and queries we are working with.
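Roughly, the behavior we have in mind looks like the sketch below. It uses only the Go standard library and does not touch real Vitess code: `checkResultSize` is a hypothetical helper, the 10000 default is just the 10k figure from our setup, and the flag is parsed from a hard-coded argument list so the example runs on its own.

```go
package main

import (
	"flag"
	"fmt"
	"log"
)

// checkResultSize is a hypothetical helper, not Vitess code. When enforce is
// true it returns an error for oversized results, as happens today. When
// enforce is false it logs a warning through warnf and lets the result pass.
func checkResultSize(rowCount, maxRows int, enforce bool, warnf func(string, ...interface{})) error {
	if rowCount <= maxRows {
		return nil
	}
	if enforce {
		return fmt.Errorf("Row count exceeded %d", maxRows)
	}
	warnf("Row count exceeded %d (not enforced): got %d rows", maxRows, rowCount)
	return nil
}

func main() {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	maxRows := fs.Int("queryserver-config-max-result-size", 10000, "max rows per result")
	enforce := fs.Bool("queryserver-config-enforce-max-result-size", true,
		"return an error instead of logging a warning when the max is exceeded")
	// Hard-coded arguments stand in for the real command line.
	if err := fs.Parse([]string{"-queryserver-config-enforce-max-result-size=false"}); err != nil {
		log.Fatal(err)
	}
	err := checkResultSize(12000, *maxRows, *enforce, log.Printf)
	fmt.Println("error:", err)
}
```

With the flag left at its default of true, nothing changes for existing deployments.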