tidb: support a plan cache for prepared statements

dbjoa commented 6 years ago

Since prepared statements are compiled and optimized whenever they are executed, there is a room to reduce the execution time of the prepared statements.

In order to do that we can compile a prepared statement once and reuse the compiled plan by converting parameter expressions to deferred ones, which are evaluated as late as possible, and storing the plan to a cache.

The implentation has some limitations:

the prepared statements with limit expressions are not cached
the statistics related to the cache are not supported
on-demand methods for invalidating the cache are not supported

CLAassistant commented 6 years ago

All committers have signed the CLA.

shenli commented 6 years ago

@dbjoa Thanks for your PR! This is such a big PR. Could you split it into a couple of smaller ones?

dbjoa commented 6 years ago

Sorry, I could not split the PR into smaller ones because the components changed are correlated for resolving the one problem except for the cache package.

When reviewing the PR, please consider that the cache package (533 lines) is an implementation of a LRU cache by removing mutex operation from the original codes(https://github.com/youtube/vitess/tree/master/go/cache).

shenli commented 6 years ago

OK, we will review the PR. It will take some time. Please sign the CLA.

shenli commented 6 years ago

@dbjoa Is there any benchmark to prove the performance improvement?

dbjoa commented 6 years ago

Here are the capture of TiDB monitoring dashboard to show the performance gap relatively. In the figure, "w/o plan cache" denotes the QPS for processing prepared statements without the plan cache, and "w/ plan cache" does the QPS with the plan cache.

The figure shows that the performance gain is about 27%.

Test Environment: instances are built from OpenStack(kilo)

TiDB : 3 x c3.4xlarge
TiKV : 4 x c3.4xlarge
PD : 3 x c3.large
HAProxy: 3 x c3.large
YCSB : 4 x c3.2xlarge

Test Workload (60K OPS --> TiDB Cluster)

target : 15K OPS / YCSB
thread : 150 / YCSB
readproportion : 1 (read only)
workload=com.yahoo.ycsb.workloads.CoreWorkload

Test DB

recordcount=30000000
fieldcount=10
fieldlength=100

Git Hash: 50350f72955d9d8b752a0d5ed40dcfdef0fef34d (17.6.26)

tidb-plan-cache-perf-tidb-3-tikv-4-tipd-3-17-07-17

blacktear23 commented 6 years ago

I have a question about Plan Cache. For now TiDB will generate a bad performance plan (Full table scan) when SQL like select * from testtable where intcolumn > "1" limit 10;. The generated plan will cast intcolumn to tpReal. For now I have do some job focusing on how to optimize type cast (#3924). If we change "1" to a placeholder like "?" as a deferred constant how can we guess this constant's type and convert it to fit column type?

dbjoa commented 6 years ago

@blacktear23

For a performance perspective, we shoud cast the parameter to int type. In order to that the default type of the parameter can be string like in a commercial database system. i.e., the expression, intcolumn > ?, can be converted into GT(intcolumn, to_int(getparam("0")). Here "0" means the index of the first parameter and the return type of getparam function is string.

Here is the data type precedence of SAP HANA, which will be helpful for your work: https://help.sap.com/viewer/4fe29514fd584807ac9f2a04f6754767/2.0.01/en-US/46ff9650c7f44461a6146269c1e2a4c6.html

shenli commented 6 years ago

@dbjoa Thanks! We are ready to review your PR.

dbjoa commented 6 years ago

@zz-jason , Sorry, there seems to be new bugs in the modified codes. Please pause the review until the bugs are fixed. I'll let you know when to resume the review.

zz-jason commented 6 years ago

ok

dbjoa commented 6 years ago

@zz-jason, I'v fixed the issues. The new executors (i.e., TableReaderExecutor, IndexReaderExecutor, IndexLookUpExecutor for new_distsql) should be supported.

Please resume the review.

zz-jason commented 6 years ago

ok

sre-bot commented 6 years ago

Hi contributor, thanks for your PR.

This patch needs to be approved by someone of admins. They should reply with "/ok-to-test" to accept this PR for running test automatically.

iamxy commented 6 years ago

/ok-to-test

zz-jason commented 6 years ago

/run-all-test

iamxy commented 6 years ago

/run-all-test

zz-jason commented 6 years ago

@dbjoa any updates ?

dbjoa commented 6 years ago

@zz-jason, I'm testing the new patch of the issue. Please pause the review. I'll let you know soon.

zz-jason commented 6 years ago

@dbjoa OK

dbjoa commented 6 years ago

@zz-jason , Please resume the review.

coocood commented 6 years ago

@dbjoa Thank you for the great work. It's really impressive.

I would like to suggest another design approach which may simplify the code, makes it easier to read.

Instead of adding a method isDeffered to builtinFunc interface, we add a field deferredExpr to Constant type to indicate it is a deffered Constant.

A deferred Constant is folded from an expression with Param evaluated, but keep the original expression in deferredExpr field, so it can be used the same way in optimizer, and refresh the value when retrieved from the plan cache.

This approach would have much fewer lines of code to change.

What do you think?

dbjoa commented 6 years ago

@coocood I agree your suggestions. I also have had a plan of refactoring the current implementation by intoducing a deferred (constant) expression as the next step. Here, the current implementation of ParamMarkerExpr can be one of the prototypes of deferred Constant. However, in order to support that generally, since the existing evaluation framework might be extended significantly, this refactoring might be challengeable.

Should have this PR included the refactoring?

type ParamMarkerExpr struct { exprNode Offset int Order int Expr ExprNode //parameter getter expression when the plan is cached }

coocood commented 6 years ago

@dbjoa We don't need to introduce any new type. A deferred constant is only created by FoldConstant function, it is the Constant type we used before, but has an extra field which can be used to update the value after parameter changed.

coocood commented 6 years ago

@dbjoa If we use the same Contant type, I think the evaluation framework doesn't affect the refactoring.

zz-jason commented 6 years ago

before evaluating the deferred Constant, we can add a cast function upon its deferredExpr expression field, whose return type is a string

dbjoa commented 6 years ago

@coocood
Thank you for your insightful comments. I'll try to refactor this PR based on your suggestions.

coocood commented 6 years ago

@dbjoa How is it going?

dbjoa commented 6 years ago

@coocood , It is under testing. I'll let you know when the refactoring will be ready to review.

coocood commented 6 years ago

@dbjoa You can push your commit If you have any progress, it doesn't have to pass all the test, so we can start the review earlier.

dbjoa commented 6 years ago

@coocood , Yes, I'll push the changes with some failures after fixing two panics.

dbjoa commented 6 years ago

@coocood , I've pushed the changes with 5 failures. I'll also update the changes to resolve the failures progressively. output-of-make-dev.txt

coocood commented 6 years ago

@dbjoa We recently implemented a global plan cache which caches non-prepared statement. There is an LRU library you can used directly, so this PR will be smaller.

See: https://github.com/pingcap/tidb/pull/4644

dbjoa commented 6 years ago

@coocood Thank you for the information. I'll check it.

dbjoa commented 6 years ago

@coocood I've committed the patch to use the LRU library introduced in #4644.

coocood commented 6 years ago

@dbjoa I tried to execute the following statement.

create table t (id int, KEY id (id));
insert values (1), (2), (3);
prepare stmt from 'select * from t where id = ?';

Then execute the stmt without setting the variable @id.

execute stmt using @id;

Then the result is empty. But then set the variable to 1:

set @id = 1;
execute stmt using @id;

The result is still empty.

Another issue is that I log the plan with plan.ToString, the plan is not using index. The expected plan should be IndexReader, But got TableReader, the index is not used.

coocood commented 6 years ago

@dbjoa Please add the test cases for the issues above.

dbjoa commented 6 years ago

@coocood Thank you for your detailed comments.

I've updated the changes to resolve the two issues when prepared statements with null parameters:

incorrect plans generated when the null parameters are given at the first execution time
generated plans are not optimal due to the wrong placed type casts

In addition, I've also added the test cases for the issues addressed.

coocood commented 6 years ago

@dbjoa It's impressive that solved the issues. Thank you for your hard work!

But introduce a new ETParam may not be the best way to solve it. Each EvalType has its corresponding MySQL type, but the ETParam is an exception, it adds a lot of complexity to the expression framework.

Would you please try to find a solution without ETParam?

After removing ETParam, If you have any trouble, you can skip the test, then we will fix this issue after this PR has been merged.

dbjoa commented 6 years ago

@coocood I've removed ETParam and pushed the changes having two test fails. I'll find a solution to resolve the test fails soon!

dbjoa commented 6 years ago

@coocood I've found the solution and pushed the changes. The fix makes TypeUnspecified to be lower the priority than other types.

coocood commented 6 years ago

@dbjoa Well Done!

coocood commented 6 years ago

LGTM

coocood commented 6 years ago

@zz-jason PTAL

zz-jason commented 6 years ago

/run-all-test

dbjoa commented 6 years ago

@zz-jason Would you let me know why the Jenkins jobs are failed? Were the fails related to the PR?

zz-jason commented 6 years ago

@dbjoa It's the test environment problem, we are working on it, I'll rerun the tests after resolving the problem.

zz-jason commented 6 years ago

/run-all-test

dbjoa commented 6 years ago

@zz-jason I've updated the codes:

change the prepared-plan-cache key generation logic
consolidate the cacheable check logics
move prepared-plan related parameters to config/config.go and config.toml.example
change a function name
fix a crash bug and add tests

zz-jason commented 6 years ago

/run-all-test

pingcap / tidb

tidb: support a plan cache for prepared statements #3956