Closed dbjoa closed 6 years ago
@dbjoa Thanks for your PR! This is such a big PR. Could you split it into a couple of smaller ones?
Sorry, I could not split the PR into smaller ones because the components changed are correlated for resolving the one problem except for the cache package.
When reviewing the PR, please consider that the cache package (533 lines) is an implementation of a LRU cache by removing mutex operation from the original codes(https://github.com/youtube/vitess/tree/master/go/cache).
OK, we will review the PR. It will take some time. Please sign the CLA.
@dbjoa Is there any benchmark to prove the performance improvement?
Here are the capture of TiDB monitoring dashboard to show the performance gap relatively. In the figure, "w/o plan cache" denotes the QPS for processing prepared statements without the plan cache, and "w/ plan cache" does the QPS with the plan cache.
The figure shows that the performance gain is about 27%.
Test Environment: instances are built from OpenStack(kilo)
Test Workload (60K OPS --> TiDB Cluster)
Test DB
Git Hash: 50350f72955d9d8b752a0d5ed40dcfdef0fef34d (17.6.26)
I have a question about Plan Cache. For now TiDB will generate a bad performance plan (Full table scan) when SQL like select * from testtable where intcolumn > "1" limit 10;
. The generated plan will cast intcolumn to tpReal. For now I have do some job focusing on how to optimize type cast (#3924). If we change "1" to a placeholder like "?" as a deferred constant how can we guess this constant's type and convert it to fit column type?
@blacktear23
For a performance perspective, we shoud cast the parameter to int type. In order to that the default type of the parameter can be string like in a commercial database system. i.e., the expression, intcolumn > ?, can be converted into GT(intcolumn, to_int(getparam("0")). Here "0" means the index of the first parameter and the return type of getparam function is string.
Here is the data type precedence of SAP HANA, which will be helpful for your work: https://help.sap.com/viewer/4fe29514fd584807ac9f2a04f6754767/2.0.01/en-US/46ff9650c7f44461a6146269c1e2a4c6.html
@dbjoa Thanks! We are ready to review your PR.
@zz-jason , Sorry, there seems to be new bugs in the modified codes. Please pause the review until the bugs are fixed. I'll let you know when to resume the review.
ok
@zz-jason, I'v fixed the issues. The new executors (i.e., TableReaderExecutor, IndexReaderExecutor, IndexLookUpExecutor for new_distsql) should be supported.
Please resume the review.
ok
Hi contributor, thanks for your PR.
This patch needs to be approved by someone of admins. They should reply with "/ok-to-test" to accept this PR for running test automatically.
/ok-to-test
/run-all-test
/run-all-test
@dbjoa any updates ?
@zz-jason, I'm testing the new patch of the issue. Please pause the review. I'll let you know soon.
@dbjoa OK
@zz-jason , Please resume the review.
@dbjoa Thank you for the great work. It's really impressive.
I would like to suggest another design approach which may simplify the code, makes it easier to read.
Instead of adding a method isDeffered
to builtinFunc
interface, we add a field deferredExpr
to Constant
type to indicate it is a deffered Constant.
A deferred Constant is folded from an expression with Param
evaluated, but keep the original expression in deferredExpr
field, so it can be used the same way in optimizer, and refresh the value when retrieved from the plan cache.
This approach would have much fewer lines of code to change.
What do you think?
@coocood
I agree your suggestions. I also have had a plan of refactoring the current implementation by intoducing a deferred (constant) expression as the next step. Here, the current implementation of ParamMarkerExpr
can be one of the prototypes of deferred Constant
. However, in order to support that generally, since the existing evaluation framework might be extended significantly, this refactoring might be challengeable.
Should have this PR included the refactoring?
type ParamMarkerExpr struct {
exprNode
Offset int
Order int
Expr ExprNode //parameter getter expression when the plan is cached
}
@dbjoa
We don't need to introduce any new type.
A deferred constant is only created by FoldConstant
function, it is the Constant type we used before, but has an extra field which can be used to update the value after parameter changed.
@dbjoa If we use the same Contant type, I think the evaluation framework doesn't affect the refactoring.
before evaluating the deferred Constant
, we can add a cast function upon its deferredExpr
expression field, whose return type is a string
@coocood
Thank you for your insightful comments. I'll try to refactor this PR based on your suggestions.
@dbjoa How is it going?
@coocood , It is under testing. I'll let you know when the refactoring will be ready to review.
@dbjoa You can push your commit If you have any progress, it doesn't have to pass all the test, so we can start the review earlier.
@coocood , Yes, I'll push the changes with some failures after fixing two panics.
@coocood , I've pushed the changes with 5 failures. I'll also update the changes to resolve the failures progressively. output-of-make-dev.txt
@dbjoa We recently implemented a global plan cache which caches non-prepared statement. There is an LRU library you can used directly, so this PR will be smaller.
@coocood Thank you for the information. I'll check it.
@coocood I've committed the patch to use the LRU library introduced in #4644.
@dbjoa I tried to execute the following statement.
create table t (id int, KEY id (id));
insert values (1), (2), (3);
prepare stmt from 'select * from t where id = ?';
Then execute the stmt without setting the variable @id.
execute stmt using @id;
Then the result is empty. But then set the variable to 1:
set @id = 1;
execute stmt using @id;
The result is still empty.
Another issue is that I log the plan with plan.ToString
, the plan is not using index.
The expected plan should be IndexReader
, But got TableReader
, the index is not used.
@dbjoa Please add the test cases for the issues above.
@coocood Thank you for your detailed comments.
I've updated the changes to resolve the two issues when prepared statements with null parameters:
In addition, I've also added the test cases for the issues addressed.
@dbjoa It's impressive that solved the issues. Thank you for your hard work!
But introduce a new ETParam may not be the best way to solve it.
Each EvalType
has its corresponding MySQL type, but the ETParam
is an exception, it adds a lot of complexity to the expression framework.
Would you please try to find a solution without ETParam
?
After removing ETParam
, If you have any trouble, you can skip the test, then we will fix this issue after this PR has been merged.
@coocood I've removed ETParam and pushed the changes having two test fails. I'll find a solution to resolve the test fails soon!
@coocood I've found the solution and pushed the changes. The fix makes TypeUnspecified to be lower the priority than other types.
@dbjoa Well Done!
LGTM
@zz-jason PTAL
/run-all-test
@zz-jason Would you let me know why the Jenkins jobs are failed? Were the fails related to the PR?
@dbjoa It's the test environment problem, we are working on it, I'll rerun the tests after resolving the problem.
/run-all-test
@zz-jason I've updated the codes:
/run-all-test
Since prepared statements are compiled and optimized whenever they are executed, there is a room to reduce the execution time of the prepared statements.
In order to do that we can compile a prepared statement once and reuse the compiled plan by converting parameter expressions to deferred ones, which are evaluated as late as possible, and storing the plan to a cache.
The implentation has some limitations: