We are the LLVM team. This issue is blocking us from integrating plan caching into the system (our caching system already passes standalone tests).
Cache system
Our cache is a singleton object which boils down to a:
std::map<std::unique_ptr<planner::AbstractPlan>, // physical plan
std::unique_ptr<codegen::Query>> // compiled query
(We've implemented a plan comparator.)
The use of std::unique_ptr's is reasonable - the cache owns the cached plan and compiled query. We could also use std::shared_ptr here. The key is that a plan in the cache should not be freed.
When the user wants to access a plan in the cache, we would provide a const planner::AbstractPlan * (ownership borrowing - the cache still owns the plan).
When a new planner::AbstractPlan comes, we check whether a structurally equivalent plan is already in the cache. If there isn't, then we store the plan in the cache.
Now we have to carefully manage the lifecycle of a plan. There are basically 2 ways we come up with:
The user provides a std::unique_ptr<planner::AbstractPlan>.
We move that plan into the cache. The user no longer owns the plan.
The user provides a std::shared_ptr<planner::AbstractPlan>.
We increase the refcnt.
The user provides a const planner::AbstractPlan *.
In this way the user isn't giving us any ownership. We never use non-owning raw pointers. We make a copy of the plan and store the copy into the cache. Like this:
const planner::AbstractPlan *plan = ...; // user gives us this
std::unique_ptr<planner::AbstractPlan> copy = plan->Copy();
std::unique_ptr<codegen::Query> compiled_query = ...;
Insert(std::move(copy), std::move(compiled_query));
We believe that the first and second methods are better.
There are also technical issues with regards to the second approach: currently the Copy() method for a lot of kinds of plans are incorrectly implemented.
Ownership Problem
We have currently adopted the first approach (moving to the second is also easy), and the cache is passing standalone test cases. However, the current peloton system is blocking us from integrating the cache. Here is why.
The main entry point for executing a plan is PlanExecutor::ExecutePlan(), where you compile the plan (if the compiler supports this kind of plan) and execute it.
However, this function takes in a const planner::AbstractPlan *, instead of a std::unique_ptr<planner::AbstractPlan> - we are not given the ownership of the plan.
It turns out that the plan stored inside a statement is actually a std::shared_ptr<planner::AbstractPlan>, and these Statements are also std::shared_ptr<Statement>s.
Conclusion
Therefore, the current peloton manages the lifecycle of physical plans in a way that prevents us from integrating the cache.
We have come up with several possible approaches.
Change the interface of PlanExecutor::ExecutePlan() so that it takes a std::unique_ptr<planner::AbstractPlan>. This would lead to other changes in the system.
Change the interface of PlanExecutor::ExecutePlan() so that it takes a std::shared_ptr<planner::AbstractPlan>. This would lead to fewer changes in the system, since the plan is also stored as a shared pointer in a Statement. Then we also change our cache to be a
std::map<std::shared_ptr<planner::AbstractPlan>, // physical plan
std::unique_ptr<codegen::Query>> // compiled query
Fix up all the Copy() methods of planner::AbstractPlan's. We don't think this is the right way to go. Copying a plan doesn't make sense, when we have a std::shared_ptr.
We are the LLVM team. This issue is blocking us from integrating plan caching into the system (our caching system already passes standalone tests).
Cache system
Our cache is a singleton object which boils down to a:
(We've implemented a plan comparator.)
The use of
std::unique_ptr
's is reasonable - the cache owns the cached plan and compiled query. We could also usestd::shared_ptr
here. The key is that a plan in the cache should not be freed.When the user wants to access a plan in the cache, we would provide a
const planner::AbstractPlan *
(ownership borrowing - the cache still owns the plan).When a new
planner::AbstractPlan
comes, we check whether a structurally equivalent plan is already in the cache. If there isn't, then we store the plan in the cache.Now we have to carefully manage the lifecycle of a plan. There are basically 2 ways we come up with:
The user provides a
std::unique_ptr<planner::AbstractPlan>
. We move that plan into the cache. The user no longer owns the plan.The user provides a
std::shared_ptr<planner::AbstractPlan>
. We increase the refcnt.The user provides a
const planner::AbstractPlan *
. In this way the user isn't giving us any ownership. We never use non-owning raw pointers. We make a copy of the plan and store the copy into the cache. Like this:We believe that the first and second methods are better.
There are also technical issues with regards to the second approach: currently the
Copy()
method for a lot of kinds of plans are incorrectly implemented.Ownership Problem
We have currently adopted the first approach (moving to the second is also easy), and the cache is passing standalone test cases. However, the current peloton system is blocking us from integrating the cache. Here is why.
The main entry point for executing a plan is PlanExecutor::ExecutePlan(), where you compile the plan (if the compiler supports this kind of plan) and execute it.
However, this function takes in a
const planner::AbstractPlan *
, instead of astd::unique_ptr<planner::AbstractPlan>
- we are not given the ownership of the plan.When we further look into it, we find that there's only one caller of PlanExecutor::ExecutePlan(), which is TrafficCop::ExecuteStatementPlan(), and it also takes a
const planner::AbstractPlan *
.Every call to TrafficCop::ExecuteStatementPlan() passes the plan by some
statement->GetPlanTree().get()
.It turns out that the plan stored inside a statement is actually a
std::shared_ptr<planner::AbstractPlan>
, and theseStatement
s are alsostd::shared_ptr<Statement>
s.Conclusion
Therefore, the current peloton manages the lifecycle of physical plans in a way that prevents us from integrating the cache.
We have come up with several possible approaches.
Change the interface of PlanExecutor::ExecutePlan() so that it takes a
std::unique_ptr<planner::AbstractPlan>
. This would lead to other changes in the system.Change the interface of PlanExecutor::ExecutePlan() so that it takes a
std::shared_ptr<planner::AbstractPlan>
. This would lead to fewer changes in the system, since the plan is also stored as a shared pointer in aStatement
. Then we also change our cache to be aFix up all the
Copy()
methods ofplanner::AbstractPlan
's. We don't think this is the right way to go. Copying a plan doesn't make sense, when we have astd::shared_ptr
.