pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0
37.26k stars 5.85k forks source link

Refactor the usage of session context in `executor` #49491

Open YangKeao opened 11 months ago

YangKeao commented 11 months ago

Current situation

A normal routine of using pkg/executor package is to call (*ExecStmt).Exec or (*ExecStmt).PointGet. The session context is wrapped inside the ExecStmt. The method call (*ExecStmt).Exec will return a record set. The rows are dumped by calling RecordSet.Next. If everything goes well, the RecordSet will be closed after draining the rows. During this process, the session context exists in several places:

  1. ExecStmt.Ctx
  2. executorBuilder.ctx
  3. BaseExecutor and some other related structures inside executors (e.g. FKCheckExec...)

It would be suggested to fix them from bottom to top, as the context required by the bottom part is always a subset of the context required by the upper part. The process of refractoring the bottom part is squeezing the usage of session context to the upper part.

Plan

Executor

From the time when the context is needed, they can also be split into two parts:

  1. In build and open. Though most of the time build should be seen as part of executorBuilder.
  2. In the Next() function. Some of the context currently needed by the Next() function can also be migrated to build / open parts and stored inside the executor.

We can use different parameters to build each executor actually, and give them different context needed by them. The context needed by Open can also be set during build, because Open and build are always run in the same "statement", and the context should be the same when calling them.

However, we should have a unified interface to actually run different kinds of executors, so for the Next() function we'll need the unified interface.

Target

After refractoring, the BaseExecutor shouldn't contain the session context (and so does any other fields of an executor). All context needed by the build and open should be passed and stored in the executor in the build stage. BaseExecutor can still be used to store some information needed by all executors, like memTracker, but it's suggested to keep it tidy and leave more configurations (e.g. timezone) to the field of each executor itself.

We can also have part (one low level) of the expression.Context in some executors.

The Next() function should accept an interface:

Next(ctx context.Context, ectx Context, req *chunk.Chunk) error

The ectx should contain all information needed by the executor to run.

type Context interface {
    ... // TODO. 
    // It's also possible that the executor may don't need any context for `Next` function. We can have a conclusion after refractoring / analyzing each executors. It doesn't really matter.
    // Also, we should keep this interface as small as possible.
}

For example, to solve the cursor fetch situation, we'll need to store the expression context in executor.

type ProjectionExec struct {
    evalContext expression.Context
    ....
} 

However, the build function should return error if the expression requires a higher level than the context can provide:

func (b *executorBuilder) buildProjection(v *plannercore.PhysicalProjection) exec.Executor {
    //...
    for _, expr := range v.Exprs {
        if expr.EvalContextRequirement() > b.maxEvalContextLevel {
            b.err = some_error
            return nil
        }
    }

    evalCtx := b.ctx.GetEvalCtxWithLevel(b.maxEvalContextLevel)

    e := &ProjectionExec{
       evaluatorSuit:    expression.NewEvaluatorSuite(v.Exprs, v.AvoidColumnEvaluator),
       evalContext: evalCtx
    }
    //...
}

We can set different b.maxEvalContextLevel based on whether it's cursor fetch or not. Then, for FETCH command, the unParallelExecute will use the recorded expression context:

func (e *ProjectionExec) unParallelExecute(ctx context.Context, chk *chunk.Chunk) error {
...
    e.evaluatorSuit.Run(e.evalContext, e.childResult, chk)
...
}

It's necessary to store an expression context in ProjectionExec, because we have to guarantee that the expression doesn't change between EXECUTE and FETCH.

Roadmap

Well, it seems to be too complicated to solve them all at once. Luckily, we can refractor the usage of session context for each executor one by one. Before focusing on one of the executors, we'll need to do some global refractors:

  1. Split Base() *BaseExecutor method to multiple methods: AllChildren, TryNewCacheChunk, RuntimeStats... Try to avoid GetCtx.
  2. Modify the function Next(ctx context.Context, req *chunk.Chunk) error to Next(ctx context.Context, ectx Context, req *chunk.Chunk) error. All implementation should just ignore the ectx. It's also fine to have an empty interface Context. We'll enrich the content of Context progressively.
  3. Extract part of the BaseExecutor. Rename the old one to LegacyBaseExecutor and the extracted part should be named as BaseExecutor. It should also implement the Executor interface, but doesn't have session context inside.
type LegacyBaseExecutor struct {
    ctx           sessionctx.Context
    BaseExecutor
}
type BaseExecutor struct {
    AllocPool     chunk.Allocator
    schema        *expression.Schema // output schema
    runtimeStats  *execdetails.BasicRuntimeStats
    children      []Executor
    retFieldTypes []*types.FieldType
    id            int
    initCap       int
    maxChunkSize  int
}

Then for each executor, we can do the refractor:

  1. Replace the LegacyBaseExecutor with BaseExecutor.
  2. Solve every part which needs the session context: move them to the build/open stage and pass as an argument in build function, or get this information in Next stage and add the necessary method in executor.Context interface. Sometimes, it's also needed to work on other packages to solve the problem. Actually, different executors need to be considered one by one.

As it may take months to migrate all executors, we can have a helper function to show whether this executor is legacy or not:

func (LegacyBaseExecutor) IsLegacy() bool { return true }
func (BaseExecutor) IsLegacy() bool { return false }
type Executor interface {
    IsLegacy() bool
}

Use the cursor fetch problem as an example. We can use IsLegacy() to decide whether the current executor can be lazily fetched. If IsLegacy is true and the executorBuilder constructs the executor successfully, this executor can be safely stored in the recordSet and fetched later.

YangKeao commented 11 months ago

Track issues