vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.52k stars 2.09k forks source link

RFC: Vtctld Service #7058

Open ajm188 opened 3 years ago

ajm188 commented 3 years ago

(I'm working on a sharable google doc form, for folks that would prefer to comment inline)

Abstract

This document presents a design to define a well-typed gRPC interface to the vtctld component of Vitess.

Background

The current vtctl API is a gRPC interface defining a single method:

service Vtctl {
    rpc ExecuteVtctlCommand(ExecuteVtctlCommandRequest) returns (stream ExecuteVtctlCommandResponse) {};
}

message ExecuteVtctlCommandRequest {
    repeated string args = 1;
    int64 action_timeout = 2;
}

message ExecuteVtctlCommandResponse {
    Event event = 1;
}

// Extracted from logutil.proto
message Event {
    // other fields omitted
    string value = 5;
}

Note: this is a modified example to fit in one snippet.

This interface gave us the flexibility to add, modify, and remove commands quickly, without needing to worry about adherence to a strict protobuf API definition.

It now poses several problems.

First, from the client perspective, this is effectively an untyped interface. Unlike other gRPC interfaces, the first element of ExecuteVtctlCommandRequest.args determines the underlying method call and the shape of the response data. We lose the benefit of protobufs defining the structure on both sides of the network boundary, and must parse arbitrary, sometimes-but-not-always JSON-formatted bytes into data structures before we can do anything with them. The list of available methods and their arguments, as well as the shape of their responses may change at any time, without any feedback or warning from the Go compiler. Further, some of the current vtctl methods use custom MarshalJSON implementations, without providing the inverse UnmarshalJSON, forcing the client to define nearly-identical types to provide this behavior. These types must be kept in sync by the client, causing additional maintenance burden.

Second, on 07/24/2020, VEP3: Vitess Upgrade Order removed the upgrading ordering requirements, beginning with version 8.0. VEP3 specified a new requirement to facilitate this change: “a new version of a server must work with an old version as well as the current version.” Due to other problems mentioned in the above paragraph, this is difficult, if not impossible, to guarantee at compile time for the vtctl API.

Goals

  1. Provide a well-typed gRPC API for writing client programs against a vtctld.
  2. Define a path to deprecate the former API.

Non-Goals

  1. Modifications to the existing API, or to its consumers.

Proposal

We propose to add a new service to the existing vtctldservice.proto file, called Vtctld that will live alongside the Vtctl service.

Design

The service will provide one RPC definition for each supported vtctl command (GetKeyspaces, GetVSchema, Backup, VReplicationExec etc). Any additional message definitions will be defined in vtctldata.proto, unless a more appropriate proto file is identified.

When adding an RPC to VtctldServer, the corresponding legacy implementation should be updated to invoke the new implementation under the hood, and then invoke the usual printJSON(wr.Logger(), resp). For a proof-of-concept of this, see branch am_vtctld_proto.

A few things to point out in that example:

package vtctlclient

Package vtctlclient uses a factory pattern to return an implementation of VtctlClient. Clearly, this does not work with the new interface defined by the Vtctld protobuf.

Instead, we introduce a new interface defined as:

type VtctldClient interface {
    vtctlservicepb.VtctldClient
    Close() error
}

From there, we add a new factory and constructor:

var vtctldClientProtocol = flag.String("vtctld_client_protocol", "grpc", "the protocol to use to talk to the vtctld server")

type VtctldFactory func(addr string) (VtctldClient, error)

var vtctldFactories = make(map[string]VtctldFactory)

func RegisterVtctldFactory(name string, factory VtctldFactory) {
        if _, ok := factories[name]; ok {
                log.Fatalf("RegisterVtctldFactory: %s already exists", name)
        }
        factories[name] = factory
}

func NewVtctldClient(addr string) (VtctldClient, error) {
        factory, ok := factories[*vtctldClientProtocol]
        if !ok {
                return nil, fmt.Errorf("unknown vtctld client protocol: %v", *vtctldClientProtocol)
        }
        return factory(addr)
}

Identical with the current layout, fakevtctlclient and grpcvtctlclient will provide implementations and register factories for those implementations. vtctlclienttest will adapt the current test suite to the new interface.

Compatibility Considerations

We will need to update vtworker, automation, and vttest to use the new vtctldclient. An outstanding question for me is if we can ignore vtworker, or if that needs to be updated as well.

Timeline and Rollout

A codebase with two competing ways of accomplishing a task can be frustrating to work in at best, and risky to work in at worst. We want to minimize the amount of time in this state. To accomplish this, we will declare the VtctldServer interface to be unstable until it has met parity with the VtctlServer. We can consider the two at parity when all legacy vtctl commands defer their logic to the corresponding VtctldServer methods.

What “unstable” means in this context may be unorthodox, so:

The distinction here is that an engineer writing an arbitrary external program against the VtctldServer should not trust the stability of the interface during the transition, but when changing any particular VtctldServer method, we will update any vtctl commands that call that method to ensure correct behavior from a blackbox perspective.

We aim for this period to take less than two months.

After reaching parity, the VtctldServer API will be declared stable and become subject to all the standard release cadence and compatibility standards of the Vitess project. At the same time, we will declare the VtctlServer API to be deprecated.

Other Considerations

  1. We choose to make VtctldServer take its own TopoServer and TabletManagerClient for interacting with the topo, and to have Wrangler take a VtctldServer to have calls flow in that direction. Alternatively, we could force VtctldServer use a Wrangler to access the topo. This would look like:

    type VtctldServer struct {
        wr *wrangler.Wrangler
    }
    
    func (vtctld *VtctldServer) GetKeyspace(ctx context.Context, req *vtctldatapb.GetKeyspaceRequest) (*vtctldatapb.Keyspace, error) {
        keyspace, err := s.wr.TopoServer().GetKeyspace(ctx, req.Keyspace)
        if err != nil {
            return nil, err
        }
    
        return &vtctldatapb.Keyspace{
            Name: req.Keyspace,
            Keyspace: keyspace.Keyspace,
        }, nil
    }

    I have no strong opinion. At first glance, giving VtctldServer direct access to the topo server feels simpler than forcing everything through the wrangler, but this is a very minor point with respect to the overall proposal. If going through wrangler is indeed better, then we will switch to that.

  2. We chose to create an entire second interface rather than progressively add typed methods to the existing interface. We spent an afternoon exploring this alternative and found that the number of types and mocks that would need updating each time a method was added wasn’t worth the trouble of keeping the vtctl name. The upside here is getting to move faster; the downside here is that we have to pick a new name, hence vtctld.

derekperkins commented 3 years ago

I'm very excited for this. We've avoided some automation simply because the old "api" was so fickle to deal with. It has also held back innovation on the Vitess dashboard & UI.

deepthi commented 3 years ago

This is very detailed and thorough. What happens to the existing vtctlclient binary? Do we deprecate that along with VtctlServer?

ajm188 commented 3 years ago

Yeah, we definitely want to deprecate that. Correct me if I'm wrong, but I think to maintain compatibility guarantees the ordering would have to be:

deepthi commented 3 years ago

VEP-3 specifically deals with components of a running vitess system. vtctlclient is not exactly part of a running cluster. We would be ok to think of it as being part of vtctld and deprecating it along with the server. If anyone objects we can re-consider.

rohit-nayak-ps commented 3 years ago

Design and implementation plan look great. I also really like the fact that this uses cobra: allows us to use bash completion, auto-suggest and sub-commands etc etc. I had tried to retrofit cobra but we decided it wasn't worth the effort at that time. Also a useful effect is that a lot of deprecated and obsolete commands will get dropped.

ajm188 commented 3 years ago

I had tried to retrofit cobra but we decided it wasn't worth the effort at that time.

Yeah, revisiting flags more holistically is uhhhh another thing I would be interested in doing at some point 😅