groupcache / groupcache-go

A high performance in memory distributed cache
Apache License 2.0

V3 Library Refactor #2

Closed thrawn01 closed 7 months ago

thrawn01 commented 8 months ago

Purpose

Implementation

See #1

Example Usage

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/groupcache/groupcache-go/v3"
    "github.com/groupcache/groupcache-go/v3/transport"
    "github.com/groupcache/groupcache-go/v3/transport/peer"
)

func ExampleUsage() {
    ctx, cancel := context.WithTimeout(context.Background(), time.Second*10)
    defer cancel()

    // SpawnDaemon is a convenience function which starts an instance of groupcache
    // with the provided transport and listens for groupcache HTTP requests on the address provided.
    d, err := groupcache.SpawnDaemon(ctx, "192.168.1.1:8080", groupcache.Options{})
    if err != nil {
        log.Fatalf("while starting server on 192.168.1.1:8080: %v", err)
    }

    // Manually set peers, or use some discovery system to identify peers.
    // It is safe to call SetPeers() whenever the peer topology changes.
    // SetPeers() returns an error if no peer has IsSelf set to true.
    err = d.SetPeers(ctx, []peer.Info{
        {
            Address: "192.168.1.1:8080",
            IsSelf:  true,
        },
        {
            Address: "192.168.1.1:8081",
            IsSelf:  false,
        },
        {
            Address: "192.168.1.1:8082",
            IsSelf:  false,
        },
    })
    if err != nil {
        log.Fatal(err)
    }

    // Create a new group cache with a max cache size of 3MB
    group, err := d.NewGroup("users", 3000000, groupcache.GetterFunc(
        func(ctx context.Context, id string, dest transport.Sink) error {
            // In a real scenario we might fetch the value from a database.
            /*if user, err := fetchUserFromMongo(ctx, id); err != nil {
                return err
            }*/

            // User is assumed to be a protobuf-generated message,
            // since it is passed to SetProto() below.
            user := User{
                Id:   "12345",
                Name: "John Doe",
                Age:  40,
            }

            // Set the user in the groupcache to expire after 5 minutes
            return dest.SetProto(&user, time.Now().Add(time.Minute*5))
        },
    ))
    if err != nil {
        log.Fatal(err)
    }

    ctx, cancel = context.WithTimeout(context.Background(), time.Second)
    defer cancel()

    var user User
    if err := group.Get(ctx, "12345", transport.ProtoSink(&user)); err != nil {
        log.Fatal(err)
    }

    fmt.Printf("-- User --\n")
    fmt.Printf("Id: %s\n", user.Id)
    fmt.Printf("Name: %s\n", user.Name)
    fmt.Printf("Age: %d\n", user.Age)

    // Remove the key from the groupcache
    if err := group.Remove(ctx, "12345"); err != nil {
        log.Fatal(err)
    }

    // Shutdown the instance and HTTP listeners
    d.Shutdown(ctx)
}

Code Map

This is also available in the README.md

groupcache.Instance

Represents an instance of groupcache. With the instance, you can create new groups and add other instances to your cluster by calling Instance.SetPeers(). Each instance communicates with other peers through the transport that is passed in during creation. Instance.SetPeers() calls Transport.NewClient() for each peer.Info struct provided to SetPeers(). It is up to the transport implementation to create a client which is appropriate for communicating with the peer described by the provided peer.Info struct.

It is up to the caller to ensure Instance.SetPeers() is called with a valid set of peers. Callers may want to use a peer discovery mechanism to detect changes in the peer topology and call SetPeers() with the updated list. SetPeers() is designed to be called at any point during groupcache.Instance operation as peers leave or join the cluster.

If SetPeers() is not called, then the groupcache.Instance will operate as a local-only cache.

groupcache.Daemon

This is a convenience struct which encapsulates a groupcache.Instance to simplify starting and stopping an instance and the associated transport. Calling groupcache.SpawnDaemon() calls Transport.SpawnServer() on the provided transport to listen for incoming requests.

groupcache.Group

Holds the cache that makes up the "group" which can be shared with other instances of groupcache. Each groupcache.Instance must create the same group using the same group name. Group names are how a "group" cache is accessed by other peers in the cluster.

transport.Transport

Is an interface which is used to communicate with other peers in the cluster. The groupcache project provides transport.HttpTransport, which is used by groupcache when no custom transport is provided. Custom transports must implement the transport.Transport and peer.Client interfaces. The transport.Transport implementation can then be passed into groupcache.New() to register the transport. The peer.Client implementation is used by groupcache.Instance and peer.Picker to communicate with other groupcache.Instance peers in the cluster, using the server started by the transport when Transport.SpawnServer() is called. It is the responsibility of the caller to ensure Transport.SpawnServer() is called successfully; otherwise the groupcache.Instance will not be able to receive any remote calls from peers in the cluster.

transport.Sink

Sink is a collection of functions and structs which marshal and unmarshal strings, []byte values, and protobuf structs for use in transporting data from one instance to another.

peer.Picker

Is a consistent hash ring which holds an instantiated client for each peer in the cluster. It is used by groupcache.Instance to choose which peer in the cluster owns a key in the selected "group" cache.
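For readers unfamiliar with consistent hashing, the following is a minimal sketch of how a hash ring can map a key to an owning peer. The `ring` type, `newRing`, `pick`, the FNV hash, and the choice of 50 virtual nodes per peer are all assumptions made for this illustration; this is not the peer.Picker implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
)

// ring is a hypothetical, minimal consistent hash ring.
type ring struct {
	hashes []uint32          // sorted hashes of all virtual nodes
	owners map[uint32]string // virtual node hash -> peer address
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places `replicas` virtual nodes on the ring for every peer,
// which spreads each peer's share of the key space more evenly.
func newRing(replicas int, peers ...string) *ring {
	r := &ring{owners: make(map[uint32]string)}
	for _, p := range peers {
		for i := 0; i < replicas; i++ {
			h := hashKey(strconv.Itoa(i) + p)
			r.hashes = append(r.hashes, h)
			r.owners[h] = p
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// pick returns the peer owning key: the first virtual node at or after the
// key's hash, wrapping around to the start of the ring if none is found.
func (r *ring) pick(key string) string {
	if len(r.hashes) == 0 {
		return ""
	}
	h := hashKey(key)
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	if i == len(r.hashes) {
		i = 0
	}
	return r.owners[r.hashes[i]]
}

func main() {
	r := newRing(50, "192.168.1.1:8080", "192.168.1.1:8081", "192.168.1.1:8082")
	fmt.Println("key 12345 is owned by", r.pick("12345"))
}
```

The useful property is that removing a peer only reassigns the keys that peer owned; keys owned by the remaining peers keep their owner.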

peer.Info

Is a struct which holds information used to identify each peer in the cluster. The peer.Info struct which represents the current instance MUST be correctly identified by setting IsSelf = true. Without this, groupcache would send hash ring requests to itself via the transport. To avoid accidentally creating a cluster without correctly identifying which peer in the cluster is our instance, Instance.SetPeers() will return an error unless at least one peer has IsSelf set to true.
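The IsSelf rule described above can be sketched as a simple validation step. The `peerInfo` struct copies the two fields shown in the example usage; `validatePeers` and `errNoSelf` are hypothetical names for this illustration and are not taken from the groupcache source.

```go
package main

import (
	"errors"
	"fmt"
)

// peerInfo mirrors the Address and IsSelf fields used with peer.Info
// in the example usage.
type peerInfo struct {
	Address string
	IsSelf  bool
}

// errNoSelf is a hypothetical error for a peer list that never identifies
// the current instance.
var errNoSelf = errors.New("at least one peer must have IsSelf set to true")

// validatePeers rejects a peer list in which no entry is marked as the
// current instance.
func validatePeers(peers []peerInfo) error {
	for _, p := range peers {
		if p.IsSelf {
			return nil
		}
	}
	return errNoSelf
}

func main() {
	err := validatePeers([]peerInfo{
		{Address: "192.168.1.1:8081"},
		{Address: "192.168.1.1:8082"},
	})
	fmt.Println("without self:", err)

	err = validatePeers([]peerInfo{
		{Address: "192.168.1.1:8080", IsSelf: true},
		{Address: "192.168.1.1:8081"},
	})
	fmt.Println("with self:", err)
}
```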

cluster package

Is a convenience package containing functions to easily spawn and shut down a cluster of groupcache instances (called daemons).

Start() and StartWith() start a local cluster of groupcache daemons suitable for testing. Users who wish to test groupcache in their own project test suites can use these functions to start and stop clusters. See cluster_test.go for more examples.

// Start a 3 instance cluster using the default options
_ = cluster.Start(context.Background(), 3, groupcache.Options{})
defer cluster.Shutdown(context.Background())

thrawn01 commented 7 months ago

This is ready for review. @udhos, @gedw99 @Tochemey @Jvb182 @Baliedge

I've created a repo for any discovery mechanisms people may wish to add. https://github.com/groupcache/discovery-go

Tochemey commented 7 months ago

Can we put the data package into the internal package? Are we planning to offer some custom serialization?

@thrawn01 never mind. I have seen the reason why we cannot put into an internal package.

Tochemey commented 7 months ago

@thrawn01 if you are ok with it, I can pull the PR and make some changes. However, I hope my recommendations on the PR make sense.

thrawn01 commented 7 months ago

@thrawn01 I cannot find the discovery interface and how it is used.

The discovery implementations will just call SetPeers() when the peer topology changes.

thrawn01 commented 7 months ago

I took the feedback about the data package and figured out a way to remove it. Now a typical user will only need the groupcache and transport packages to use groupcache.

I like this new package layout much more than what I had previously!

Tochemey commented 7 months ago

@thrawn01 kindly merge it. LGTM

thrawn01 commented 7 months ago

I've implemented all the feedback I gave a 👍 to and added comments to the rest. Thank you all for your time and great comments!

Tochemey commented 7 months ago

@thrawn01 is there anything we need before merging the PR? If I can help, kindly let me know.

thrawn01 commented 7 months ago

I was hoping for the other interested parties to review. If I don't get anything by Monday, I'll merge. However, I want to make one more follow-up change to add support for different cache implementations before releasing V3.

I've been testing https://maypok86.github.io/otter/ and have been impressed with its performance. We should provide an option to use such implementations in groupcache.
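To make the idea concrete, here is one possible shape for a pluggable cache, sketched under heavy assumptions. The `cache` interface, `mapCache`, and `lookup` are hypothetical and exist only for this illustration; they are not the API planned for V3 and do not use or reflect otter's API.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical interface a group could depend on instead of
// a concrete cache type.
type cache interface {
	Get(key string) ([]byte, bool)
	Add(key string, value []byte)
	Remove(key string)
}

// mapCache is the simplest possible implementation: a mutex-guarded map
// with no eviction. A faster library could be wrapped the same way.
type mapCache struct {
	mu    sync.RWMutex
	items map[string][]byte
}

func newMapCache() *mapCache {
	return &mapCache{items: make(map[string][]byte)}
}

func (c *mapCache) Get(key string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[key]
	return v, ok
}

func (c *mapCache) Add(key string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = value
}

func (c *mapCache) Remove(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.items, key)
}

// lookup depends only on the interface, so any implementation can be
// swapped in without changing the calling code.
func lookup(c cache, key string, load func(string) []byte) []byte {
	if v, ok := c.Get(key); ok {
		return v
	}
	v := load(key)
	c.Add(key, v)
	return v
}

func main() {
	var c cache = newMapCache()
	loads := 0
	load := func(key string) []byte {
		loads++
		return []byte("value for " + key)
	}
	lookup(c, "12345", load)
	lookup(c, "12345", load) // served from the cache, load is not called
	fmt.Println("loads:", loads)
}
```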

Tochemey commented 7 months ago

https://maypok86.github.io/otter/

Yeah, I have seen Otter. Thanks for looking at it. So you want to add some sort of API to allow custom cache implementations. That will be a great feature.

Tochemey commented 7 months ago

@thrawn01 I have been thinking lately about how V3 handles network topology changes. Are we planning to handle split-brain scenarios? Is there any plan for rebalancing?

thrawn01 commented 7 months ago

Discovery, split brain, and rebalancing are beyond the scope of groupcache core.

However, if some third party wanted to support such a thing, it would be possible using the new transport interface and discovery.