robertcankney / gloom


ADR for transport/clustering #2

Open robertcankney opened 5 years ago

robertcankney commented 5 years ago
  1. gRPC for transport
  2. Re-hashing for determining ownership of keys - key ranges are recalculated when a node is added, but existing data is not re-balanced onto new instances (sketch after this list)
  3. Let's Encrypt for auto-setup of TLS for new nodes
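
A minimal sketch of the re-hashing idea in Go, assuming a sorted hash ring - the node names, the FNV hash, and the ring layout are all illustrative, not the actual implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring maps hashed node positions to node addresses. FNV-1a and the
// single-point-per-node layout are placeholders for illustration.
type Ring struct {
	points []uint32          // sorted hash positions
	owners map[uint32]string // position -> node address
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func NewRing(nodes []string) *Ring {
	r := &Ring{owners: make(map[uint32]string)}
	for _, n := range nodes {
		p := hash32(n)
		r.points = append(r.points, p)
		r.owners[p] = n
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Owner returns the first node clockwise from the key's hash. When a new
// node joins, only the ranges adjacent to its position change owner - no
// existing data is streamed, matching the decision above.
func (r *Ring) Owner(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owners[r.points[i]]
}

func main() {
	ring := NewRing([]string{"node-a:8080", "node-b:8080", "node-c:8080"})
	fmt.Println(ring.Owner("some-key"))
}
```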
robertcankney commented 5 years ago

Basic calls between servers/persistence:

  1. Reach out to seed nodes to join cluster - after join, re-balance key ranges for instances
     1a. Need to determine how to authenticate new instances.
  2. Proxy new file request to new nodes
  3. Proxy request to owners of key to retrieve data
     3a. Lots of proxying - want to find a way to reduce load on nodes that receive requests - maybe something akin to a 302?
     3b. Comparison between instances of key data - would also make sense to store a timestamp with each key, similar to the way that Cassandra stores a timestamp for each cell instance.
     3c. Interesting - this would potentially add a ton of overhead if GETs perform validation, though this is fundamentally similar to a read repair. Can also run scheduled read repairs and go for AP over C, similar to Cassandra (see the timestamp sketch after this list).
  4. Scheduled health checks for instance and key balancing
  5. POTENTIAL: load checks to enable better proxying for new read/write requests
     5a. Could check load along with data timestamp - if timestamps match, route to the least active node
     5b. Would not want to re-do key ranges, though, so limited use
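
A rough sketch of the timestamp comparison from 3b/3c, assuming each replica returns its value with a write timestamp - the types and the repair flow here are hypothetical:

```go
package main

import (
	"fmt"
	"time"
)

// Versioned pairs a value with its write timestamp, similar to how
// Cassandra timestamps each cell. All names here are illustrative.
type Versioned struct {
	Value     []byte
	Timestamp time.Time
	Node      string
}

// newest picks the most recently written replica.
func newest(replicas []Versioned) Versioned {
	best := replicas[0]
	for _, r := range replicas[1:] {
		if r.Timestamp.After(best.Timestamp) {
			best = r
		}
	}
	return best
}

// readRepair returns the winning value and the nodes holding stale
// copies that need rewriting - doing this on every GET adds overhead,
// so it could instead run on a schedule (AP over C).
func readRepair(replicas []Versioned) (Versioned, []string) {
	win := newest(replicas)
	var stale []string
	for _, r := range replicas {
		if r.Timestamp.Before(win.Timestamp) {
			stale = append(stale, r.Node)
		}
	}
	return win, stale
}

func main() {
	now := time.Now()
	win, stale := readRepair([]Versioned{
		{[]byte("old"), now.Add(-time.Minute), "node-a"},
		{[]byte("new"), now, "node-b"},
	})
	fmt.Printf("winner=%q stale=%v\n", win.Value, stale)
}
```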
robertcankney commented 5 years ago

Basic gRPC modeling (rough interface sketch after the list):

  1. Client auth to server - rpc call
  2. Node joins cluster - rpc call (seed node handles key range recalibration)
  3. New node re-balancing (likely add - arguably makes more sense than streaming old data at initial key rebalance) - rpc call
  4. Intra-cluster update - client stream to server
  5. Client updates server - same as above
  6. Server updates client - server streams to client
  7. New file - client streams to server
  8. Node health checks - rpc call
  9. Node re-requests health check (if a node shows as down from another node but up from the current node) - rpc call
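
One way to read the list above as code - a hypothetical Go interface in the shape the generated gRPC stubs might take (method names, message types, and the generic Stream helper are all made up for illustration; the actual proto may differ):

```go
package gloom

import "context"

// Placeholder message types - illustrative only, not the pushed proto.
type (
	AuthReq       struct{}
	AuthResp      struct{}
	JoinReq       struct{}
	JoinResp      struct{}
	RebalanceReq  struct{}
	RebalanceResp struct{}
	Update        struct{}
	Ack           struct{}
	FileChunk     struct{}
	HealthReq     struct{}
	HealthResp    struct{}
)

// Stream stands in for the generated gRPC stream types.
type Stream[T any] interface {
	Send(*T) error
	Recv() (*T, error)
}

// GloomNode mirrors the list above: unary calls for auth, join,
// re-balancing, and health checks; client streams for updates and new
// files; a server stream for pushing updates back to clients.
type GloomNode interface {
	Auth(ctx context.Context, r *AuthReq) (*AuthResp, error)                // 1
	Join(ctx context.Context, r *JoinReq) (*JoinResp, error)                // 2 - seed node recalibrates key ranges
	Rebalance(ctx context.Context, r *RebalanceReq) (*RebalanceResp, error) // 3
	UpdateCluster(stream Stream[Update]) (*Ack, error)                      // 4 and 5 - client stream
	WatchUpdates(r *Update, stream Stream[Update]) error                    // 6 - server stream
	PutFile(stream Stream[FileChunk]) (*Ack, error)                         // 7 - client stream
	Health(ctx context.Context, r *HealthReq) (*HealthResp, error)          // 8
	RecheckHealth(ctx context.Context, r *HealthReq) (*HealthResp, error)   // 9
}
```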

Additionally, cluster join should be super simple - a k8s Secret for the initial cluster-start secret and later joins, and a ConfigMap pointing to etcd for the remaining info (other nodes, key ranges, etc.)
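
A sketch of what that join-time bootstrap could look like, assuming the Secret is mounted as a file and cluster metadata lives under a gloom/ prefix in etcd - the mount path, env var, and key prefix are all assumptions:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Join secret from the k8s Secret, mounted as a file (path is an assumption).
	secret, err := os.ReadFile("/etc/gloom/join-secret")
	if err != nil {
		panic(err)
	}

	// etcd endpoint from the ConfigMap, surfaced as an env var (also an assumption).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{os.Getenv("GLOOM_ETCD_ENDPOINT")},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Remaining cluster info (other nodes, key ranges) under a shared prefix.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	resp, err := cli.Get(ctx, "gloom/nodes/", clientv3.WithPrefix())
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("peer %s = %s\n", kv.Key, kv.Value)
	}
	_ = secret // would be presented to a seed node during the join rpc
}
```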

robertcankney commented 5 years ago

Pushed proto.

Aiming to move all client/node comm to a single message type with different handling based on type - including an enum to try and simplify state communication.
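
Roughly the shape that single message type implies, written out as Go for illustration - the enum values and field names are guesses, not the pushed proto:

```go
package gloom

// MessageType discriminates handling of a single shared message type -
// the values are illustrative stand-ins for the proto enum.
type MessageType int32

const (
	MsgAuth MessageType = iota
	MsgJoin
	MsgUpdate
	MsgFile
	MsgHealth
)

// Message is the single client/node message; handlers switch on Type.
type Message struct {
	Type      MessageType
	Key       string
	Value     []byte
	Timestamp int64 // write time, for replica comparisons
}

// handle dispatches on the message type rather than separate RPC shapes.
func handle(m *Message) {
	switch m.Type {
	case MsgHealth:
		// respond with this node's view of cluster state
	case MsgUpdate:
		// compare Timestamp against the local copy; keep the newer write
	default:
		// auth, join, file transfer, ...
	}
}
```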

Aiming for CP over AP, naturally - also included node invalidation to drop a node to a degraded state if a given node cannot get a quorum of other nodes to agree on that node's state. Want to find a good way to approach this without leader election - could do it with etcd but want to try to solve it without that first. Currently thinking timed status checks of... maybe each node owning a non-quorum subset of other nodes to check? Need to spec this out further.
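
A minimal sketch of the quorum check, assuming each node gathers peer opinions about a suspect node and only degrades it when a strict majority agrees - the vote-gathering transport is elided and all names are hypothetical:

```go
package main

import "fmt"

// NodeState tracks a peer's health as seen by the cluster.
type NodeState int

const (
	Up NodeState = iota
	Degraded
)

// degradeIfQuorumAgrees marks the suspect Degraded only when a strict
// majority of the cluster reports it down - one node's opinion is never
// enough, which is what lets this work without leader election.
func degradeIfQuorumAgrees(peerVotes map[string]bool, clusterSize int) NodeState {
	down := 1 // the current node already suspects the target
	for _, saysDown := range peerVotes {
		if saysDown {
			down++
		}
	}
	if down > clusterSize/2 {
		return Degraded
	}
	return Up
}

func main() {
	// node-a suspects node-x; node-b agrees, node-c does not. 2 of 4 is
	// not a strict majority, so node-x stays Up for now.
	votes := map[string]bool{"node-b": true, "node-c": false}
	fmt.Println(degradeIfQuorumAgrees(votes, 4) == Degraded) // false
}
```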