tristanls / discover

Node discovery based on Kademlia DHT protocol.

Capability Searching #3

Closed. skeggse closed this issue 10 years ago.

skeggse commented 11 years ago

I'm not sure if this fits in the project's vision, but it would be nice if I could search for nodes by capability. For example, I could have a User Management service, and want to talk to the Email service. I wouldn't know the node's unique identifier, but the node would know its capabilities.

tristanls commented 11 years ago

You're using a word that has a very particular meaning for me :) "capability." I will assume that you do not mean "capability" in the context of the Object Capability Model.

So assuming that, I think what you describe here could be created on top of the existing implementation.

Node ids don't have to be unique. That is, Discover is still a key/value DHT. So, for example, one could create a User Management service as a contact:

var contact = {
  id: "VXNlciBNYW5hZ2VtZW50IFNlcnZpY2U=", // "User Management Service" in base64
  data: {
    "Email": "c29tZSBlbWFpbCBub2RlIGlk" // node id of an Email service, in base64
  },
  transport: { ... }
};

You'd look up the "VXNlciBNYW5hZ2VtZW50IFNlcnZpY2U=" node, which would simply be a directory mapping the capabilities you want to the nodes they correspond to.
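For example, a minimal sketch of that lookup, assuming discover.find(nodeId, callback) with a callback of (error, contact):

// Sketch only: look up the directory contact, then the capability's node id.
discover.find("VXNlciBNYW5hZ2VtZW50IFNlcnZpY2U=", function (error, contact) {
  if (error) return console.error(error);
  var emailNodeId = contact.data["Email"]; // "c29tZSBlbWFpbCBub2RlIGlk"
  discover.find(emailNodeId, function (error, emailContact) {
    if (error) return console.error(error);
    console.log(emailContact); // whatever was registered for the Email node
  });
});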

The warning here is that a DHT is eventually consistent. And without the vector clock implementation (that I mention in the README), old data may overwrite new data. Come to think of it, a vector clock implementation could solve this problem.

Do I understand your use case correctly?

skeggse commented 11 years ago

Ahhh, I see. That's almost what I was looking for, except that I could have multiple Email services running simultaneously, and would want to get all of them from the User Management service, not just the one that has propagated to me.

tristanls commented 11 years ago

The User Management Service contact would be a globally available node. So if Email were an array, perhaps that would work?

As to getting the latest value... the vector clock idea would have to be fully implemented, but you would still have a propagation delay. This is because every node would assume it had the latest answer (until it got the update). But there is no guarantee that everyone would get the update. It gets difficult and starts to go outside the use case I think :/

tristanls commented 10 years ago

Closing due to inactivity.

skeggse commented 10 years ago

A while ago I built exactly what this module is not: a central discovery system. It works for my purposes, for now, but I can see it becoming a big problem pretty quickly. I'd like to use this module, but I don't understand how.

Discover "enables point-to-point communications," which kinda makes sense, but what I need is to, given a dozen node processes (for example), find the IP addresses and configuration data for a subset of those nodes based on some search parameters.

The email and user service examples are a little obscure, so I'll try to expand upon them.

I have a node process which exposes a number of operations, including createUser, updateUser, deleteUser, and similar operations. These, internally, interact with a database of sorts, and externally provide an API for interacting with user objects.

Say I have another node process which sends emails. The user service wants to be able to send an email to the user when the user is created. The purpose of the services here is that the service can horizontally scale--I might have four node processes which can send emails. Given that the user service wants to send an email, it needs to figure out how to communicate with one or more of those email services. Naturally, the user service does not have a node id for the email service, which is what it needs to find.

So, given ten processes, five of which are user "services" and five of which are email "services," how can the user services find out about the email services? I'm not worried about the actual communication--that can be taken care of with HTTP or some inter-process transport.

Is this functionality that discover can or should provide? If not, can discover act as a building block towards that goal?

tristanls commented 10 years ago

Ah, perfect, thank you for elaborating on what you mean. I think I now better understand the use case you're after :)

Discover should be able to be used for this purpose. If I were faced with this problem, I would design it as follows.

Discover handles contact objects. There is contact.id, which is what you can look for (i.e. it's the key), and contact.data, which is the value that is stored and can be anything you want (that can be turned into JSON). To support your usage, the contact.id could be "sendEmail". The contact.data could be an array of addresses that can perform the "sendEmail" action. For example:

var contact = {
  id: "sendEmail",
  data: [ { host: "10.0.0.1", port: 1234 }, { host: "10.0.0.2", port: 1234 } ],
  vectorClock: 0
};
discover.register(contact);

When you want to update:

var contact = {
  id: "sendEmail",
  data: [ { host: "10.0.0.1", port: 4321 }, { host: "10.0.0.2", port: 1234 } ], // change something
  vectorClock: 1 // increment the vector clock
};
discover.register(contact);

The vectorClock aspect of it is still experimental, but it should (maybe) work? It's next on the implementation roadmap to verify and make sure it does what I think it does (I'm including it in the Implementation Correctness part of the roadmap).

tristanls commented 10 years ago

The elephant in the room for vectorClock is that now we are talking about doing distributed consistency across many servers. Discover works best if only one node or entity is responsible for a particular contact.id at a time. That is, there is a one-to-one mapping between contact.id and anything that will increment vectorClock. This way, if it gets updated, the update is guaranteed to come from only one place and increment the vectorClock. If the update is to come from multiple nodes at once, they will likely step on each other and put the system in an inconsistent state.

tristanls commented 10 years ago

So, in this particular use case, the full design would probably also include an "orchestrator".

The "sendEmail" nodes would never register themselves directly with Discover. They could use Discover to find "sendEmail-orchestrator". And this should be only one machine. This machine is responsible for updating the "sendEmail" information. The flow would be something like:

  1. New "sendEmail" node comes online, say node123.
  2. node123 looks for "sendEmail-orchestrator" in Discover and gets an answer nodeOrchestrator.
  3. node123 tells nodeOrchestrator that it exists.
  4. nodeOrchestrator updates the "sendEmail" contact, increments the vector clock, and registers it with Discover.
  5. Now anyone looking for just the "sendEmail" contact will find the new list that includes node123 by asking Discover.
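A rough sketch of what the orchestrator side of steps 3-5 might look like (the announce handler and how node123 reaches nodeOrchestrator are hypothetical and outside of Discover; only discover.register is Discover's API):

// Hypothetical orchestrator-side state and handler; the RPC that delivers
// the announcement (HTTP, TCP, etc.) is outside of Discover.
var sendEmailContact = {
  id: "sendEmail",
  data: [], // list of { host, port } entries for known sendEmail nodes
  vectorClock: 0
};

function onSendEmailNodeAnnounce(address) { // e.g. { host: "10.0.0.5", port: 1234 }
  sendEmailContact.data.push(address); // add the new node (e.g. node123)
  sendEmailContact.vectorClock++;      // single writer, so incrementing is safe
  discover.register(sendEmailContact); // publish the updated directory
}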

tristanls commented 10 years ago

From a distributed systems perspective, what we are doing with the "orchestrator" setup is linearizing our "writes" (updates to the "sendEmail" contact) while distributing our "reads" (queries for the "sendEmail" contact).

skeggse commented 10 years ago

@tristanls: wow. Many thanks for helping me understand this. I took some time to understand vector clocks, and now the concept of having an array of contacts starts to make sense. I now feel that I have a far stronger understanding of what discover is and does.

I have a few concerns about this implementation I'd like to highlight and ask for feedback on, though.

Single Point of Failure

I get linearizing writes: it makes a lot of sense, keeps the vector clock from growing and desynchronizing, and has a lot of other benefits. It does, however, add a point of contention. If the orchestrator dies, there's no way to add nodes. If everything were static once initialized, it would only affect the creation of new nodes, but since nodes can change their contact, this becomes more of a blocker. I'd like to eliminate the single point of failure, so I'm wondering how this could be improved, but this has implications I'll cover in a little bit.

Speed of Lookup

The speed at which I can look up all sendEmail nodes is critical in this particular application, and in the current system every user node knows about all other sendEmail nodes and can load-balance between them autonomously of everything else. When a sendEmail node is discovered or lost, all user nodes update their load-balancing arrays with the addition or removal of that sendEmail node. If I understand correctly, each time I want to interact with a sendEmail node I'd have to find all the nodes and choose one. How could this work in another way? I'd like to keep this distributed and reliable/eventually consistent/highly available, so it might be a hybrid solution, but being able to quickly find other nodes significantly speeds up the rate at which operations can occur.
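To make the current setup concrete, here's a minimal sketch of the kind of load-balancing array each user node keeps (names are illustrative, not my exact code):

// Illustrative only: each user node keeps its own copy of the sendEmail list
// and round-robins across it, without asking anything else per request.
var sendEmailNodes = []; // updated when a sendEmail node is discovered or lost
var next = 0;

function pickSendEmailNode() {
  if (sendEmailNodes.length === 0) return null;
  var node = sendEmailNodes[next % sendEmailNodes.length];
  next++;
  return node;
}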

Individual Identifiers

Currently all nodes generate a unique 32-byte identifier at startup, which they use to identify themselves with the central discovery server. That identifier can then be linked to the services they provide--in my particular implementation a node can be both a sendEmail node and a user node, though that's a little less important.
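For reference, generating such an identifier in Node.js can be as simple as (a sketch, not necessarily my exact code):

var crypto = require('crypto');

// 32 random bytes, hex-encoded, generated once at process startup
var nodeId = crypto.randomBytes(32).toString('hex');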

Distributed Writes

This is a really, really minor concern, but it would be kinda nice if the writes could be distributed instead of linearized. That would require a more complicated vector clock, and might ultimately not be worthwhile.

Considering the above

How could the above work in the context of discover? While on the one hand it seems a little odd for a DHT system to store all values of a certain group in all nodes, on the other hand it means that all values outside of the groups relevant to that node don't need to be stored. In other words, while the user nodes need to keep track of all sendEmail nodes, they don't need to keep track of the contacts for any other kind of node.

Bootstrapping

This is more of a general question, but if I start a couple of nodes, how do they find each other? When I start a subsequent node, how does it find one of the others? Ideally something like UDP broadcast would come into play, but AWS does not support UDP broadcast or multicast for whatever reason.

Again, thanks for your support on this, if it comes to it I'd love to help implement whatever's necessary to get here.

skeggse commented 10 years ago

A thought occurred to me about the vector clocks and distributed writes; I'll see if I can write (haha, puns, whoops) it down before I lose it.

I really need a distributed indexed collection of items with optimistic retainment.

If the contact's data were structured like a hash, it might make this a little easier to explain. The keys of this hash are the 32-byte identifiers of the individual nodes.

{
  "id": "sendEmail",
  "data": {
    "361439b..2022f70": {
      "vectorClock": 0,
      "host": "192.168.1.1",
      "port": 9555
    },
    "4c34b8d..b3ab765": {
      "vectorClock": 0,
      "host": "192.168.1.1",
      "port": 9556
    },
    "df22936..07c2202": {
      "vectorClock": 0,
      "host": "192.168.1.2",
      "port": 9555
    }
  }
}

For my purposes, it's enough to restrict changes to this hash such that a given node may alter only the item in the hash that corresponds to its identifier--361439b..2022f70 may only alter the 361439b..2022f70 key in the hash. Then, every item in the hash can have its own vector clock that "belongs" to its creator/maintainer node (e.g. 361439b..2022f70). That maintainer node may alter or delete its item in the hash, but if someone thinks the node has crashed it can pessimistically change the hash by deleting it. If, of course, someone else gets an update after that point, the node's item would be reinstated (hence the optimistic retainment requirement in the collection definition above).

It's also important to note that this change moves the vector clock from the contact object to each item in its data object.
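A sketch of the merge rule this implies, with per-item vector clocks and optimistic retainment of entries only one side knows about (purely illustrative, not discover code):

// Illustrative merge of two copies of the "data" hash: for each node id,
// keep whichever entry has the higher vectorClock; entries known to only
// one side are retained optimistically.
function mergeData(local, remote) {
  var merged = {};
  Object.keys(local).concat(Object.keys(remote)).forEach(function (nodeId) {
    var a = local[nodeId];
    var b = remote[nodeId];
    if (a && b) {
      merged[nodeId] = (b.vectorClock > a.vectorClock) ? b : a;
    } else {
      merged[nodeId] = a || b; // optimistic retainment
    }
  });
  return merged;
}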

The whole purpose of representing it this way is to point out that really each contact needs to be doubly-indexed: once by its contact name, like "sendEmail," and once by its unique identifier, like "361439b..2022f70."

Does that all sound reasonable? It's starting to feel like this wouldn't fit in the scope of discover, so I'm not sure where that leaves me.

skeggse commented 10 years ago

Side question: have you come up with a reliable way of determining the local ip address a local process may advertise such that other nodes can contact it? I've resorted to using the socket.remoteAddress property on the receiving end, but that means that the node itself isn't allowed to know what its own address is...

tristanls commented 10 years ago

I'll try to answer one item at a time...

distributed writes / single point of failure / data hash with optimistic retention

You are absolutely right that an "orchestrator" would be a single point of failure. It bothers me too. I was thinking of that solution as a way to implement your use case without modifying Discover.

Some history of why Discover is the way it is: I didn't have your use case in mind. I was thinking of globally unique entities needing to find each other. With global uniqueness, I did not have the problem of write contention, so that simplification allowed me to get this far with Discover. Also, with global uniqueness, the vector clock mechanism was all that was needed to implement updates to the system, again because of the lack of contention for writes. Having said that, your use case is a common one, and I think with some modifications Discover could be made more general to support what you have in mind, and that would be a big win.

Everywhere vectorClock is used in Discover, it is in the scope of a sort of equality or version check. The vectorClock is also used in tristanls/k-bucket, again as a sanity/equality check. However, now looking back at it, this is a hard-coded concept of conflict resolution. I think that refactoring this and making it pluggable could be a way to move forward with better support for what you have in mind.

For example, refactoring this concept out, we could plug CRDT functionality into Discover. What you describe in your optimistic retainment scheme is a type of CRDT. Merging conflicting CRDTs ends up being... drumroll... a type of conflict resolution. So refactoring vector clocks wherever they are in code (in Discover and k-bucket) and making room for a pluggable mechanism could be a clean sweep to support many use cases beyond what you and I have in mind.
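For instance, the pluggable piece could be as small as a function that decides which of two versions of a contact wins (a hypothetical shape, not an existing Discover API):

// Hypothetical conflict-resolution hook: given the locally stored contact
// and an incoming one, return whichever should win (or a merged contact,
// for CRDT-style data).
function vectorClockArbiter(incumbent, candidate) {
  if (!incumbent) return candidate;
  return (candidate.vectorClock > incumbent.vectorClock) ? candidate : incumbent;
}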

speed of lookup

Discover emits telemetry events showing how long it takes to do a lookup. I've implemented an LRU cache in order to speed up certain use cases. Once you look up a contact.id, it will be stored locally.

You can listen for stats events. For example, npm run localtest | grep ~stats gives a sample of what you can see.
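In code, listening would look something like this (assuming the emitted event name mirrors the stat name shown in the ~stats output; check the README for the exact event names):

// Assumed event name, mirroring the "~stats" lines below
discover.on('stats.timers.find.ms', function (latencyMs) {
  console.log('~stats: discover.stats.timers.find.ms ' + latencyMs);
});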

For a more fully baked example, you can take a look at the tristanls/tart-ansible project, which also uses Discover. Without getting into a bunch of details there, once you npm install it, you can run npm run readme and you'll see Discover stats being dumped to the console. In that project, you'll see a lot of initial latency as the system bootstraps, but then most lookups result in 0ms (local) latency (again, look for lines starting with ~stats). Sample output from npm run readme | grep ~stats below:

~stats: discover.stats.timers.find.request.ms 14
~stats: discover.stats.timers.find.ms 17
~stats: discover.stats.timers.find.round.ms 14
~stats: discover.stats.timers.find.request.ms 13
~stats: discover.stats.timers.find.request.ms 14
~stats: discover.stats.timers.find.request.ms 13
~stats: discover.stats.timers.find.request.ms 13
~stats: discover.stats.timers.find.ms 13
~stats: discover.stats.timers.find.round.ms 13
~stats: discover.stats.timers.find.request.ms 14
~stats: discover.stats.timers.find.request.ms 14
~stats: discover.stats.timers.find.ms 14
~stats: discover.stats.timers.find.round.ms 14
~stats: discover.stats.timers.find.request.ms 14
~stats: discover.stats.timers.find.request.ms 14
~stats: discover.stats.timers.find.request.ms 14
~stats: discover.stats.timers.find.request.ms 14
~stats: discover.stats.timers.find.request.ms 14
~stats: discover.stats.timers.find.request.ms 14
~stats: discover.stats.timers.find.request.ms 15
~stats: discover.stats.timers.find.round.ms 15
~stats: discover.stats.timers.find.ms 15
~stats: discover.stats.timers.find.request.ms 15
~stats: discover.stats.timers.find.round.ms 15
~stats: discover.stats.timers.find.request.ms 4
~stats: discover.stats.timers.find.request.ms 4
~stats: discover.stats.timers.find.ms 21
~stats: discover.stats.timers.find.round.ms 4
~stats: discover.stats.timers.find.request.ms 4
~stats: discover.stats.timers.find.ms 2
~stats: discover.stats.timers.find.request.ms 14
~stats: discover.stats.timers.find.request.ms 15
~stats: discover.stats.timers.find.ms 15
~stats: discover.stats.timers.find.round.ms 15
~stats: discover.stats.timers.find.request.ms 15
~stats: discover.stats.timers.find.request.ms 14
~stats: discover.stats.timers.find.request.ms 16
~stats: discover.stats.timers.find.request.ms 16
~stats: discover.stats.timers.find.request.ms 15
~stats: discover.stats.timers.find.request.ms 15
~stats: discover.stats.timers.find.ms 15
~stats: discover.stats.timers.find.round.ms 15
~stats: discover.stats.timers.find.request.ms 17
~stats: discover.stats.timers.find.round.ms 17
~stats: discover.stats.timers.find.ms 18
~stats: discover.stats.timers.find.request.ms 16
~stats: discover.stats.timers.find.request.ms 16
~stats: discover.stats.timers.find.request.ms 18
~stats: discover.stats.timers.find.request.ms 18
~stats: discover.stats.timers.find.request.ms 18
~stats: discover.stats.timers.find.round.ms 18
~stats: discover.stats.timers.find.request.ms 19
~stats: discover.stats.timers.find.round.ms 19
~stats: discover.stats.timers.find.ms 20
~stats: discover.stats.timers.find.request.ms 6
~stats: discover.stats.timers.find.ms 23
~stats: discover.stats.timers.find.round.ms 6
~stats: discover.stats.timers.find.request.ms 6
~stats: discover.stats.timers.find.request.ms 6
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.request.ms 3
~stats: discover.stats.timers.find.ms 4
~stats: discover.stats.timers.find.round.ms 4
~stats: discover.stats.timers.find.request.ms 4
~stats: discover.stats.timers.find.request.ms 5
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.request.ms 4
~stats: discover.stats.timers.find.ms 5
~stats: discover.stats.timers.find.round.ms 5
~stats: discover.stats.timers.find.request.ms 5
~stats: discover.stats.timers.find.request.ms 5
~stats: discover.stats.timers.find.request.ms 3
~stats: discover.stats.timers.find.ms 4
~stats: discover.stats.timers.find.round.ms 3
~stats: discover.stats.timers.find.request.ms 4
~stats: discover.stats.timers.find.request.ms 4
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.request.ms 3
~stats: discover.stats.timers.find.ms 4
~stats: discover.stats.timers.find.round.ms 4
~stats: discover.stats.timers.find.request.ms 4
~stats: discover.stats.timers.find.request.ms 4
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.request.ms 3
~stats: discover.stats.timers.find.ms 4
~stats: discover.stats.timers.find.round.ms 4
~stats: discover.stats.timers.find.request.ms 4
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
~stats: discover.stats.timers.find.ms 0
...

individual identifiers

I don't think the names of Discover nodes themselves are relevant. I don't actually use them for anything; they simply need to be unique. For the stuff that I actually care about, I use discover.register(contact) and then, somewhere else, discover.find(contact.id).

As for having nodes responsible for multiple things: remember that only the contact.id needs to be unique if you don't want to accidentally overwrite information. So there is nothing preventing you from having the same data content for "sendEmail", "createUser", and "deleteUser". All of them can point to the same nodes.

distribution of data

In regards to where the data is stored: the whole point of Discover is to distribute the data. I wouldn't think that user nodes need to keep track of only email nodes. I view it as a desirable property that the information be spread out to as many nodes as possible. This improves the resiliency of the system.

Alternatively, if you feel strongly about it, you could simply set up a separate Discover DHT for your user nodes and a separate one for other nodes.

bootstrapping

This is the chicken-and-egg problem of distributed systems. I have no improvements to offer over a list of seeds. So, there is assumed prior knowledge that some 3 or more seed nodes always exist. You could delegate this list of seeds to be looked up on startup from some reliable oracle service (like S3).

A more dynamic approach does not get rid of the initial bootstrap, but later on, when the system is running, you could query Discover to enumerate all the information it has (although I don't think I have an API for this) and then use that as a list of seeds for whatever node is standing up. This way, as you get more and more nodes, their seed lists are being updated. This might be something interesting to do in the future, although I think we can get by with seed lists in the meantime.
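A minimal sketch of the oracle-service idea, fetching a seed list from S3 at startup (the bucket and key names are made up; how the seeds are then handed to Discover depends on its constructor and transport options):

var AWS = require('aws-sdk');

var s3 = new AWS.S3();
// Hypothetical bucket/key holding a JSON array of seed contacts
s3.getObject({ Bucket: "my-cluster-config", Key: "discover-seeds.json" }, function (error, response) {
  if (error) throw error;
  var seeds = JSON.parse(response.Body.toString());
  // ...construct Discover (and its transport) using these seeds
});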

local address

I typically rely on the operating system to tell the node process at startup what its address is (if that's necessary). If you are on AWS, you can query the instance metadata.
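Both approaches sketched in Node.js (the interface selection is naive; with multiple non-internal interfaces you would need to pick more carefully):

var os = require('os');
var http = require('http');

// Option 1: ask the OS for a non-internal IPv4 address
function localAddress() {
  var interfaces = os.networkInterfaces();
  var addresses = [];
  Object.keys(interfaces).forEach(function (name) {
    interfaces[name].forEach(function (iface) {
      if (iface.family === 'IPv4' && !iface.internal) addresses.push(iface.address);
    });
  });
  return addresses[0];
}

// Option 2 (on EC2): query the instance metadata service
http.get('http://169.254.169.254/latest/meta-data/local-ipv4', function (response) {
  var body = '';
  response.on('data', function (chunk) { body += chunk; });
  response.on('end', function () { console.log('local-ipv4:', body); });
});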

tristanls commented 10 years ago

@skeggse, version 0.4.0 should now enable you to implement what you're looking for. Please provide feedback when you get a chance.

skeggse commented 10 years ago

@tristanls thanks, that looks awesome! I'll see if I can start experimenting with it tomorrow.

tristanls commented 10 years ago

Closing. Follow-up took place in #11.