simplecrypto / powerpool

A Python gevent driven stratum mining server
BSD 2-Clause "Simplified" License

0.7 Feature planning #83

Open icook opened 10 years ago

icook commented 10 years ago

From 0.7 milestone:

Make the move to a service orchestration architecture.

Overall goals:

  • Increase fault tolerance through a more distributed design.
  • Reduce component coupling further.
  • Move away from port specific settings to per-connection configurable defaults.
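To make the last goal concrete, here is a minimal sketch of per-connection settings layered over shared defaults. Every name in it (`DEFAULTS`, `connection_settings`, and the setting keys and values) is hypothetical and chosen for illustration only; none of it is taken from the PowerPool codebase.

```python
from collections import ChainMap

# Hypothetical pool-wide defaults; the keys and values are illustrative only.
DEFAULTS = {"start_difficulty": 16, "vardiff": True, "idle_timeout": 300}


def connection_settings(overrides=None):
    """Return a settings view for one connection.

    Lookups hit the connection's own overrides first and fall back to the
    shared defaults, so nothing is tied to the port the client connected on.
    """
    return ChainMap(dict(overrides or {}), DEFAULTS)


settings = connection_settings({"start_difficulty": 1024})
print(settings["start_difficulty"], settings["vardiff"])
```

A `ChainMap` keeps the defaults in one place, so changing a default is visible to every connection that has not overridden that key.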

Big choices due for discussion:

  1. How to wire together the components.
  2. Which components do we separate? Pros and cons of separation.

Overall requirements/goals of wiring:

  1. Robust multiple PUB/SUB. We're likely going to set up many "jobmanagers" as publishers of new jobs so that one can fail or be upgraded with no downtime. Multiple "stratum managers" will be subscribing. Conventionally this would require a broker, but there are ways to avoid this SPOF. For example, in ZeroRPC each component has an internal ZeroMQ router that acts as a broker, so multiple components don't end up dependent on the health of a single broker. The downside of this system is added complexity in service discovery: instead of just selecting one of a set of separate brokers, an instance-local broker has to be configured for each component.
  2. Low latency above all else. We're not moving tons of data, but we want it to be pretty fast.
  3. Somewhere to store shared state. Almost all of these components require shared state of some sort, and I think Redis is the obvious choice here. I might give a cursory glance at other options.
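One consequence of requirement 1 worth sketching: if several jobmanagers publish the same work, a subscriber listening to all of them may see each job more than once. Below is a minimal subscriber-side de-duplication sketch. It assumes every job carries an identifier that all publishers agree on, which this thread does not specify, and `JobDeduper` is a hypothetical name. The transport itself is deliberately left out.

```python
from collections import OrderedDict


class JobDeduper:
    """Remember recently seen job ids so repeats from redundant publishers are ignored."""

    def __init__(self, capacity=1024):
        self.capacity = capacity      # hypothetical bound on remembered ids
        self.seen = OrderedDict()     # insertion-ordered, oldest id first

    def accept(self, job_id):
        """Return True the first time a job id is seen, False for a repeat."""
        if job_id in self.seen:
            return False
        self.seen[job_id] = True
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)  # evict the oldest id
        return True


dedupe = JobDeduper()
for source, job_id in [("jm-a", "job-1"), ("jm-b", "job-1"), ("jm-a", "job-2")]:
    if dedupe.accept(job_id):
        print("new job", job_id, "first seen from", source)
```

With this in place a jobmanager can be stopped or upgraded and the subscriber keeps taking jobs from whichever publishers remain.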

Candidates for 1:

Considerations for 2:

Jobmanager <-> Stratum Server

I think this is the obvious first one to do. We're likely to have many stratum server processes, and we want to be able to add jobmanagers and upgrade their switching semantics easily, so this seems the most logical place to split first.

Stratum Server <-> Socket Connection

This is something I've wanted for a while: it would let us swap out client logic without disconnecting users. That would allow a lot more agility in development, since the whole release/rollback cycle becomes so much less painful. Basically, a simple frontend handles receiving a connection and parsing out JSON messages, then passes them to a backend without looking at the contents. Backends can be restarted and load balanced easily.
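A rough sketch of what such a frontend might do with incoming bytes, assuming newline-delimited JSON framing as stratum clients typically send. `LineFramer`, `forward_valid`, and the `max_line` limit are hypothetical names for illustration; the frontend only checks that each line is well-formed JSON and forwards the raw bytes untouched.

```python
import json


class LineFramer:
    """Buffer raw socket bytes and hand back complete newline-terminated lines."""

    def __init__(self, max_line=16 * 1024):
        self.buffer = b""
        self.max_line = max_line  # hypothetical cap on an unfinished line

    def feed(self, data):
        """Add received bytes; return the complete lines, without the newline."""
        self.buffer += data
        # Everything before the last newline is complete; the tail stays buffered.
        *lines, self.buffer = self.buffer.split(b"\n")
        if len(self.buffer) > self.max_line:
            raise ValueError("line too long")
        return [line for line in lines if line.strip()]


def forward_valid(lines, send_to_backend):
    """Forward each line that parses as JSON, unchanged; return how many were sent."""
    forwarded = 0
    for raw in lines:
        try:
            json.loads(raw)       # validity check only, the result is discarded
        except ValueError:
            continue              # drop malformed input at the edge
        send_to_backend(raw)
        forwarded += 1
    return forwarded


framer = LineFramer()
backend_inbox = []  # stand-in for whatever transport reaches the backend
chunk = b'{"id": 1, "method": "mining.subscribe", "params": []}\n{"id": 2, "me'
forward_valid(framer.feed(chunk), backend_inbox.append)
print(len(backend_inbox), framer.buffer)
```

Because the frontend never interprets the message, a backend can be replaced while the frontend keeps the client sockets open.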

Reporters <-> Stratum server

I think this is the lowest value, although it would be nice to have down the road. It would allow batching of shares quite nicely, which will become more of an issue once we have a lot (5-10+) of stratum server processes. However, we're not seeing many issues with share logging volume, and I don't see many other advantages.
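For reference, share batching could look roughly like the sketch below: shares accumulate and are flushed as one batch once a size or age threshold is reached. `ShareBatcher`, its thresholds, and the `flush` callback are hypothetical, and a real reporter would also need a timer that calls `maybe_flush()` periodically so a quiet connection does not strand a partial batch.

```python
import time


class ShareBatcher:
    """Collect shares and hand them to `flush` in batches."""

    def __init__(self, flush, max_size=100, max_age=1.0, clock=time.monotonic):
        self.flush = flush          # callable that receives a list of shares
        self.max_size = max_size    # flush once this many shares are pending
        self.max_age = max_age      # or once the oldest pending share is this old (seconds)
        self.clock = clock
        self.pending = []
        self.oldest = None

    def add(self, share):
        if not self.pending:
            self.oldest = self.clock()  # start the age timer for this batch
        self.pending.append(share)
        self.maybe_flush()

    def maybe_flush(self, force=False):
        if not self.pending:
            return
        too_big = len(self.pending) >= self.max_size
        too_old = self.clock() - self.oldest >= self.max_age
        if force or too_big or too_old:
            batch, self.pending = self.pending, []
            self.flush(batch)


batcher = ShareBatcher(lambda batch: print("writing", len(batch), "shares"), max_size=3)
for n in range(7):
    batcher.add({"worker": "example.1", "nonce": n})
batcher.maybe_flush(force=True)  # push out the remainder, e.g. on shutdown
```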

Metrics

All of this will move to statsite (statsd). The whole stat counter thing was neat, but statsite is built for it.
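As a rough illustration of how little client code this needs, here is a sketch that emits statsd-style lines (`name:value|type`) over UDP. The metric name, the helper names, and the address are assumptions for the example; 8125 is the conventional statsd port, but the real host and port depend on how statsite is deployed.

```python
import socket


def format_metric(name, value, kind):
    """Build one statsd-style line; kind is e.g. 'c' (counter), 'ms' (timer), 'g' (gauge)."""
    # Newline-terminated so several metrics can share one datagram.
    return "{}:{}|{}\n".format(name, value, kind).encode("ascii")


def send_metric(sock, addr, name, value, kind="c"):
    """Fire-and-forget: a metrics outage should never stall the stratum server."""
    try:
        sock.sendto(format_metric(name, value, kind), addr)
    except OSError:
        pass


sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Hypothetical metric name and collector address.
send_metric(sock, ("127.0.0.1", 8125), "stratum.shares.accepted", 1)
sock.close()
```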

Process Monitoring/Management

At this point I think it's Circus and Consul. Supervisor has very limited extensibility, which makes certain tasks a really big chore. Circus, while a bit green, has a relatively easy to use plugin system that will be a boon. I honestly wish there was something a bit more robust in this area, but there isn't.

ericecook commented 10 years ago

Here is my take on the options for 1:

Compared to ZeroRPC and NSQ, Python IPC looks like an inferior option for our purposes, and I think we can probably rule it out. The other two packages try to accomplish almost exactly what we need done, and have so many advantages over IPC it isn't really funny. Also, I don't see many advantages to using Python IPC, at least for what we are trying to do (though that may be because I don't know enough about it).

From my brief time spent reading about ZeroRPC and NSQ it looks like NSQ may be the best choice.

ZeroRPC is definitely a more minimalist approach, slightly more than a socket wrapper + broker, which is nice in that it adds quite a lot of flexibility - but definitely leaves a lot of stuff up to us to implement.

NSQ appears to be a more holistic approach towards a scalable pubsub message delivery system. I particularly like the effort towards stronger guarantees in regards to message delivery. Also, it seems to be thoroughly thought through and developed, and the fact that it's the second iteration of the software (a redesigned simplequeue) gives me a surprising amount of confidence. Probably the biggest advantage of NSQ is that it has already solved quite a few problems we would end up working around with ZeroMQ/RPC or rolling our own.

That being said, NSQ is just not going to fit our purposes as exactly as we could develop ZeroMQ/RPC to do, and (without looking into it more) I'm not sure how/if their service discovery would interact with consul. Additionally, ZeroRPC is not a standalone daemon - just a library, which simplifies integration/monitoring/management quite a bit.

icook commented 10 years ago

A lot of research and discussion has occurred in between updates on this ticket, but I think the conclusions are roughly as follows:

If this is all good, then the next questions are:

Thoughts on this mess?