icook opened 10 years ago
Here is my take on the options for 1:
Compared to ZeroRPC and NSQ, Python IPC looks like an inferior option for our purposes; I think we can probably rule it out. The other two packages try to accomplish almost exactly what we need done, and they have so many advantages over IPC it isn't really funny. I also don't see many advantages to using Python IPC, at least for what we are trying to do (or perhaps I just don't know enough about it).
From my brief time spent reading about ZeroRPC and NSQ, it looks like NSQ may be the best choice.
ZeroRPC is definitely the more minimalist approach, only slightly more than a socket wrapper + broker. That's nice in that it gives us quite a lot of flexibility, but it also leaves a lot of stuff up to us to implement.
NSQ appears to be a more holistic approach to a scalable pubsub message delivery system. I particularly like the effort towards stronger guarantees regarding message delivery. It also seems thoroughly thought through and developed, and the fact that it's the second iteration of the software (a redesigned simplequeue) gives me a surprising amount of confidence. Probably the biggest advantage of NSQ is that it has already solved quite a few problems we would otherwise end up working around with ZeroMQ/ZeroRPC or by rolling our own.
That being said, NSQ is just not going to fit our purposes as exactly as something we could build on ZeroMQ/ZeroRPC, and (without looking into it more) I'm not sure how, or whether, its service discovery would interact with Consul. Additionally, ZeroRPC is not a standalone daemon, just a library, which simplifies integration, monitoring, and management quite a bit.
A lot of research and discussion has occurred between updates on this ticket, but I think the conclusions are roughly as follows:
If this is all good, then the next questions are:
Estimated Redis actions/second = authentications per second + 5 × shares per second, so pretty much one additional action per share processed, which at current rates is not a big deal at all. Redis is generally known to handle around 30,000 actions per second, which would be about 12x what we're doing right now. This could definitely be cut down a lot by grouping share reporting into one-second batch jobs. Then it would be something more like authentications per second + 2 × shares per second + the number of powerpool instances (each instance reporting all of its shares once a second in a single action). These numbers exclude frontend access; however, if we get that large, setting up a replicated Redis instance for the frontend would be trivial, and the delay doesn't matter much.
Thoughts on this mess?
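The two load formulas can be turned into a quick calculator for comparing the unbatched and batched designs. This is a minimal sketch: the 30,000 actions/second capacity is the rough figure quoted in the comment, the function names are hypothetical, and the rates in `__main__` are placeholders rather than measurements from our pool.

```python
# Back-of-the-envelope Redis load estimate based on the two formulas in the comment.
# REDIS_CAPACITY is the rough figure quoted there; the rates used in __main__
# are hypothetical placeholders, not measurements.

REDIS_CAPACITY = 30_000  # Redis actions per second (rough, workload dependent)


def unbatched_load(auths_per_sec: float, shares_per_sec: float) -> float:
    """Current design: 1 action per authentication, 5 actions per share."""
    return auths_per_sec + shares_per_sec * 5


def batched_load(auths_per_sec: float, shares_per_sec: float, instances: int) -> float:
    """One-second batching: 2 actions per share, plus 1 bulk report per powerpool instance."""
    return auths_per_sec + shares_per_sec * 2 + instances


def headroom(load: float) -> float:
    """How many times the given load fits into the assumed Redis capacity."""
    return REDIS_CAPACITY / load


if __name__ == "__main__":
    auths, shares, instances = 50, 1000, 6  # hypothetical rates
    for label, load in (
        ("unbatched", unbatched_load(auths, shares)),
        ("batched", batched_load(auths, shares, instances)),
    ):
        print(f"{label}: {load:.0f} actions/s, {headroom(load):.1f}x headroom")
```

Swapping in real authentication and share rates shows directly how much headroom batching buys as stratum server processes are added.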
From 0.7 milestone:
Big choices due for discussion:
Overall requirements/goals of wiring:
Candidates for 1:
Considerations for 2:
Jobmanager <-> Stratum Server: I think this is the obvious first one to do. Since we're likely to have many stratum server processes, and we want to be able to add jobmanagers and upgrade their switching semantics easily, this seems the most logical place to split first.
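To make that boundary concrete, here is a minimal in-process sketch of a jobmanager fanning new jobs out to several stratum server processes. The `Job` fields, `JobBroadcaster`, and `receive_job` are hypothetical names, and `queue.Queue` stands in for whatever transport (ZeroRPC, NSQ, or raw ZeroMQ) ends up being chosen.

```python
import json
import queue
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Job:
    # Hypothetical fields; real jobs carry much more (coinbase parts, merkle branch, ...).
    job_id: str
    prev_hash: str
    clean_jobs: bool


class JobBroadcaster:
    """Jobmanager side of the link.

    Each stratum server process owns one subscription; here a queue.Queue
    plays the role of the real transport.
    """

    def __init__(self) -> None:
        self._subscribers: List[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self._subscribers.append(q)
        return q

    def publish(self, job: Job) -> int:
        # Serialize once so every server receives an identical wire message.
        payload = json.dumps(asdict(job)).encode()
        for q in self._subscribers:
            q.put(payload)
        return len(self._subscribers)


def receive_job(q: queue.Queue) -> Job:
    """Stratum server side: decode the next job from its subscription."""
    return Job(**json.loads(q.get(timeout=1)))
```

Because the stratum servers only ever see the serialized job, a jobmanager with different switching semantics could be swapped in without touching them.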
Stratum Server <-> Socket Connection: This is something I've wanted for a while, since it would allow us to swap out client logic without disconnecting users. That would give us a lot more agility in development, since the whole release/rollback cycle becomes much less painful. Basically, a simple frontend handles receiving a connection and parsing out JSON messages, and then passes each message to a backend without looking at its contents. Backends can be restarted and load balanced easily.
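Stratum messages are newline-delimited JSON, so the frontend only has to find line boundaries and never needs to decode a message. A minimal sketch, assuming hypothetical `LineFramer` and `Frontend` classes, with backends modeled as plain callables that take `(connection_id, raw_message)`:

```python
import itertools
from typing import Callable, Dict, Iterator, List

Backend = Callable[[int, bytes], None]  # receives (connection_id, raw_message)


class LineFramer:
    """Buffers raw bytes from one client socket and returns complete lines."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, data: bytes) -> List[bytes]:
        self._buffer += data
        # Everything before the last "\n" is complete; the remainder stays buffered.
        *complete, self._buffer = self._buffer.split(b"\n")
        # Strip "\r" and skip blank lines, but never decode the JSON payload.
        return [line.rstrip(b"\r") for line in complete if line.strip()]


class Frontend:
    """Hypothetical frontend: owns client sockets, forwards opaque messages."""

    def __init__(self, backends: List[Backend]) -> None:
        self._next_backend: Iterator[Backend] = itertools.cycle(backends)
        self._framers: Dict[int, LineFramer] = {}
        self._assigned: Dict[int, Backend] = {}

    def on_data(self, conn_id: int, data: bytes) -> None:
        if conn_id not in self._assigned:
            # Pin each new connection to one backend, chosen round-robin, so its
            # session state (subscription, authorized workers) lives in one place.
            self._assigned[conn_id] = next(self._next_backend)
            self._framers[conn_id] = LineFramer()
        backend = self._assigned[conn_id]
        for message in self._framers[conn_id].feed(data):
            backend(conn_id, message)

    def on_close(self, conn_id: int) -> None:
        self._assigned.pop(conn_id, None)
        self._framers.pop(conn_id, None)
```

Pinning keeps per-connection state in one backend; the trade-off is that restarting a backend would require re-establishing that state for its connections, which this sketch does not handle.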
Reporters <-> Stratum Server: I think this is the lowest-value split, although it would be nice to have down the road. It would allow batching of shares quite nicely, which will become more of an issue once we have a lot (5-10+) of stratum server processes. However, we're not seeing many issues with share logging volume, and I don't see many other advantages.
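Share batching could look roughly like the following sketch, which sums difficulty per user and hands the whole batch to a single report callback at most once per interval. `ShareBatcher` and the `report` callback are hypothetical; in practice `report` would issue one bulk Redis action per batch. The injectable `clock` exists only to make the timing testable.

```python
import time
from collections import Counter
from typing import Callable, Dict


class ShareBatcher:
    """Aggregates accepted shares and reports them at most once per interval."""

    def __init__(self, report: Callable[[Dict[str, int]], None],
                 interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._report = report
        self._interval = interval
        self._clock = clock
        self._pending: Counter = Counter()
        self._last_flush = clock()

    def add_share(self, user: str, difficulty: int) -> None:
        # One entry per user per batch instead of one write per share.
        self._pending[user] += difficulty
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._report(dict(self._pending))
            self._pending.clear()
        self._last_flush = self._clock()
```

Here a flush only happens when a share arrives, so an idle server could hold a partial batch indefinitely; a real version would also call `flush()` from a periodic timer.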
Metrics: All of this will move to statsite (a statsd-compatible server). The whole stat counter thing was neat, but statsite is built for exactly this.
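statsite accepts the plain statsd line protocol (`name:value|type`, sent over UDP), so the existing stat counters could be replaced with something as small as this sketch. The metric names and the assumption that statsite listens on the conventional statsd port 8125 on localhost are placeholders to adjust.

```python
import socket
from typing import Optional, Union

Number = Union[int, float]


def format_metric(name: str, value: Number, metric_type: str,
                  sample_rate: Optional[float] = None) -> bytes:
    """Build one statsd line-protocol message, e.g. b"shares.accepted:1|c"."""
    if metric_type not in ("c", "ms", "g"):  # counter, timer, gauge
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None and sample_rate < 1:
        line += f"|@{sample_rate}"
    return line.encode("utf-8")


class StatsClient:
    """Minimal fire-and-forget UDP client; host and port are assumptions."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125) -> None:
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(self, name: str, count: int = 1) -> None:
        self._send(format_metric(name, count, "c"))

    def timing(self, name: str, ms: float) -> None:
        self._send(format_metric(name, ms, "ms"))

    def gauge(self, name: str, value: Number) -> None:
        self._send(format_metric(name, value, "g"))

    def close(self) -> None:
        self._sock.close()

    def _send(self, payload: bytes) -> None:
        try:
            self._sock.sendto(payload, self._addr)
        except OSError:
            pass  # metrics must never take down the stratum server
```

UDP keeps the hot path non-blocking: if statsite is down, the datagrams are simply lost rather than stalling share processing.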
Process Monitoring/Management
At this point I think it's Circus and Consul. Supervisor has very limited extensibility, which makes certain tasks a real chore. Circus, while a bit green, has a relatively easy-to-use plugin system that will be a boon. I honestly wish there were something a bit more robust in this area, but there isn't.
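For the Consul half, each process could register itself with the local agent through Consul's HTTP API (`PUT /v1/agent/service/register`) along with a TCP health check. This is a sketch under assumptions: the service name, instance ID, port, and agent address are hypothetical, and how registration is wired into Circus (from the process itself or from a Circus plugin) is left open.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def service_definition(name: str, instance_id: str, port: int,
                       address: str = "127.0.0.1",
                       tags: Optional[List[str]] = None,
                       check_interval: str = "10s") -> Dict:
    """Request body for Consul's PUT /v1/agent/service/register.

    The TCP check lets Consul mark the instance unhealthy when it stops
    accepting connections, independent of which supervisor restarts it.
    """
    return {
        "ID": instance_id,
        "Name": name,
        "Tags": tags or [],
        "Address": address,
        "Port": port,
        "Check": {"TCP": f"{address}:{port}", "Interval": check_interval},
    }


def register(definition: Dict, agent: str = "http://127.0.0.1:8500") -> int:
    """Register with the local Consul agent; returns the HTTP status code."""
    req = urllib.request.Request(
        f"{agent}/v1/agent/service/register",
        data=json.dumps(definition).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Keeping health checking in Consul and restarts in Circus means each tool does one job, and the frontend or jobmanager can discover live stratum servers through Consul rather than static config.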