Concurrency model - Githubissues

Lukasa commented 11 years ago

Normally for a networked application I'd want to add concurrency. This is particularly important for things like Pi autodiscovery, which relies on having all the Pis on the LAN listening for broadcast packets at all times.

Unfortunately, adding concurrency into the client application is likely to make the code much harder to fiddle with. This leaves us with two questions. Firstly, do we want concurrency at all? Secondly, if we do, which concurrency method do we want to use?

Twisted
Tornado (Probably not suited for the client)
gevent
greenlet
multiprocessing
threading (We definitely shouldn't use this: if we want to, we should use multiprocessing instead and save the headache.
concurrent.futures (Python 3.2+ only, but nicer).

Thoughts?

ZsigE commented 11 years ago

I'd tend to go with multiprocessing most of the time, simply because I'd prefer that people looking at the code didn't have to learn an entire framework just to get messages flying around. I've just checked in an example of a way of doing this - receiving messages works on my Pi at home, but I haven't yet tested the sending form of it. Seems reasonably easy though.

peterbrittain commented 11 years ago

Given the aim to teach people how to program, I think it's important that we don't get too bogged down in complex frameworks, so err away from twisted in particular. I also think we should try to stick to one version of Python and so don't want to bring in concurrent.futures unless there is a very good reason to do so. Would be happy to try most of the others.

peterbrittain commented 11 years ago

Some other possibilities too... See http://nichol.as/asynchronous-servers-in-python

A lot of frameworks seem to be moving towards task based scheduling (like greenlets) while others try to wrap it all up in co routines. These both feel like a better paradigm to use instead of callbacks if we can find something that integrates nicely with an existing HTTP client.

Lukasa commented 11 years ago

For the sake of simplicity, server-side, I've bitten the bullet and chosen Tornado.

Pros:

It doesn't have Twisted's crazy concurrency model, making the code significantly more understandable: it's predominantly callback driven, which isn't perfect, but for our simple use case it's not too bad.
It has a built-in WSGI wrapper, which means the Django server can run in the same process. Whether we actually do that is another question (it'll cost us some perf, and in particular could hold up the event loop in a really nasty way).
It's really fast. 7400 connections per second in some benchmarks. Can easily do 10k concurrent connections.

Cons:

For our use case it will have to be fundamentally single-process. The built-in multi-process TCP server doesn't have any memory sharing. We could probably program our way around it, but it's not great.
It's not totally simple. For our use case the actual code we have to write is very simple (I whipped up a basic proof-of-concept repeater in about 30 minutes, with no previous Tornado experience). However, understanding it involves grasping callbacks, event loops and non-blocking socket IO. I don't know if this is going to hurt the 'as a learning experience' aspect of the server code, though you could view this code as being "expert mode" for those who've understood the earlier stuff.

You can see my first pass at the code here. There's some major refactoring to do, which means the code isn't as clear as I'd like it to be, but it's not too bad. Right now it doesn't actually meet the protocol spec I designed (its echoed messages don't send headers, which is wrong), but it works pretty well.

peterbrittain commented 11 years ago

Neat demo! Next (obvious) question - how do we coordinate the Tornado proxy so that it connects the right streams together?

Lukasa commented 11 years ago

@peterbrittain Correct question. The way I see it there are two problems.

Updating the conga based on input from the web server.
Updating the conga (and webserver) based on behaviour of the connections (e.g. if a TCP connection drops).

Notably a problem we don't have is one of synchronisation: we can't run this proxy in multiple processes anyway, so we can safely keep a global map of sockets and their partners on the Tornado proxy.

My thoughts about connecting the correct two streams are currently that Tornado should just query the database. This allows Django to add data to the database, and then when Tornado gets a new connection it simply looks up the database to find who the connection's partner is supposed to be. It should be possible to do this asynchronously as well, which will help perf.

I've yet to come up with a good model for how to communicate people 'dropping out' of a conga, though likely having Tornado update the DB is the way to go.

peterbrittain commented 11 years ago

Your issue for now, Cory.

Lukasa commented 11 years ago

We'll need an async database abstraction in the Tornado layer. I'm leaving this link here for my future reference.

Lukasa commented 11 years ago

So, just to document my train of thought, I've concluded that because we can 'trust' the clients we don't need to worry quite so much about communicating about disconnecting clients, at least initially. Instead, each client should send some HTTP to deregister, and then follow that up with a conga BYE message. This should drive the conga server to lookup a new partner for the previous link in the conga.

peterbrittain commented 11 years ago

@Lukasa That probably means we should set up a proper DB server on the production system... Looks like postgresql would be an obvious choice here.

Lukasa commented 11 years ago

@peterbrittain Postgres was going to be my suggestion too. I'll take a look into it, but I might want to stub it out for the moment. We'll see.

peterbrittain commented 11 years ago

@Lukasa I've set up several installations of Django using postgres in the past. That won't be too hard. I'll see if I can get that sorted as the start of the production server install tonight.

Lukasa commented 11 years ago

Oh, yeah, setting up Postgres isn't hard in that sense, it's just a pain to set it up on every development machine. =)

peterbrittain commented 11 years ago

Good point. Yeah - it was nice to be able to roll out the sqlite DB for the development server... Any chance you can set up a DB layer that allows you to swap server types (sqlite/postgres) easily? If not, we'll just need to beef up how to set up the development server instructions.

Lukasa commented 11 years ago

See dj-database-url.

peterbrittain commented 11 years ago

Sorry - I wasn't clear. I was aware that we could set up stuff from the Django side. I meant for your tornado code.

Lukasa commented 11 years ago

@peterbrittain Yeah, I'm probably just going to steal the general idea and reimplement it. No problem. =)

peterbrittain commented 11 years ago

@lukasa Yeah - how hard can it be?! Got a spare 10 minutes... ;-)

peterbrittain commented 11 years ago

@Lukasa Turns out that there are other django settings we should really switch off on a live deployment, so I didn't use the dj-database-url approach. I used one of the recommended deployments from the django book. Still - we now have a postgres and sqlite install.

Lukasa commented 11 years ago

So the Tornado server now has a connection to the Sqlite DB. Next up is to abstract this further and add Postgres support.

peterbrittain commented 11 years ago

Excellent news! On a related note, I've just managed to install a CentOS 6 VM on Amazon EC2. This is now running the development server. You can all have a look at http://ec2-54-229-142-181.eu-west-1.compute.amazonaws.com:8000/conga/ (credentials are currently staff/staff). Once logged in, you should be able to see the results of a client (e.g. our test script) creating users/congas.

Lukasa commented 11 years ago

So as of last night the Tornado server is "fully functional".

The scare quotes there indicate that this isn't entirely true. Clients closing their connections unexpectedly causes real problems at the minute: we just raise exceptions rather than doing the right thing, which would be to log and drop them from the conga. As a double whammy, if we discover this while sending a conga message to that participant, we'll drop the conga message on the floor, which is lousy. Also note that "closng their connections unexpectedly" can include closing it after sending a BYE but before we've processed it, due to the way Tornado handles reading from asynchronous sockets.

We're also currently missing Postgres support. Not a hugely expensive job to add, but worth noting.

Finally, there's a relative dearth of proper error handling. We'll need to wrap almost every callback in a try...except block that catches all exceptions, logs and then drops the conga participant out of the conga, just so we don't die when something bad happens.

peterbrittain commented 11 years ago

Great stuff! I've been spending a little time getting the wiki development docs into something more like a sensible order starting from an obvious home page when you click on the wiki link...

Does this mean that you could now write up a simple explanation of the Tornado server in the server design?

Lukasa commented 11 years ago

In principle, yes. I want to finish up some of the logic before I do that though: better not to have to write the documentation twice. =)

Lukasa commented 11 years ago

Further updates: error handling is vastly improved, and we log a ton of stuff out now when things go wrong. Right now it's all 'error level' logging, but I might down-level some of it because most of it is recoverable.

peterbrittain commented 10 years ago

OK - I've had a try and it handles the obvious error conditions now, so I reckon we have something that we can use for initial release, so I'm going to close this one out.

neilcollins / piconga

Concurrency model #11