peterbrittain closed this issue 11 years ago
I agree on some points and disagree on others.
About HTTPS, I disagree: I believe HTTPS is obligatory. I really can't be more emphatic about this. The performance hit is just not a problem anymore. In particular, we should be able to avoid problems caused by lots of SSL handshakes because Requests uses connection pooling.
We don't need to HTTPS everything: restricting ourselves to HTTPSing the Web UI and initial Pi registration is potentially enough, we can use session tokens after that point.
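A minimal sketch of the session-token idea, assuming the token is minted over HTTPS at registration and then presented on later plain-HTTP calls (`issue_token`/`verify_token` and the `pi_id` scheme are hypothetical names, not anything in our code yet). Note the obvious caveat: a token sent over plain HTTP can still be sniffed and replayed, so this only protects the credentials themselves.

```python
import hashlib
import hmac
import secrets

# Kept only on the server; regenerating it invalidates all outstanding tokens.
SERVER_SECRET = secrets.token_bytes(32)

def issue_token(pi_id: str) -> str:
    """Mint an HMAC-signed token for a Pi during HTTPS registration."""
    sig = hmac.new(SERVER_SECRET, pi_id.encode(), hashlib.sha256).hexdigest()
    return f"{pi_id}:{sig}"

def verify_token(token: str) -> bool:
    """Check a presented token on subsequent (non-HTTPS) API calls."""
    pi_id, _, sig = token.partition(":")
    expected = hmac.new(SERVER_SECRET, pi_id.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information via timing differences.
    return hmac.compare_digest(sig, expected)
```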
Some other thoughts:
Fair enough. If our server can cope with the SSL, I'd be much happier using it. For some reason, though, I thought we might be running the server on a Pi and so might have an issue here. For now, let's assume we use SSL until it becomes an issue.
We might be able to do something more cunning about a DoS attack, but for now I'll assume that it's a matter of getting another server up and running as quickly as possible. Let's get the basics sorted before adding bells and whistles like white-listing...
I've just got my first Pi installed and tried installing our server code on it. The good news is that it all just worked out of the box (the wiki has been updated to explain how to do it). The bad news is that I completely blew all the CPU on the Pi trying to run the server with just my simple test script and no SSL.
Looks like we'll either need to back off the SSL everywhere idea and be more selective on when we use it, or ditch the original idea of running the server on RPis too.
Yeah, my strong view on using SSL was predicated on the idea that we'd use a non-Pi server. If we run the server on a Pi instead, SSL is way less likely to work.
Actually, as a further thought, using a Raspberry Pi as the server is almost certainly going to go poorly. I don't think we can expect one Raspberry Pi to take something in the region of 300-700 times the load of each of the spoke Pis: at least, not if we're writing the server in Python.
If we were really determined to use Raspberry Pis everywhere we could use a cluster of Pis as a 'server', which is an interesting engineering challenge, but not necessarily a good decision.
I think Neil's idea was that you could use the server as a local hub and so it would be servicing far fewer requests. Based on what I'm seeing, I'm not convinced it would be capable of handling even 10 clients in that case (even without SSL).
That said, we have yet to set up a production server, so use of Apache + postgresql (or favoured Pi equivalents) might make the difference for such small scale deployments.
Fortunately, we have already proved that we can run the server without any issue on pretty much any Windows or Linux PC, so maybe the teacher installs that on their PC?
Wait, were you using the Django builtin server for this testing?
Yup - and I knew it would be slower as a result... But this was REALLY slow.
With effort, we may be able to get it working for very small deployments, but it is clearly not going to work in all cases. Hence why I've re-opened this issue.
I think the resolution to this thread is to set up a proper deployment server and then see how well it copes with real requests. Volunteers anyone?
Django's development server isn't just slow, it's single-threaded. That probably accounts for part of your problem. I've got some experience setting up Django with Gunicorn, so I can give that a shot.
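For reference, a Gunicorn setup along these lines would look roughly as follows (the project name `ourproject` is a placeholder, not our actual module name):

```shell
# Install Gunicorn into the project's environment
pip install gunicorn

# Serve the Django WSGI app with several worker processes. Unlike the
# dev server (manage.py runserver), this handles requests concurrently.
gunicorn ourproject.wsgi:application --bind 0.0.0.0:8000 --workers 3
```

On a single-core Pi the worker count probably wants to stay small; more workers just means more memory pressure.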
My client was single-threaded too - it just issued one request after the other and spent up to ~10-20 seconds waiting for some to return before issuing the next request. The whole time the request was being processed, the Pi was running at 100% CPU.
So I've put Gunicorn in front of the Django test app and run our test script against it from my Windows box. I also updated the test script to print out the time each request took (change checked in). Same LAN, so almost all the request time here is on the Pi. Highlights are:
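The timing change to the test script is roughly this sketch (the real script wraps calls from the Requests library; `timed` here is an illustrative helper, not the checked-in name):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn, print how long it took, and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__} took {elapsed:.3f}s")
    return result, elapsed

# In the real test script this wraps the HTTP calls, e.g.:
#   response, secs = timed(requests.post, url, data=payload)
```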
The cost here is almost certainly down to the SD card. Disk access is always going to be a pain, and the SD card is JUST SO SLOW. I might try putting a networked database behind the app to see if it goes faster.
Could be that sqlite is not caching requests to access its DB file... It's certainly worth setting up a real DB server at some point.
However, I don't see the benefit of running it on a separate server from the Pi, because that effectively means you have to set up that separate server for the DB. And if we're going to do that, we may as well insist that the web server goes there too.
Yeah, I'm inclined to blame sqlite. At @lwr20's suggestion I moved the db file into a ramdisk on the Pi. This removed some of the variance in request times, and dropped the time of anything reading the DB to basically zero. Registration still took 7 seconds, re-registration took 3, and de-registration took 3. Everything else took less than a second.
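For anyone wanting to reproduce the ramdisk experiment, the rough steps are below (paths and sizes are illustrative; anything in the ramdisk is lost on reboot, so this is purely a diagnostic, not a deployment option):

```shell
# Mount a small tmpfs and copy the sqlite file into it
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=32m tmpfs /mnt/ramdisk
cp db.sqlite3 /mnt/ramdisk/db.sqlite3

# Then point Django at it in settings.py:
#   DATABASES['default']['NAME'] = '/mnt/ramdisk/db.sqlite3'
```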
At this stage it looks like sqlite is to blame here, though I'm not used to it being that slow.
Oh, maybe not. Watching the output of 'top' the gunicorn process was maxed out. We might need to profile the django code to see what's going on.
The one real difference between user registration and the rest of the API is that it authenticates the user. I bet that's using a monstrously CPU-intensive password-hashing algorithm, and so we need to pick a less intensive one.
So with some testing done here that appears to be correct. I've switched the Django config to prefer SHA1 which vastly improves our timings.
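For the record, the change was along these lines in `settings.py`. Django's default PBKDF2 hasher deliberately runs many thousands of iterations per password check, which is what was eating the Pi's CPU; the SHA1 hasher is nearly free but is also cryptographically much weaker, so this is a conscious trade of password-storage security for speed:

```python
# settings.py: prefer the cheap (but weak) SHA1 hasher on the Pi.
# PBKDF2 stays in the list so existing hashes can still be verified.
PASSWORD_HASHERS = [
    'django.contrib.auth.hashers.SHA1PasswordHasher',
    'django.contrib.auth.hashers.PBKDF2PasswordHasher',
]
```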
OK - let's close this one down again now. We can run a very small network on a Pi if we have to, but expect the public network to use a cloud server for the required time.
I was having a play at setting up the central server last night and have created a simple Django app that:
So far, so good: we can have administrators of the service (staff) and normal users, each with their own dedicated interface (HTML/GUI or JSON).
The problem I'm hitting is the question of what is secure enough for something that we propose to allow schools to use and yet will be running "in the wild" on the Internet.
Some simple questions we need to consider:
My gut reaction is that we can't make a super secure service without a lot of effort and so probably should accept some limitations. It's probably enough to make this system resilient to mild scrutiny. In particular, I'd propose:
Make sense?