I also see the same problem on my new sandstorm installation. I would be happy to provide any diagnostic information that could help understand the problem.
In an ad hoc test I did just now, a new YATA instance starts up in ~1.7 seconds on Oasis and ~0.5 seconds running locally. I can't seem to reproduce the issue @zenhack describes.
That said, it's definitely true that many Sandstorm apps start up pretty slowly. But this isn't because the server is running any slower. Sandstorm's approach to sandboxing has almost zero overhead in terms of server performance.
Typically, the problem is some combination of:
Quoting Kenton Varda (2017-08-26 18:53:10)
Typically, the problem is some combination of:
Yeah, that's its own issue -- I referenced YATA because it's a good baseline for removing this kind of overhead from the equation. I did some local measurements of the app's startup time outside of the sandbox:
```
[isd@rook yata]$ export DB_PATH=my-db.sqlite3
[isd@rook yata]$ cat waitstart.sh
while true; do
curl http://localhost:8080 > /dev/null 2>/dev/null
[ "$?" = 0 ] && exit 0
done
[isd@rook yata]$ ./app & time ./waitstart.sh
[1] 2728
real 0m0,184s
user 0m0,095s
sys 0m0,046s
[isd@rook yata]$ fg
./app
^C
[isd@rook yata]$ ./app & time ./waitstart.sh
[1] 2756
real 0m0,022s
user 0m0,009s
sys 0m0,007s
[isd@rook yata]$ fg
./app
^C
```
The first run is with no pre-existing database, so it's a bit slower. But I see the same slowdown on sandstorm regardless of whether it's a first boot or opening an existing grain. At 20 ms to being ready to serve a page, I shouldn't be able to perceive anything -- it can't be the app.
The above is running locally on my laptop, and the delay running in sandstorm (also on my laptop) is about 1 second (as opposed to 2-3 on my server in the other room). But that's still a 50x slowdown in the sandbox vs. out of it.
I doubt the sandboxing itself is what's causing the problem, but it seems unlikely that everything can be blamed on the apps. I will try to find some time this week to dig in and see what's going on.
My previous measurements were taken by hand with a stopwatch, so they include human reaction time.
If I look just at the Chrome devtools network panel, I get a time-to-first-byte of ~230ms for a new YATA grain and ~130ms for an existing grain. The latter seems to be independent of whether the grain was already running.
On Oasis, I'm seeing a TTFB of ~450ms for an existing grain and ~625ms for a new grain (assuming the app is cached on the workers -- pulling from cold storage can add a second or two). About 200ms-300ms of this is DNS + TCP + TLS for the newly-created subdomain. Meanwhile, there are three other network round trips needed on grain load, and my RTT to Oasis is 60ms. So in this case the time is almost entirely explained by network round trips. Conceivably we could find ways to eliminate a round trip or two.
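(For anyone who wants the same breakdown without browser devtools, curl can report each phase; this is just a sketch, and the grain host below is a placeholder:)

```
# Break a single request into DNS / TCP / TLS / first-byte phases.
# The URL is a placeholder -- substitute a real grain host.
curl -o /dev/null -s \
  -w 'dns: %{time_namelookup}s  tcp: %{time_connect}s  tls: %{time_appconnect}s  ttfb: %{time_starttransfer}s\n' \
  https://some-grain-host.example.sandcats.io/
```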
Now I think I have an explanation for your observations on your local server: where is your DNS? My guess is that when you're seeing a multi-second startup time for YATA, it's almost entirely DNS lookup time, and your DNS is remote. I have a local DNS server for my local Sandstorm instance so it's roughly instantaneous for me.
In any case, I think we might be looking at maybe 100ms of Sandstorm bookkeeping overhead (maybe some Mongo queries, etc.), which parallelizes with three network round trips. We could probably reduce either of those numbers a bit with some optimizations. But I don't think this is the real problem with Sandstorm app startup times. If every app started as fast as YATA I think everyone would be very happy. The real problem is the multi-second startups of more bloated apps.
The dns issue had occurred to me; setting up a local one and comparing is on my todo list. I'm using sandcats for dns. dig tells me I'm getting response times of < 50 ms, so I'm skeptical, but I'll sit down and test soonish.
Okay, yeah, setting up dnsmasq on my machine and having it handle requests for the sandstorm box's domain speeds things up substantially. 200ms for the local system, around 1 second for the machine in the other room (measured via the firefox dev tools). The latter still seems longer than it ought to be given that we're talking about wifi to a machine in the next room, but it's at least well within the not-annoying range.
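(For reference, a minimal sketch of that kind of local-resolver setup -- the domain and LAN address are placeholders, and the config path assumes a dnsmasq that reads /etc/dnsmasq.d/:)

```
# Answer the whole Sandstorm wildcard domain from a local dnsmasq instead of
# going out to sandcats for every per-grain subdomain.
# (Placeholder domain and IP -- adjust for your setup.)
echo 'address=/example.sandcats.io/192.168.1.50' | sudo tee /etc/dnsmasq.d/sandstorm.conf
sudo systemctl restart dnsmasq
# Sanity check: any subdomain should now resolve locally and instantly.
dig +short anything.example.sandcats.io @127.0.0.1
```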
The motivator for this, though, is actually my phone on LTE, which takes much longer, even when the signal is good enough that loading times for e.g. zenhack.net are still imperceptible. I can't conveniently set up a custom DNS resolver on my phone to handle things locally, (a) because sandcats DNS is dynamic, so it would break when my IP changed, and (b) just because doing that on a phone is a bit annoying (though I could figure something out if it were critical).
A few seconds on top of sandstorm itself is enough to make me think twice about bothering to open up my phone to jot down a todo item (half the reason I wrote YATA was that simple TODOs was even worse, and it is a big improvement).
It occurs to me that using per-session, per-grain domains is going to defeat DNS caching, at least if we're just answering each name individually. I have heard that there are some significant compatibility problems with wildcard domains, but I don't know just how bad they are or how widely supported wildcards are anyway. One thought is to get sandcats to supply a wildcard record to the DNS client.
AIUI, there's actually no such thing as a "wildcard record" on the wire in the DNS protocol. Rather, configuring a wildcard causes the server to answer every matching query with an ordinary record for the specific name that was asked about; it's entirely up to the server to implement the matching, so there's nothing a resolver could cache that would cover other subdomains.
There IS such a thing as a wildcard TLS certificate, but that's a different matter.
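(To illustrate, assuming a zone that only contains a wildcard entry like *.example.sandcats.io: queries for two different grain hosts each come back as plain records for the exact names queried, so caching one does nothing for the other. The names here are placeholders.)

```
# Each query returns an ordinary A record for the exact name asked about.
# Nothing in either response indicates it was synthesized from a wildcard,
# so a resolver's cache entry for one name doesn't help the other.
dig +noall +answer grain-one.example.sandcats.io A
dig +noall +answer grain-two.example.sandcats.io A
```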
We could "fix" the slow-DNS problem by "pre-allocating" hosts: When you open Sandstorm, it could randomly-generate some hostnames client-side and fire off dummy requests to them, to force DNS lookup and even TLS negotiation to complete. Then upon opening a session, the client could request that the server assign a particular hostname. I think it's fine, security-wise, to allow this -- a client who chooses a non-random hostname would only be hurting themselves.
This was just asked about on IRC, and I noticed it before too: launching grains is unreasonably slow. For concrete numbers: my janky todo app (https://github.com/zenhack/yata), when run outside of the sandbox, starts instantaneously. In contrast, clicking on it in the grain list, while sitting about 15 feet from the server on my laptop (on wifi), takes 2-3 seconds before the UI appears. I remember there being some discussion about this on the mailing list wrt davros way back:
https://groups.google.com/forum/#!searchin/sandstorm-dev/davros$20startup|sort:relevance/sandstorm-dev/-mncsxPR7Rg/o3DHo_ynAgAJ
At the time we were working under the assumption that davros was at fault, but I suspect that is not the case, given that startup times outside the sandbox are dramatically faster. In the case of the app I linked above, it's basically just opening a sqlite database and then listening on a port; this takes almost no time outside the sandbox, and it seems unreasonable that it should take seconds within.
I've also noticed that it does get worse on worse internet connections, I think disproportionately to the decrease in overall network performance (but I'd have to do more careful measurements to be sure).
For reference, here is the discussion on IRC:
https://botbot.me/freenode/sandstorm/2017-08-18/?msg=90006029&page=1