Open foolip opened 8 years ago
@domenic?
Yeah, I've been meaning to add HTTPS for a while, especially since AWS apparently makes that easy now.
Another thing one could do is to assume that the build server has a copy of the html repo, and only send a diff against the merge-base with whatwg/html. A bit elaborate, of course, and the next bottleneck would be sending the output back.
Could we use Dropbox for this, or some similar service? https://www.dropbox.com/help/8
Unfortunately, as tested with Wireshark in https://github.com/whatwg/html-build/pull/64#issuecomment-179040279 there is no automatic compression at the TLS layer.
So, just to break this down and see where the low-hanging fruit is, here's what we send:
source-whatwg-complete is 6055 kB and gzips to 1158 kB
caniuse.json is 980 kB and gzips to 135 kB
w3cbugs.csv is 245 kB and gzips to 96 kB
The returned wattsi-output.zip is 4218 kB and is already compressed using Defl:N (so says unzip -lv), which is the same algorithm that gzip uses.
Potential compression in total is from ~7.2 MB to ~1.4 MB, and we only need to fiddle with the posted data.
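For anyone who wants to re-check the totals, here is a minimal sketch that just adds up the sizes listed above. The dictionary and function names are made up for illustration; only the numbers come from this thread.

```python
# Sizes in kB as reported in this thread: (raw, gzipped).
# The names SIZES_KB and totals are illustrative, not part of html-build.
SIZES_KB = {
    "source-whatwg-complete": (6055, 1158),
    "caniuse.json": (980, 135),
    "w3cbugs.csv": (245, 96),
}


def totals(sizes):
    """Return (total raw kB, total gzipped kB) for the posted files."""
    raw = sum(raw_kb for raw_kb, _ in sizes.values())
    gz = sum(gz_kb for _, gz_kb in sizes.values())
    return raw, gz


raw_kb, gz_kb = totals(SIZES_KB)
print(f"raw: {raw_kb} kB, gzipped: {gz_kb} kB, saved: {1 - gz_kb / raw_kb:.0%}")
```

The sums come out at 7280 kB raw and 1389 kB gzipped, which matches the rough "~7 MB down to ~1.4 MB" estimate.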
As for the imagined automatic compression of TLS, a colleague has educated me, and while it's part of the protocol it's been turned off in browsers because of the security issues, "adaptive chosen plaintext attacks." Even if we can get it to work with curl, it's probably not a long-term safe bet.
So, @domenic, do you think you could add support for a .gz variant of each field? Unless someone knows of a way to get the whole request body compressed at the HTTP level, since TLS is the wrong level. Maybe HTTP2 can do it?
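To make the ".gz variant of each field" idea concrete, here is a rough sketch of what both sides might do, assuming the server accepts a field named like the original with a ".gz" suffix. That naming, the helper names, and the field names in the usage note are assumptions for illustration, not the build server's actual API.

```python
import gzip


def gzip_fields(fields):
    """Client side: map each field name to a '<name>.gz' field with gzipped bytes."""
    return {name + ".gz": gzip.compress(data) for name, data in fields.items()}


def gunzip_fields(fields):
    """Server side: decompress '.gz' fields and pass plain fields through unchanged."""
    out = {}
    for name, data in fields.items():
        if name.endswith(".gz"):
            out[name[: -len(".gz")]] = gzip.decompress(data)
        else:
            out[name] = data
    return out
```

For example, `gzip_fields({"source": source_bytes})` would produce a single `source.gz` field to post, and `gunzip_fields` on the server would recover the original `source` bytes. Accepting both plain and `.gz` fields would keep older clients working during a transition.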
The TLS spec does support compression, but it is considered bad because it opens the door to attacks. Compression attacks belong in the "adaptive chosen plaintext" category though, which means you need to somehow control part of what the agent sends. This is hard to carry out, and it is mostly browsers that are vulnerable. However, since it's considered harmful I wouldn't use it, as it is or will most likely be turned off in whatever TLS stack you are using.
@haavardmolland informs me that HTTP2 compresses (almost) everything, so that would be an option. curl has a --http2 option that presumably works at least some of the time. @domenic, does your build server know how to speak HTTP2?
I would love to make my build server http2 aware. I will try to figure that out over the next day or two.
I've noticed that even though my curl binary has a --http2 option, it actually fails with curl: (1) Unsupported protocol when I try to use it. The curl binary from Homebrew has the same problem.
This would limit compression to very bleeding edge curl installs, if it's ever enabled by default. Not sure how to deal with this, seems like compressing ourselves would be the only way to benefit most users of html-build :(
Are you sure it's not just failing because my server doesn't support that protocol?
I'm pretty sure, yes, because it fails fast and running with verbose output it doesn't seem to even try connecting. curl -V also doesn't list it as a supported protocol:
curl 7.43.0 (x86_64-apple-darwin15.0) libcurl/7.43.0 SecureTransport zlib/1.2.5
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz UnixSockets
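A script could run the same check before deciding whether to pass --http2. The sketch below assumes that HTTP/2-capable curl builds list HTTP2 on the Features line of their version output, which is my understanding of how curl reports it. The function name is made up for illustration.

```python
def curl_supports_http2(version_output):
    """Return True if the 'Features:' line of `curl -V` output lists HTTP2."""
    for line in version_output.splitlines():
        if line.startswith("Features:"):
            # Everything after the colon is a whitespace-separated feature list.
            return "HTTP2" in line.split(":", 1)[1].split()
    return False


# The output quoted in this thread, which has no HTTP2 feature.
SAMPLE = """curl 7.43.0 (x86_64-apple-darwin15.0) libcurl/7.43.0 SecureTransport zlib/1.2.5
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz UnixSockets
"""

print(curl_supports_http2(SAMPLE))
```

In a real script the input would come from something like `subprocess.run(["curl", "-V"], capture_output=True, text=True).stdout` rather than a hard-coded sample.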
Some relevant web search finds: https://curl.haxx.se/docs/http2.html https://github.com/Homebrew/homebrew/pull/36942
OK. That is very sad. But I can work on adding support for a zip body instead of a multipart body then; simple enough. I will probably do switching on Content-Type first (application/zip => zip path) and then after a few weeks remove support for non-zip. Alternately I could add a new endpoint (/v2/wattsi or /wattsi-zipped or similar) but it's probably not worth worrying about at this point.
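Roughly, the Content-Type switch could look like the sketch below. This is not the build server's actual code: `read_inputs` and the `parse_multipart` callback are hypothetical names, and the multipart path is left to whatever parser the server already uses.

```python
import io
import zipfile


def read_inputs(content_type, body, parse_multipart):
    """Return {filename: bytes} from either a zip body or a legacy multipart body."""
    # Ignore parameters such as "; boundary=..." when picking the path.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/zip":
        with zipfile.ZipFile(io.BytesIO(body)) as zf:
            return {name: zf.read(name) for name in zf.namelist()}
    # Legacy path: hand off to the existing multipart handling (hypothetical callback).
    return parse_multipart(content_type, body)
```

Once the non-zip path is retired, the fallback branch could return an error instead of calling `parse_multipart`.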
I do wish we had whatwg/html-build#53 in place though.
Just uploading the source accounts for the majority of the build time when you're on a slow network.
https://tools.ietf.org/html/rfc2388#section-5.1 says "do it yourself", so I see two options:
sourcegz field or something that takes the compressed source instead.
I think HTTPS makes sense, for other obvious reasons as well.