whatwg / build.whatwg.org

Build server for running whatwg/wattsi
Creative Commons Zero v1.0 Universal

Compress the source when posting to build server #85

Open foolip opened 8 years ago

foolip commented 8 years ago

Just uploading the source accounts for the majority of the build time when you're on a slow network.

https://tools.ietf.org/html/rfc2388#section-5.1 says "do it yourself", so I see two options:

  1. Add a sourcegz field or something that takes the compressed source instead.
  2. Make the build server use HTTPS, and verify that the negotiated connection uses compression.

I think HTTPS makes sense, for other obvious reasons as well.
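A minimal sketch of option 1, assuming the client gzips the source itself before posting it. The field name `sourcegz` is the one floated above; the helper names and the server-side handling are hypothetical, and the sample data is a stand-in for the real source.

```python
import gzip


def compress_field(source: bytes) -> bytes:
    """Gzip the source so it could be posted in a hypothetical `sourcegz` field."""
    return gzip.compress(source, compresslevel=9)


def decompress_field(payload: bytes) -> bytes:
    """What a server might do on receiving the hypothetical `sourcegz` field."""
    return gzip.decompress(payload)


# Stand-in for the real source file: repetitive markup compresses well.
sample_source = b"<p>The <code>foo</code> attribute must be ignored.</p>\n" * 2000

compressed = compress_field(sample_source)
restored = decompress_field(compressed)

print(len(sample_source), "bytes raw,", len(compressed), "bytes gzipped")
```

The round trip is lossless, so the server would see exactly the bytes it receives today once it has decompressed the field.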

foolip commented 8 years ago

@domenic?

domenic commented 8 years ago

Yeah, I've been meaning to add HTTPS for a while, especially since AWS apparently makes that easy now.

foolip commented 8 years ago

Another thing one could do is to assume that the build server has a copy of the html repo, and only send a diff against the merge-base with whatwg/html. A bit elaborate, of course, and the next bottleneck would be sending the output back.
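To illustrate why sending only a diff would shrink the upload, here is a rough sketch that uses Python's `difflib` in place of git. This is a swapped-in technique purely to compare payload sizes; it does not compute a merge-base, and the sample text and the `make_diff` helper are hypothetical.

```python
import difflib


def make_diff(base: str, modified: str) -> str:
    """Return a unified diff of `modified` against `base` as one string."""
    return "".join(
        difflib.unified_diff(
            base.splitlines(keepends=True),
            modified.splitlines(keepends=True),
            fromfile="base/source",
            tofile="local/source",
        )
    )


# Stand-in for the copy the build server is assumed to already have.
base = "".join(f"<p>Paragraph {i}</p>\n" for i in range(1000))
# A local edit touching a single line.
modified = base.replace("<p>Paragraph 500</p>\n", "<p>Paragraph 500, edited</p>\n")

diff = make_diff(base, modified)
print(len(modified), "bytes for the full file,", len(diff), "bytes for the diff")
```

The server would still need a way to apply the diff to its own copy, which this sketch does not cover.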

zcorpan commented 8 years ago

Could we use dropbox for this, or some similar service? https://www.dropbox.com/help/8

foolip commented 8 years ago

Unfortunately, as tested with Wireshark in https://github.com/whatwg/html-build/pull/64#issuecomment-179040279, there is no automatic compression at the TLS layer.

foolip commented 8 years ago

So, just to break this down and see where the low hanging fruit is, here's what we send:

The returned wattsi-output.zip is 4218 kB and is already compressed using Defl:N (so says unzip -lv), which is the same algorithm that gzip uses.

Potential compression in total is from ~7.2 MB to ~1.4 MB, and we only need to fiddle with the posted data.
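The `unzip -lv` check mentioned above can also be done programmatically. A small sketch, assuming the archive is available as bytes; it builds a stand-in archive in memory rather than using the real wattsi-output.zip, and the `compression_methods` helper is hypothetical.

```python
import io
import zipfile


def compression_methods(zip_bytes: bytes) -> dict:
    """Map each archive member to True if it is stored with Deflate."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            info.filename: info.compress_type == zipfile.ZIP_DEFLATED
            for info in archive.infolist()
        }


# Build a small stand-in archive; the member name is made up for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("output.html", "<p>hello</p>\n" * 1000)

methods = compression_methods(buffer.getvalue())
print(methods)
```

If every member already reports Deflate, recompressing the response would gain little, which matches the conclusion that only the posted data needs attention.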

foolip commented 8 years ago

As for the imagined automatic compression of TLS, a colleague has educated me: while it's part of the protocol, it's been turned off in browsers because of security issues ("adaptive chosen plaintext attacks"). Even if we can get it to work with curl, it's probably not a long-term safe bet.

So, @domenic, do you think you could add support for a .gz variant of each field? Unless someone knows of a way to get the whole request body compressed at the HTTP level, since TLS is the wrong level. Maybe HTTP2 can do it?

haavardmolland commented 8 years ago

The TLS spec does support compression, but it is considered bad practice because it opens the door to attacks. Compression attacks belong to the "adaptive chosen plaintext" category, though, which means the attacker needs to somehow control part of what the agent sends. This is hard to carry out, and it is mostly browsers that are vulnerable. However, since it's considered harmful I wouldn't use it, as it is, or most likely will be, turned off in whatever TLS stack you are using.

foolip commented 8 years ago

@haavardmolland informs me that HTTP2 compresses (almost) everything, so that would be an option. curl has a --http2 option that presumably works at least some of the time. @domenic, does your build server know how to speak HTTP2?

domenic commented 8 years ago

I would love to make my build server http2 aware. I will try to figure that out over the next day or two.

foolip commented 8 years ago

I've noticed that even though my curl binary has a --http2 option, it actually fails with curl: (1) Unsupported protocol when I try to use it. The curl binary from Homebrew has the same problem.

This would limit compression to very bleeding-edge curl installs, if it's ever enabled by default. Not sure how to deal with this; it seems like compressing ourselves would be the only way to benefit most users of html-build :(

domenic commented 8 years ago

Are you sure it's not just failing because my server doesn't support that protocol?

foolip commented 8 years ago

I'm pretty sure, yes, because it fails fast, and when running with verbose output it doesn't seem to even try connecting. curl -V also doesn't list it as a supported protocol:

curl 7.43.0 (x86_64-apple-darwin15.0) libcurl/7.43.0 SecureTransport zlib/1.2.5
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz UnixSockets

Some relevant web search finds: https://curl.haxx.se/docs/http2.html https://github.com/Homebrew/homebrew/pull/36942
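The check described above can be automated by parsing the `curl -V` output. A minimal sketch, assuming HTTP/2-capable builds advertise `HTTP2` on the `Features:` line; the function name is hypothetical and the sample text is the output quoted above.

```python
def curl_supports_http2(version_output: str) -> bool:
    """Return True if the `Features:` line of `curl -V` output lists HTTP2."""
    for line in version_output.splitlines():
        if line.startswith("Features:"):
            features = line.split(":", 1)[1].split()
            return "HTTP2" in features
    return False


# The output quoted in the comment above.
version_output = """\
curl 7.43.0 (x86_64-apple-darwin15.0) libcurl/7.43.0 SecureTransport zlib/1.2.5
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz UnixSockets
"""

print(curl_supports_http2(version_output))
```

A build script could use a check like this to decide whether passing `--http2` is worth attempting at all.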

domenic commented 8 years ago

OK. That is very sad. But I can work on adding support for a zip body instead of a multipart body then; simple enough. I will probably do switching on Content-Type first (application/zip => zip path) and then after a few weeks remove support for non-zip. Alternately I could add a new endpoint (/v2/wattsi or /wattsi-zipped or similar) but it's probably not worth worrying about at this point.

I do wish we had whatwg/html-build#53 in place though.
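The Content-Type switch proposed above might look roughly like this. It is a sketch only: the function name and the returned labels are hypothetical, and nothing here reflects how the actual build server is implemented.

```python
def choose_body_parser(content_type: str) -> str:
    """Pick a request-body parser from the Content-Type header value."""
    # Drop parameters such as "; boundary=..." and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/zip":
        return "zip"
    if media_type == "multipart/form-data":
        # Legacy path, to be removed once clients have switched to zip bodies.
        return "multipart"
    raise ValueError(f"unsupported Content-Type: {content_type!r}")


print(choose_body_parser("application/zip"))
print(choose_body_parser("multipart/form-data; boundary=abc123"))
```

Keeping both branches behind a single dispatch point would make it straightforward to delete the multipart path after the transition period.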