mozilla / sccache

Sccache is a ccache-like tool. It is used as a compiler wrapper and avoids compilation when possible. Sccache can cache results either in local storage or in remote storage, including several cloud storage options.
Apache License 2.0

Support for build servers not behind `https` #930

Open daboross opened 3 years ago

daboross commented 3 years ago

It seems like sccache hardcodes https for build servers:

https://github.com/mozilla/sccache/blob/46d51c1404684e5c051ed0cec8ff96410eff806f/src/dist/http.rs#L250

Could this be made configurable, or otherwise documented? The distributed setup guide mentions that it's recommended to put the scheduler behind an HTTPS server:

> It's strongly recommended to listen on localhost and put a HTTPS server in front of it.

However, I can't find anything mentioning a hard requirement to put builders behind the same.

My use case for this is using sccache on a local network. Right now I'm trying to set it up using just SSH tunnels: the scheduler and builder listen only on localhost, and I connect to the server through an SSH tunnel. This setup fails with the following error, though:

 WARN 2021-01-11T04:00:37Z: sccache::compiler::compiler: [inflections]: Could not perform distributed compile, falling back to local:
Error 500: {"description":"assign job failed, job un-assigned from the server",
"cause":{"description":"POST to scheduler assign_job failed",
"cause":{"description":"https://[::1]:10501/api/v1/distserver/assign_job/43: error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1913: (Hostname mismatch)",
"cause":{"description":"error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1913: (Hostname mismatch)",
"cause":{"description":"error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1913:",
"cause":null}}}}}:
Error 500: {"description":"assign job failed, job un-assigned from the server",
"cause":{"description":"POST to scheduler assign_job failed",
"cause":{"description":"https://[::1]:10501/api/v1/distserver/assign_job/43: error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1913: (Hostname mismatch)",
"cause":{"description":"error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1913: (Hostname mismatch)",
"cause":{"description":"error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1913:",
"cause":null}}}}}

sccache-scheduler.conf

```ini
public_addr = "[::1]:10600"

[client_auth]
type = "token"
token = "~~~"

[server_auth]
type = "jwt_hs256"
secret_key = "~~~"
```

sccache-server.conf

```ini
# This is where client toolchains will be stored.
cache_dir = "/home/daboross/sccache-config/server_cache"
# The maximum size of the toolchain cache, in bytes.
# If unspecified the default is 10GB.
# toolchain_cache_size = 10737418240
# A public IP address and port that clients will use to connect to this builder.
public_addr = "[::1]:10501"
# The URL used to connect to the scheduler (should use https, given an ideal
# setup of a HTTPS server in front of the scheduler)
scheduler_url = "http://[::1]:10600"

[builder]
type = "overlay"
# The directory under which a sandboxed filesystem will be created for builds.
build_dir = "/tmp/build"
# The path to the bubblewrap version 0.3.0+ `bwrap` binary.
bwrap_path = "/usr/bin/bwrap"

[scheduler_auth]
type = "jwt_token"
token = "~~~"
```

client sccache/config

```ini
[dist]
scheduler_url = "http://[::1]:10600"
toolchains = []
toolchain_cache_size = 5368709120

[dist.auth]
type = "token"
token = "~~~"
```

torokati44 commented 3 years ago

+1

rahulbansal16 commented 3 years ago

How are you able to assign HTTPS to the IP address? I am trying to set up the build server. I set up Nginx to proxy port 80 and HTTPS to port 10501 for the build server, and my build server listens on 127.0.0.1:10501.

From my scheduler, I forward local port 5501 to my build server's port. My scheduler fails to connect to the build server because of an SSL issue.

daboross commented 3 years ago

> My scheduler fails to connect to the build server because of SSL issue.

I think you probably need to do some configuration to accept your certificate? Maybe install it (or a root cert granting it) as a locally trusted root on the scheduler's system. I'm unsure how to do this (and it probably depends on your system), but I'm 90% sure it's what one needs to do.

You'd want the certificate to be for localhost, I think, and then you'd set the public ip address to localhost. I've looked at, for instance, https://letsencrypt.org/docs/certificates-for-localhost/#making-and-trusting-your-own-certificates, for generating these certs.

GoldsteinE commented 2 years ago

My build servers are on local IPs in a VPN, so I’m also very interested in a way to disable HTTPS for build server connections. Would a PR adding an option like `require_builder_https = false` in scheduler and/or build server and/or client config be accepted?

GoldsteinE commented 2 years ago

Actually, I don’t understand how it's supposed to work at all. Given that:

  1. Build server doesn’t support HTTPS
  2. Build server binds its own IP and the client gets the same public IP, so it's not possible to inject a reverse proxy in the middle

it seems impossible to configure distributed builds at all without some kind of IP trickery (making the same IP mean different things on the client and on the server, so that the client connects to a reverse proxy).

It seems like HTTPS for build servers was introduced by this commit: f6ad408ef92e950b6ae727753a460108a3ddaf86. @aidanhs, if you have some time, could you please explain how to properly configure HTTPS for build servers?

aidanhs commented 2 years ago

Build servers always use https and are self-configured; you don't need to (and should not!) try to configure them. You should treat it as an opaque protocol and expose the build server port directly on whatever network you're using. Don't put it behind nginx.

The way that this works is:

  1. build servers dynamically generate an ssl certificate on startup https://github.com/mozilla/sccache/blob/963f137c8a2847a05cdc32062d9a3579e36aeb69/src/dist/http.rs#L303
  2. the public part of the certificate is provided to the scheduler when the builder registers itself via a heartbeat https://github.com/mozilla/sccache/blob/963f137c8a2847a05cdc32062d9a3579e36aeb69/src/dist/http.rs#L920-L958
  3. clients, when their request is assigned to a build server, retrieve the public part of the certificate for that build server and use it in the request to the server
  4. all build requests are therefore https encrypted on a per-build-server basis
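
The four steps above can be modelled roughly as follows. This is only a toy sketch of the trust flow, not sccache's actual code: `CertPem`, `Scheduler`, `heartbeat`, `cert_for` and `generate_cert` are hypothetical names, the address is a documentation-range placeholder, and no real TLS or key generation happens here.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical stand-in for the public part of a builder's certificate.
#[derive(Clone, Debug, PartialEq)]
struct CertPem(String);

/// Hypothetical scheduler state: the latest certificate seen per builder.
#[derive(Default)]
struct Scheduler {
    certs: HashMap<SocketAddr, CertPem>,
}

impl Scheduler {
    /// Step 2: a builder heartbeat registers (or refreshes) its certificate.
    fn heartbeat(&mut self, builder: SocketAddr, cert: CertPem) {
        self.certs.insert(builder, cert);
    }

    /// Step 3: a client assigned to `builder` looks up the certificate to trust.
    fn cert_for(&self, builder: &SocketAddr) -> Option<&CertPem> {
        self.certs.get(builder)
    }
}

/// Step 1: a real builder would generate a key pair and a self-signed
/// certificate on startup; this placeholder only records the address.
fn generate_cert(addr: SocketAddr) -> CertPem {
    CertPem(format!("self-signed cert for {}", addr))
}

fn main() {
    let builder: SocketAddr = "192.0.2.10:10501".parse().unwrap();
    let mut scheduler = Scheduler::default();
    scheduler.heartbeat(builder, generate_cert(builder));

    // Step 4: the client would trust only this certificate for this builder.
    match scheduler.cert_for(&builder) {
        Some(cert) => println!("pin {:?} for {}", cert, builder),
        None => println!("no certificate known for {}", builder),
    }
}
```

The point of the model is that trust is established per builder through the scheduler, so no externally issued certificate is involved.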

The problem in the original post is interesting (and the error message is misleading, sorry). The problem seems to be that the scheduler is failing to talk to the build server because of a hostname mismatch in the certificate - but I can't see anything obviously wrong in the configs. I can't help but wonder if this is related to the use of ipv6 addresses.

A motivated contributor looking for a fix might look carefully at the uses of the addr variable in https://github.com/mozilla/sccache/blob/963f137c8a2847a05cdc32062d9a3579e36aeb69/src/dist/http.rs#L303, as that's where the hostname information is being derived from to embed in the certificates.
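
As a small aid for anyone digging into this, the sketch below shows that a `public_addr`-style string has more than one textual form once parsed: the socket form keeps the brackets and port for IPv6, while the bare IP does not. It does not reproduce the bug, and whether sccache embeds either form in its certificates is not confirmed here; `addr_forms` is a hypothetical helper that uses only the Rust standard library.

```rust
use std::net::SocketAddr;

/// Returns (socket form, bare IP form) for a `public_addr`-style string,
/// or `None` if the string is not a literal IP address plus port.
fn addr_forms(public_addr: &str) -> Option<(String, String)> {
    let addr: SocketAddr = public_addr.parse().ok()?;
    Some((addr.to_string(), addr.ip().to_string()))
}

fn main() {
    for raw in ["[::1]:10501", "127.0.0.1:10501"] {
        if let Some((socket, ip)) = addr_forms(raw) {
            println!("{}: socket form {}, bare IP {}", raw, socket, ip);
        }
    }
}
```

Any comparison between the name stored in a certificate and the host in a request URL has to agree on which of these forms is used.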

I'm not sure any of the follow-up comments in this issue actually relate to the original problem.

GoldsteinE commented 2 years ago

To add a data point: I get the same error with IPv4.

m00nwtchr commented 2 years ago

Note: With reverse proxies like Traefik, it is possible to forward the TLS without terminating it at the proxy, so support for this would make everything a lot easier. https://doc.traefik.io/traefik/routing/routers/#passthrough

Also, the public_addr being the bind address as well makes it impossible or almost impossible to run in Docker, afaict.

EDIT: Alternatively, could the scheduler act as a proxy for the build servers? (instead of needing a direct connection for the clients)

aidanhs commented 2 years ago

There's certainly an opportunity to separate out the public_address and bind_address, which would then let you put something in front. It'd also make it easier to get connectivity into docker (which currently I think you can only do with --net=host).
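
One way such a split could look, purely as a sketch: the builder advertises `public_addr` to the scheduler and clients, but listens on a separate optional `bind_addr`, falling back to `public_addr` when it is unset. The struct, the `bind_addr` field and `listen_addr` are hypothetical illustrations of the idea, not options described in this thread, and the addresses are placeholders.

```rust
use std::net::SocketAddr;

/// Hypothetical builder address settings (not an existing sccache config).
struct BuilderAddrs {
    /// Address advertised to the scheduler and clients.
    public_addr: SocketAddr,
    /// Address the builder actually listens on; `None` keeps today's
    /// behaviour of binding to `public_addr`.
    bind_addr: Option<SocketAddr>,
}

impl BuilderAddrs {
    /// The address to pass to the listening socket.
    fn listen_addr(&self) -> SocketAddr {
        self.bind_addr.unwrap_or(self.public_addr)
    }
}

fn main() {
    // E.g. inside a container: listen on all interfaces, advertise the host's address.
    let addrs = BuilderAddrs {
        public_addr: "203.0.113.5:10501".parse().unwrap(),
        bind_addr: Some("0.0.0.0:10501".parse().unwrap()),
    };
    println!("advertise {}, listen on {}", addrs.public_addr, addrs.listen_addr());
}
```

Defaulting to `public_addr` would keep existing configurations working unchanged.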

I would advise against any patch to use the scheduler as a proxy - it'd be too easy to enable and then not realise that it's become a bottleneck later.

@GoldsteinE do you have the set of configs hanging around that let you observe the error with IPv4? I could see if I can reproduce.