matrix-org / complement

Matrix compliance test suite
Apache License 2.0
61 stars 50 forks source link

Consider removing COMPLEMENT_CA and having it always on #65

Closed anoadragon453 closed 2 years ago

anoadragon453 commented 3 years ago

Before too many more homeserver dockerfiles are written, what do people think about removing the COMPLEMENT_CA environment variable, and making its behaviour the default? /ca will just be mounted in all docker containers, and homeserverd dockerfiles can choose to use the certificate and key in /ca or not.

Existing dockerfiles, such as the Synapse Monolith one will need to be updated, but I've been planning to adjust that one to use Complement's CA anyhow.

kegsay commented 3 years ago

It adds complexity to have it on which is why I wanted it to be opt-in originally. For now, my position remains the same.

richvdh commented 3 years ago

Existing dockerfiles, such as the Synapse Monolith one will need to be updated,

does it really though? Looks like it will just work; it'll just use the complement CA instead of its own.

in any case, +1. I feel like there should be one way to do this.

richvdh commented 3 years ago

It adds complexity to have it on

how so?

(@anoadragon453: what is the benefit of this approach over using a static CA cert as the monolith image does?)

kegsay commented 3 years ago

how so?

Complement needs to generate the cert and add a volume to every docker container it runs. There's some magic to determine where to look/query depending on if Complement itself is running in docker https://github.com/matrix-org/complement/blob/master/internal/docker/builder.go#L350

The alternative is no volumes, and skipping TLS checks.

richvdh commented 3 years ago

well, we've talked previously about how the magic CI flag is... magic.

I'm still failing to grasp why setting up a CA is an optional feature. Either it's required to run complement's tests, so should be enabled by default; or it's not required, in which case we can just get rid of it?

anoadragon453 commented 3 years ago

(@anoadragon453: what is the benefit of this approach over using a static CA cert as the monolith image does?)

Well, I was hoping it would allow us to set federation_verify_certificates: true in Synapse's config, although that didn't end up being the case: https://github.com/matrix-org/complement/blob/40d2a8d2a96f89e922a436336f068573d6472ddb/dockerfiles/synapse/workers-shared.yaml#L8-L11

anoadragon453 commented 3 years ago

Well the above is likely the case because I didn't follow the instructions on using the certificate correctly.

Regardless, the intention behind enabling it for Synapse runs is so that we can test certificate validation in the homeserver. It's a small thing, but nice to give homeservers the option to decide whether they want to do testing with certificate validation enabled or disabled.

what is the benefit of this approach over using a static CA cert as the monolith image does?

Complement doesn't attempt to verify certificates upon connecting to a homeserver:

https://github.com/matrix-org/complement/blob/665c1dd08b83010cdf99810714d1be5fc115ba94/internal/docker/deployer.go#L144-L150

So there's not really much point in using a separate certificate for Synapse versus creating one derived from the provided Complement CA cert.

The only benefit that COMPLEMENT_CA=1 grants us is allowing us to verify Complement's self-signed certificate when making federation requests, which I think is useful. As for why we shouldn't have this option always on by default, there were some concerns about performance, but I'm not sure if that was ever tested?

anoadragon453 commented 3 years ago

I've got Synapse verifying Complement's certificates after some fixes to Complement :tada:

I'd like to get a conclusion on this issue now though. Unless performance is a concern, my personal preference is just to go for it being always on.

kegsay commented 3 years ago

Sure, let's go with it always being on then. I'll need to tweak the Dendrite image to make certs from /ca then.

anoadragon453 commented 3 years ago

@Kegsay Cool, thanks. I'll get #69 updated and in which should fix the issues with the current implementation, and change the Synapse monolith image to use /ca as well.

kegsay commented 3 years ago

So my fears about magic stuff were well-founded given anoa's recent woes running Complement with a local buildkite runner:

    apidoc_register_test.go:31: Deploy: Failed to construct blueprint: Could not identify container ID using /proc/1/cpuset

This is a Complement-specific error, and due to using ComplementCA=true, which the SynapseWorker image requires.

I don't want to enable this by default until we figure out the root cause here.

valkum commented 3 years ago

I think we identified the problem in docker not populating the cpuset correctly. We were not able to reproduce this problem. I am not sure if @anoadragon453 was able to follow up in this though. Even for the case of explicitly setting ComplementCA=true, we could check the cpuset at start and disable ComplementCA and printing an error message, to avoid failing hard.

anoadragon453 commented 3 years ago

At the moment I still don't know what causes cpuset to not be populated - all I know is that it's not on my Arch Linux machines, but does work on my Ubuntu machine. So this might either be:

I haven't done a proper investigation yet, but I just confirmed that it's still an issue with the latest docker version on Arch Linux (20.10.6, build 370c28948e).

kegsay commented 2 years ago

I'm biting the bullet here and turning this on by default for the following reasons: