tkhq / qos

QuorumOS is a computation layer for running applications inside Trusted Execution Environments (TEEs)
GNU Affero General Public License v3.0
5 stars 2 forks source link

Lance/repros support #486

Open lrvick opened 1 month ago

lrvick commented 1 month ago

Summary & Motivation (Problem vs. Solution)

How I Tested These Changes

Pre merge check list

cr-tk commented 1 month ago

Hi @lrvick , I have a series of questions and comments on this.

First, is this a branch that's still in flux (= draft PR) or a full PR that's ready for comments and review?

I'm not aware of an engineering ticket on Linear that's linked to this (not totally surprising based on recent conversations). However, the PR itself also has no code comments, notes on non-trivial/potentially unexpected parts, or a high-level summary. As such, I think it's under-documented, which makes it hard to give good feedback on the PR or do the (eventual) code review.

The primary reason I'm mentioning this is are the changes to .github/workflows/artifacts.yml, which tingle my security sense :spider: :slightly_smiling_face:

What is this 144.76.154.76 host and git repository that the ssh-based git checkout access pushes to? Reverse lookups and whois information suggest this is a physical rented server at Hetzner in Germany. I wasn't aware Turnkey has any machines or Git repository there.

The StrictHostKeyChecking no ssh config option intentionally weakens security for the ssh connection, which allows MITM network attackers to impersonate the target server, and (likely) read pushed data or cause other impacts that are worse. AFAIK ssh-keyscan -H ${{matrix.host}} >> ~/.ssh/known_hosts does not prevent this - it pre-fills the known_hosts file with what the check sees at runtime (which may already be malicious), but since strict host key checking is off, it won't reliably stop MITM attacks even if they happen between this command and the following git push repros-lance HEAD step.

I expect StrictHostKeyChecking accept-new to be a slightly better variant of this, since it will automatically trust the presented hostkey on first use (TOFU principle) without the user interaction of the ask setting but strongly enforce host key identity afterwards. It's not good enough, though.

As outlined, I think TOFU isn't a good choice here, since the TOFU check and the usage are so close to each other and will repeat on every GitHub action run. Instead, I highly recommend configuring the runner with the pre-determined fixed expected host keys, namely:

ssh-keyscan -H 144.76.154.76
# 144.76.154.76:22 SSH-2.0-OpenSSH_9.8
|1|+VwPWywRZn1LgL2cDFGbtuK5GR8=|/7HRKIjUntxCUuollX5hmlcLORA= ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCtuvpTCcfre5CyiuaiY+/6UTpp9uCUgfidGgFfCLWA1Xoy4JB7wB2uFj6atmWxsL5cPp3sLQ9CAHEo+RPIpb+C1IXyJ/yLx1rgHA+3DVSuZdwmchk4ufZ066HTAcXZQYTxInlUaT8Guza3KtZfYow4YzJu7w99lKF4AqRYJMBhVHDeR1XildX/i1jwGHM1ha1p8BDxHQDxKbUhV9TJ0EFDTofXoxhBqXkW5LWcKBKfDF6OmelddUlarMYeGFsy+4ZbU1tm3xLiFFdNRP0b5kzRABH/uXkDso+VhSPGAe39b99dJyDb+EXHXsTMxE0YxLsPfmaWNdYZ2odg9qscIIyb1W/sezKfFPEshPF6Qd5Iav68sFm0Nr3k9cbwSMSYQXyJVVUmcrHIItmx7oyHSR3AeZNJemHRP/T3DwccNpzC/EiAv81FSwsuZl0TwKQAJob4PaS4WPrHdS8zhaAhzKFarTWvj6x4mLhI3wyuwbnZg2z/wpdvVHf07j3UvogdI4s=
# 144.76.154.76:22 SSH-2.0-OpenSSH_9.8
|1|3eCUr8YmgaD6U/cvZvYS/U11mfI=|COD/H9+MM4oi//WrS0v35KW+48k= ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBDl2cxE8PADyJoFfYStCdp9X3OOQN6zmmrsMktxQsclpBMUwy2n3xhQhfvClbP40ZbEbeyeiPHIqU9lF+S/ldq4=
# 144.76.154.76:22 SSH-2.0-OpenSSH_9.8
|1|VCPdbrd4ytMPF4IJ5bPbnWMaOCc=|zjhcSjM2986XPRA3BmAccYB9LGw= ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHuwf76zC66gC+3T/9XMvju3lcE4P0o5Ngxo5ZK6QSgS
# 144.76.154.76:22 SSH-2.0-OpenSSH_9.8
# 144.76.154.76:22 SSH-2.0-OpenSSH_9.8

(Side note, the benefit from the -H = hashed behavior is probably marginal, since the target IP address is present in multiple other places in the job / on the runner, but it doesn't really hurt to keep the flag).

One practical scenario where the current definition would hurt us is if we cancel the Hetzner server, someone else gets authoritative control over the IP after it gets reassigned to another customer, and we forget about cancelling our runner job in time. Then they could setup a kind of ssh honeypot that accepts the incoming ssh + git requests and cause problems. This attack would not have to be Turnkey-specific or even targeted at us.

cr-tk commented 1 month ago

On a related note, it would be helpful for https://github.com/tkhq/repros-sigs to have a README.md explaining the contents, and a short GitHub description.