hashicorp / packer

Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.
http://www.packer.io

Packer SSH Communicator Fails to Iterate Default KEX-algo list #12917

Open ferricoxide opened 5 months ago

ferricoxide commented 5 months ago


Overview of the Issue

When using the SSH communicator to provision a FIPS-enabled target, the communicator hangs and eventually times out. Logging into the target system and reviewing /var/log/secure shows errors such as:

Apr  9 18:52:43 ip-172-31-47-57 sshd[1460]: input_kex_gen_init: Key exchange type c25519 is not allowed in FIPS mode [preauth]
Apr  9 18:52:43 ip-172-31-47-57 sshd[1460]: ssh_dispatch_run_fatal: Connection from 217.114.38.123 port 51540: invalid argument [preauth]

The problem may be worked around by using the ssh_key_exchange_algorithms parameter to specify an algorithm-list that omits curve25519-sha256@libssh.org. However, this seems like it shouldn't be necessary. Since the documentation indicates that there's already a list of algorithms to try, Packer notionally should attempt to iteratively renegotiate the connection to use one of the other ones in the list. This seems to not be the actual behavior.
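To illustrate the workaround described above, here is a minimal Go sketch that derives a list with the curve25519 entries removed. The input list is the client default shown in the debug logs later in this thread (Packer main with golang.org/x/crypto v0.23.0), so other Packer or plugin versions may differ. `withoutCurve25519` is a hypothetical helper written for this example, not part of Packer or the crypto library, and the sketch does not check whether every remaining entry is acceptable to a given FIPS policy.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultKex is the client default list as logged later in this thread.
// Assumption: your Packer/plugin version may ship a different list.
var defaultKex = []string{
	"curve25519-sha256",
	"curve25519-sha256@libssh.org",
	"ecdh-sha2-nistp256",
	"ecdh-sha2-nistp384",
	"ecdh-sha2-nistp521",
	"diffie-hellman-group14-sha256",
	"diffie-hellman-group14-sha1",
}

// withoutCurve25519 (hypothetical helper) returns a copy of algos with
// every curve25519-based entry removed, preserving the original order.
func withoutCurve25519(algos []string) []string {
	kept := make([]string, 0, len(algos))
	for _, a := range algos {
		if strings.HasPrefix(a, "curve25519-") {
			continue
		}
		kept = append(kept, a)
	}
	return kept
}

func main() {
	// Print one quoted entry per line, ready to paste into the
	// ssh_key_exchange_algorithms list of a template.
	for _, a := range withoutCurve25519(defaultKex) {
		fmt.Printf("%q,\n", a)
	}
}
```

The printed entries can be pasted into the `ssh_key_exchange_algorithms` list of the relevant source block.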

Would request that any of:

Be implemented.

While I did notice there were other communicator issues around FIPS-enabled systems, the nature/focus of those tickets seemed to be different.

Reproduction Steps

Steps to reproduce this issue

  1. Run a packer-job that attempts to provision a FIPS-enabled target
  2. Wait for Packer to hang while attempting to connect with the SSH communicator
  3. Login to the target and view the /var/log/secure file: messages similar to the above will be found
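Step 3 can be automated with a small log scan. The following Go sketch is only an illustration: it assumes the sshd message format shown in the overview above, and `rejectedKexTypes` and `sampleLines` are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// fipsKexRejection matches the sshd message quoted in the overview and
// captures the rejected key exchange type (for example "c25519").
var fipsKexRejection = regexp.MustCompile(`Key exchange type (\S+) is not allowed in FIPS mode`)

// rejectedKexTypes (hypothetical helper) returns the key exchange type
// from every log line that reports a FIPS-mode rejection.
func rejectedKexTypes(lines []string) []string {
	var types []string
	for _, line := range lines {
		if m := fipsKexRejection.FindStringSubmatch(line); m != nil {
			types = append(types, m[1])
		}
	}
	return types
}

// sampleLines are the two /var/log/secure entries quoted in the overview.
var sampleLines = []string{
	"Apr  9 18:52:43 ip-172-31-47-57 sshd[1460]: input_kex_gen_init: Key exchange type c25519 is not allowed in FIPS mode [preauth]",
	"Apr  9 18:52:43 ip-172-31-47-57 sshd[1460]: ssh_dispatch_run_fatal: Connection from 217.114.38.123 port 51540: invalid argument [preauth]",
}

func main() {
	fmt.Println(rejectedKexTypes(sampleLines)) // prints "[c25519]"
}
```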

Packer version

1.8.7 (yes, I know that this is elderly but a few of our older job-defs won't work with newer versions: we're planning to remove them when Red Hat deprecates RHEL 7 in early summer)

Simplified Packer Template

If the file is longer than a few dozen lines, please include the URL to the gist of the log or use the Github detailed format instead of posting it directly in the issue.

Operating system and Environment details

OS, Architecture, and any other information you can provide about the environment.

Packer-executor host(s):

Packer target(s):

Log Fragments and crash.log files

2024-04-09T07:00:34.812-07:00   amazon-ebs.minimal-ol-9-hvm: + pkill --signal HUP sshd
2024-04-09T07:15:36.926-07:00   ==> amazon-ebs.minimal-ol-9-hvm: Provisioning step had errors: Running the cleanup provisioner, if present... 
[…elided…]
2024-04-09T07:55:35.990-07:00   Build 'amazon-ebs.minimal-centos-9stream-hvm' errored after 1 hour 1 minute: Timeout waiting for SSH.

Cc'ing: @lorengordon & @eemperor

lbajolet-hashicorp commented 2 months ago

Hi @ferricoxide,

I'm looking into this now. From what I can see, if undefined, the kex algorithm list defaults to what the Go crypto library exports, so in the current state (testing on Packer main, which relies on golang.org/x/crypto@v0.23.0), the list is the following:

If you're using Packer 1.8.7 (Note: plugin versions may be more relevant to this issue as they're the ones establishing the SSH connection), it's possible that you get another

Judging by this, the second algorithm should work in order to get an algo that is FIPS-compatible, I'm not sure yet why/if the other algos are tried by Packer/Plugins/SDK/Crypto (pick the guilty one), but I'll continue to dig. If sshd is the one closing the connection, is this because the client offered at least one unsupported algorithm? That would seem weird to me, as the whole point of the kex is to determine which algorithm to use, but I'm not very knowledgeable regarding FIPS+ssh.

By any chance, would you be able to provide a template I can play with for testing and troubleshooting this? Something as minimal as possible would be greatly appreciated in order to dig into this. I'll try to build something in the meantime.

Thanks in advance!

ferricoxide commented 2 months ago

By any chance, would you be able to provide a template I can play with for testing and troubleshooting this? Something as minimal as possible would be greatly appreciated in order to dig into this. I'll try to build something in the meantime.

By "template", are you referring to the Amazon Machine Image or the Packer builder? In the former case, any of the AWS CONUS Region AMIs owned by 174003430611 with the name spel-minimal-* can be used (see spel README's Current Published Images section). In the latter case, the Packer content is hosted under the spel project's spel directory.

lbajolet-hashicorp commented 2 months ago

Hi @ferricoxide,

In Packer lingo, "templates" are the configuration files, so yeah the spel HCL templates would be it. I'll try to take a look at them. There are a lot of moving parts here, so I was hoping for something simpler to test with, but this might still do the trick.

I've tested locally on a CentOS 9 Stream qemu VM, and I'm unable to reproduce the problem (yet), it seems that Packer does negotiate an algorithm for kex, and is able to connect in my case.

For reference:

$ PACKER_LOG=1 packer build cent_fips.pkr.hcl
[...]
2024/07/22 09:56:31 packer-plugin-qemu_v1.1.1-dev_x5.0_linux_amd64 plugin: 2024/07/22 09:56:31 client entered key exchange
2024/07/22 09:56:31 packer-plugin-qemu_v1.1.1-dev_x5.0_linux_amd64 plugin: 2024/07/22 09:56:31 received kexInitMsg from server; &ssh.kexInitMsg{Cookie:[16]uint8{0x5c, 0x63, 0x0, 0x27, 0x7f, 0x27, 0xc4, 0xff, 0xa5, 0xf1, 0x2, 0x68, 0xb9, 0x1c, 0x61, 0xf}, KexAlgos:[]string{"ecdh-sha2-nistp256", "ecdh-sha2-nistp384", "ecdh-sha2-nistp521", "diffie-hellman-group-exchange-sha256", "diffie-hellman-group14-sha256", "diffie-hellman-group16-sha512", "diffie-hellman-group18-sha512", "kex-strict-s-v00@openssh.com"}, ServerHostKeyAlgos:[]string{"rsa-sha2-512", "rsa-sha2-256", "ecdsa-sha2-nistp256"}, CiphersClientServer:[]string{"aes256-gcm@openssh.com", "aes256-ctr", "aes128-gcm@openssh.com", "aes128-ctr"}, CiphersServerClient:[]string{"aes256-gcm@openssh.com", "aes256-ctr", "aes128-gcm@openssh.com", "aes128-ctr"}, MACsClientServer:[]string{"hmac-sha2-256-etm@openssh.com", "hmac-sha2-512-etm@openssh.com", "hmac-sha2-256", "hmac-sha2-512"}, MACsServerClient:[]string{"hmac-sha2-256-etm@openssh.com", "hmac-sha2-512-etm@openssh.com", "hmac-sha2-256", "hmac-sha2-512"}, CompressionClientServer:[]string{"none", "zlib@openssh.com"}, CompressionServerClient:[]string{"none", "zlib@openssh.com"}, LanguagesClientServer:[]string{}, LanguagesServerClient:[]string{}, FirstKexFollows:false, Reserved:0x0}
2024/07/22 09:56:31 packer-plugin-qemu_v1.1.1-dev_x5.0_linux_amd64 plugin: 2024/07/22 09:56:31 finding agreed algorithms, client kex: []string{"curve25519-sha256", "curve25519-sha256@libssh.org", "ecdh-sha2-nistp256", "ecdh-sha2-nistp384", "ecdh-sha2-nistp521", "diffie-hellman-group14-sha256", "diffie-hellman-group14-sha1", "ext-info-c", "kex-strict-c-v00@openssh.com"}, server kex: []string{"ecdh-sha2-nistp256", "ecdh-sha2-nistp384", "ecdh-sha2-nistp521", "diffie-hellman-group-exchange-sha256", "diffie-hellman-group14-sha256", "diffie-hellman-group16-sha512", "diffie-hellman-group18-sha512", "kex-strict-s-v00@openssh.com"}
2024/07/22 09:56:31 packer-plugin-qemu_v1.1.1-dev_x5.0_linux_amd64 plugin: 2024/07/22 09:56:31 [DEBUG] handshake complete!
2024/07/22 09:56:31 packer-plugin-qemu_v1.1.1-dev_x5.0_linux_amd64 plugin: 2024/07/22 09:56:31 [DEBUG] Opening new ssh session

Note: the kex logs are ones I've manually added to a local copy of the crypto lib that I compile the qemu plugin with; they are not in the standard logs.

According to the SSH code from the library we use, the first algorithm from the client's list that is supported by the server will be used, so in my case I would think ecdh-sha2-nistp256 is the one that'll be used. Not sure why you are experiencing something different, as this hasn't changed in-between versions IIRC, but maybe on 1.8.7 with a bundled plugin this could work differently.
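As a minimal sketch of the selection rule described above, assuming it reduces to "the first entry in the client's list that the server also offers": `findCommon` below is a simplified stand-in written for illustration, not the library's actual function, and the two lists are copied from the qemu debug log above.

```go
package main

import "fmt"

// clientKex and fipsServerKex are copied from the "finding agreed
// algorithms" line in the qemu debug log above.
var clientKex = []string{
	"curve25519-sha256", "curve25519-sha256@libssh.org",
	"ecdh-sha2-nistp256", "ecdh-sha2-nistp384", "ecdh-sha2-nistp521",
	"diffie-hellman-group14-sha256", "diffie-hellman-group14-sha1",
	"ext-info-c", "kex-strict-c-v00@openssh.com",
}

var fipsServerKex = []string{
	"ecdh-sha2-nistp256", "ecdh-sha2-nistp384", "ecdh-sha2-nistp521",
	"diffie-hellman-group-exchange-sha256", "diffie-hellman-group14-sha256",
	"diffie-hellman-group16-sha512", "diffie-hellman-group18-sha512",
	"kex-strict-s-v00@openssh.com",
}

// findCommon (illustrative stand-in) walks the client's list in
// preference order and returns the first entry the server also offers.
// The boolean is false when the two lists share no entry.
func findCommon(client, server []string) (string, bool) {
	offered := make(map[string]struct{}, len(server))
	for _, s := range server {
		offered[s] = struct{}{}
	}
	for _, c := range client {
		if _, ok := offered[c]; ok {
			return c, true
		}
	}
	return "", false
}

func main() {
	algo, ok := findCommon(clientKex, fipsServerKex)
	fmt.Println(algo, ok) // prints "ecdh-sha2-nistp256 true"
}
```

With these lists, both curve25519 entries are skipped because the server does not offer them, which matches the expectation that ecdh-sha2-nistp256 is selected.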

I'll continue to dig into this, but it's not clear to me still where the problem lies.

lbajolet-hashicorp commented 2 months ago

Update: I tested with the spel template for AWS (from which I removed the ssh_key_exchange_algorithms attribute), and spotted no error here either.

$ SPEL_IDENTIFIER=test SPEL_VERSION=1.0.0 SPEL_BUILDERS=amazon-ebssurrogate.minimal-centos-9stream-hvm make build
[...]
2024/07/22 11:09:16 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:16 [DEBUG] TCP connection to SSH ip/port failed: dial tcp 35.170.201.222:22: connect: connection refused
2024/07/22 11:09:21 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:21 Using host value: ec2-35-170-201-222.compute-1.amazonaws.com
2024/07/22 11:09:21 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:21 [DEBUG] TCP connection to SSH ip/port failed: dial tcp 35.170.201.222:22: connect: connection refused
2024/07/22 11:09:26 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:26 Using host value: ec2-35-170-201-222.compute-1.amazonaws.com
2024/07/22 11:09:26 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:26 [INFO] Attempting SSH connection to ec2-35-170-201-222.compute-1.amazonaws.com:22...
2024/07/22 11:09:26 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:26 [DEBUG] reconnecting to TCP connection for SSH
2024/07/22 11:09:26 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:26 [DEBUG] handshaking with SSH
2024/07/22 11:09:26 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:26 Defaults set; kex algorithms are []string{"curve25519-sha256", "curve25519-sha256@libssh.org", "ecdh-sha2-nistp256", "ecdh-sha2-nistp384", "ecdh-sha2-nistp521", "diffie-hellman-group14-sha256", "diffie-hellman-group14-sha1"} (0xc0002b6600)
2024/07/22 11:09:26 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:26 init kex; cipher list []string{"aes128-gcm@openssh.com", "aes256-gcm@openssh.com", "chacha20-poly1305@openssh.com", "aes128-ctr", "aes192-ctr", "aes256-ctr"}, MAC list []string{"hmac-sha2-256-etm@openssh.com", "hmac-sha2-512-etm@openssh.com", "hmac-sha2-256", "hmac-sha2-512", "hmac-sha1", "hmac-sha1-96"}, kex algo list []string{"curve25519-sha256", "curve25519-sha256@libssh.org", "ecdh-sha2-nistp256", "ecdh-sha2-nistp384", "ecdh-sha2-nistp521", "diffie-hellman-group14-sha256", "diffie-hellman-group14-sha1"}
2024/07/22 11:09:26 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:26 KexAlgos for the kexInitMsg: []string{"curve25519-sha256", "curve25519-sha256@libssh.org", "ecdh-sha2-nistp256", "ecdh-sha2-nistp384", "ecdh-sha2-nistp521", "diffie-hellman-group14-sha256", "diffie-hellman-group14-sha1"}
2024/07/22 11:09:26 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:26 client entered key exchange
2024/07/22 11:09:26 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:26 received kexInitMsg from server; &ssh.kexInitMsg{Cookie:[16]uint8{0x55, 0xee, 0x66, 0xf1, 0xa8, 0x11, 0x82, 0x66, 0xa8, 0xbb, 0x6f, 0x85, 0xf0, 0xa3, 0x4b, 0xd0}, KexAlgos:[]string{"curve25519-sha256", "curve25519-sha256@libssh.org", "ecdh-sha2-nistp256", "ecdh-sha2-nistp384", "ecdh-sha2-nistp521", "diffie-hellman-group-exchange-sha256", "diffie-hellman-group14-sha256", "diffie-hellman-group16-sha512", "diffie-hellman-group18-sha512", "kex-strict-s-v00@openssh.com"}, ServerHostKeyAlgos:[]string{"rsa-sha2-512", "rsa-sha2-256", "ecdsa-sha2-nistp256", "ssh-ed25519"}, CiphersClientServer:[]string{"aes256-gcm@openssh.com", "chacha20-poly1305@openssh.com", "aes256-ctr", "aes128-gcm@openssh.com", "aes128-ctr"}, CiphersServerClient:[]string{"aes256-gcm@openssh.com", "chacha20-poly1305@openssh.com", "aes256-ctr", "aes128-gcm@openssh.com", "aes128-ctr"}, MACsClientServer:[]string{"hmac-sha2-256-etm@openssh.com", "hmac-sha1-etm@openssh.com", "umac-128-etm@openssh.com", "hmac-sha2-512-etm@openssh.com", "hmac-sha2-256", "hmac-sha1", "umac-128@openssh.com", "hmac-sha2-512"}, MACsServerClient:[]string{"hmac-sha2-256-etm@openssh.com", "hmac-sha1-etm@openssh.com", "umac-128-etm@openssh.com", "hmac-sha2-512-etm@openssh.com", "hmac-sha2-256", "hmac-sha1", "umac-128@openssh.com", "hmac-sha2-512"}, CompressionClientServer:[]string{"none", "zlib@openssh.com"}, CompressionServerClient:[]string{"none", "zlib@openssh.com"}, LanguagesClientServer:[]string{}, LanguagesServerClient:[]string{}, FirstKexFollows:false, Reserved:0x0}
2024/07/22 11:09:26 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:26 finding agreed algorithms, client kex: []string{"curve25519-sha256", "curve25519-sha256@libssh.org", "ecdh-sha2-nistp256", "ecdh-sha2-nistp384", "ecdh-sha2-nistp521", "diffie-hellman-group14-sha256", "diffie-hellman-group14-sha1", "ext-info-c", "kex-strict-c-v00@openssh.com"}, server kex: []string{"curve25519-sha256", "curve25519-sha256@libssh.org", "ecdh-sha2-nistp256", "ecdh-sha2-nistp384", "ecdh-sha2-nistp521", "diffie-hellman-group-exchange-sha256", "diffie-hellman-group14-sha256", "diffie-hellman-group16-sha512", "diffie-hellman-group18-sha512", "kex-strict-s-v00@openssh.com"}
2024/07/22 11:09:26 packer-plugin-amazon_v1.3.3-dev_x5.0_linux_amd64 plugin: 2024/07/22 11:09:26 [DEBUG] handshake complete!

I have tested with the latest AWS plugin, on the latest Packer. I'll see if 1.8.7 has the same behaviour, but if it doesn't, this could maybe be linked to the host environment? I'm testing on an Ubuntu 22.04, without additional restrictions.

lbajolet-hashicorp commented 2 months ago

Other update: tested with Packer 1.8.7, with the AWS plugin v1.3.2 and the one I manually compiled with the extra crypto logs, same behaviour here.

If you have time @ferricoxide, @eemperor or @lorengordon, I'd like to have a chat with one of you to narrow down the issue. As I'm unable to reproduce the problem, it'll be hard (impossible even) for me to troubleshoot and fix it.

ferricoxide commented 2 months ago

Just for clarity: you tested with an image that has FIPS mode enabled? Reason I'd linked to the Current Published Images section is because the images we produce with Packer _have_ FIPS mode enabled, whereas most of the other base images we've tried over the years don't have FIPS enabled (and only recently started having SELinux enabled "out of the box"). Just want to make sure that the reason you're not able to reproduce the issue isn't because you're using a non-FIPS base image.

lbajolet-hashicorp commented 2 months ago

For the CentOS I have locally yes, I'm using a FIPS-enabled image; here's the template I used for reference:

# Copyright (c) HashiCorp, Inc.
# SPDX-License-Identifier: MPL-2.0

packer {
  required_plugins {
    qemu = {
      version = ">= 1.0.1"
      source  = "github.com/hashicorp/qemu"
    }
  }
}

build {
  sources = ["source.qemu.example"]

  provisioner "shell" {
    inline = ["fips-mode-setup --check 2>&1 | grep enabled"]
  }
}

source "qemu" "example" {
  iso_url = "./CentOS-Stream-9-latest-x86_64-boot.iso"
  iso_checksum = "none"
  headless         = "false"
  memory           = "4096"
  cpu_model = "host"
  boot_steps = [
    ["<up><tab> fips=1 inst.text<enter>", "Setup fips/text mode installation"],
    ["<wait40>", "wait for install prompt"],
    ["2<enter><wait80>", "select text mode install"],
    ["3<enter><wait10>3<enter><wait>1<enter>", "choose network for package installation"],
    ["<wait20>r<enter>", "wait for main menu refresh and mirror detection"],
    ["4<enter>3<enter>c<enter>c<enter><wait2>r<enter>", "select minimal software installation"],
    ["5<enter>c<enter>c<enter><wait>1<enter><wait>c<enter>", "setup standard partitioning scheme, use all disk"],
    ["9<enter>1<enter>2<enter>packer<enter><wait>", "create packer user"],
    ["5<enter>packer<enter><wait3>packer<enter><wait3>yes<enter><wait>", "setup password for packer"],
    ["6<enter>c<enter><wait>", "setup user as admin and leave"],
    ["b<enter>", "start installation"],
    ["<wait400><enter>", "wait until it ends to continue and reboot"]
  ]
  boot_key_interval = "15ms"
  disk_size         = "10G"
  format            = "qcow2"
  boot_wait         = "5s"
  ssh_password      = "packer"
  ssh_username      = "packer"
  ssh_wait_timeout  = "20m"
  vm_name           = "centos9_fips"
  output_directory  = "centos9_fips-out"
}

No problems with SSH in this case, as for the spel templates, I've tested those from the repo directly, running this exact command: SPEL_IDENTIFIER=test SPEL_VERSION=1.0.0 SPEL_BUILDERS=amazon-ebssurrogate.minimal-centos-9stream-hvm make build. Same story, no failures reported with both versions of Packer and the AWS plugin.

lorengordon commented 2 months ago

@lbajolet-hashicorp The spel templates, currently, are set to use a non-FIPS "bootstrap" builder:

https://github.com/plus3it/spel/blob/master/spel/minimal-linux.pkr.hcl#L118

This issue came about because @ferricoxide originally attempted to FIPS-enable those bootstraps, which failed.

To start with a FIPS-enabled image, you could perhaps try instead to pass -var aws_source_ami_filter_centos9stream_hvm=spel-minimal-centos-9stream-hvm-*.x86_64-gp3. But those images have LVM root volumes, which we don't test builds against, so "success" is just "ssh was able to connect" and not necessarily "ran to completion and image was created successfully".

lbajolet-hashicorp commented 2 months ago

Hi @lorengordon,

Thanks for the hint. Sorry to say, though, that even with this change I am unable to hit an SSH kex issue with the build from spel.

Here are the changes I made for reference:

diff --git a/build/build.sh b/build/build.sh
index 7897655..7f156cb 100644
--- a/build/build.sh
+++ b/build/build.sh
@@ -5,11 +5,12 @@ set -u -o pipefail
 echo "==========STARTING BUILD=========="
 echo "Building packer template, spel/minimal-linux.pkr.hcl"

-packer build \
+PACKER_LOG=1 packer build \
     -only "${SPEL_BUILDERS:?}" \
     -var "spel_identifier=${SPEL_IDENTIFIER:?}" \
     -var "spel_version=${SPEL_VERSION:?}" \
-    spel/minimal-linux.pkr.hcl
+    -var 'aws_source_ami_filter_centos9stream_hvm={"name":"spel-minimal-centos-9stream-hvm-*.x86_64-gp3","owners"=["125523088429","174003430611","216406534498"]}' \
+    spel/minimal-linux.pkr.hcl 2>&1 | tee output.log

 BUILDEXIT=$?

@@ -34,7 +35,7 @@ if [[ -n "${SUCCESS_BUILDS:-}" ]]
 then
     SUCCESS_BUILDERS=$(IFS=, ; echo "${SUCCESS_BUILDS[*]}")
     echo "Successful builds being tested: ${SUCCESS_BUILDERS}"
-    packer build \
+    PACKER_LOG=1 packer build \
         -only "${SUCCESS_BUILDERS//amazon-ebssurrogate./amazon-ebs.}" \
         -var "spel_identifier=${SPEL_IDENTIFIER:?}" \
         -var "spel_version=${SPEL_VERSION:?}" \
diff --git a/spel/minimal-linux.pkr.hcl b/spel/minimal-linux.pkr.hcl
index ebc7545..5e32211 100644
--- a/spel/minimal-linux.pkr.hcl
+++ b/spel/minimal-linux.pkr.hcl
@@ -847,14 +847,6 @@ source "amazon-ebssurrogate" "base" {
   ssh_pty       = true
   ssh_timeout   = "60m"
   ssh_username  = var.spel_ssh_username
-  ssh_key_exchange_algorithms = [
-    "ecdh-sha2-nistp521",
-    "ecdh-sha2-nistp256",
-    "ecdh-sha2-nistp384",
-    "ecdh-sha2-nistp521",
-    "diffie-hellman-group14-sha1",
-    "diffie-hellman-group1-sha1"
-  ]
   subnet_id                             = var.aws_subnet_id
   tags                                  = { Name = "" } # Empty name tag avoids inheriting "Packer Builder"
   temporary_security_group_source_cidrs = var.aws_temporary_security_group_source_cidrs

Invoked with SPEL_IDENTIFIER=test SPEL_VERSION=1.0.0 SPEL_BUILDERS=amazon-ebssurrogate.minimal-centos-9stream-hvm make build, the logs do tell me I'm running a build from AMI ami-0e0a9c0e9cbc491c7, which is one of your images. It does say "NOT HARDENED" in the description, did this one also not have FIPS enabled?

Anything else I can test out so I can reproduce this behaviour?

lorengordon commented 2 months ago

@lbajolet-hashicorp

It does say "NOT HARDENED" in the description, did this one also not have FIPS enabled?

No, our user groups tend to equate "hardened" with "the full DISA STIG has been applied". These images are just the basis for that and are not intended to do it entirely, so we attempt to indicate as much in the description. Perhaps poorly.

Anything else I can test out so I can reproduce this behaviour?

Not that I can think of. You've already gone above and beyond. We'll have to revalidate ourselves and come up with a better reproduction case, if we're still able to reproduce the problem ourselves.

lbajolet-hashicorp commented 2 months ago

Sounds good, thanks for the update! I'll wait for an update on your part regarding this, if the problem's solved itself that'd be great tbh :smile: