robotrapta opened this issue 2 years ago
Hello @robotrapta! Thanks for opening, yeah, I agree with you here that this feels like a hack. The thing is that Packer sort of expects to log in to an instance once, at the beginning, and there is no internal/native way to 'reconnect' that feels great. So we would like to introduce a new feature soon to be able to 'connect' in the middle of a build. This would allow changing SSH settings or rebooting after an installation. But that one will not come straight away, as we have quite a large and growing to-do list. Making a docs page about this would be a good idea, I think. I'll bring that one up with the team.
One thing that comes to mind here is that you could install the drivers at the end of your provisioning steps, and just shut down/save the machine. Upon next boot, things should be configured. If you have more things to install/configure, then you could, for example, start another build?
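For illustration, that "install last, then let the build end with a shutdown" approach might look something like the following sketch (the script name is a placeholder, not from this thread):

```hcl
# Last provisioning step: install the drivers, but do not reboot inside
# the build. The builder's normal shutdown/save then ends the build, and
# the drivers take effect on the image's first real boot.
provisioner "shell" {
  script = "./install-nvidia-drivers.sh" # hypothetical script
}
```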
With that said, if that does not work out, do you mind sharing your build file? And your logs? Maybe we can help you better/differently from there.
Hello! I'm trying to achieve something similar:
```json
{
  "provisioners": [
    {
      "type": "ansible",
      "playbook_file": "playbook.yml"
    },
    {
      "type": "shell",
      "inline": ["reboot now"],
      "expect_disconnect": true
    },
    {
      "type": "file",
      "source": "serverspec/",
      "destination": "/tmp",
      "pause_before": "30s"
    },
    {
      "type": "shell",
      "script": "serverspec.sh"
    }
  ]
}
```
I'm using `pause_before` in the next provisioner instead of `pause_after`, but don't know which one is better.
Hi @azr, thanks for the suggestion of installing the drivers at the end. Unfortunately that doesn't work for me. There's a bunch of software I need to install that depends on having CUDA installed, and some of those installations will fail if they can't confirm NVIDIA hardware/drivers are present.
Hi, similar use case here, and found the results to be sort of inconsistent. My current template is something like:
```hcl
provisioner "shell" {
  pause_before = "10s"
  script       = "./scripts/updates.sh"
}

provisioner "ansible-local" {
  role_paths    = ["./roles"]
  playbook_file = "./roles/ubuntu/ubuntu.yml"
  command       = "sudo ansible-playbook -i localhost -e 'ansible_python_interpreter=/usr/bin/python3'"
}

provisioner "shell" {
  script            = "./scripts/reboot.sh"
  expect_disconnect = true
}

provisioner "shell" {
  pause_before = "120s"
  script       = "./scripts/finish.sh"
  max_retries  = 3
}
```
What I've found is that I can always see the message that `reboot.sh` prints (`Rebooting to apply updates`), and sometimes I see the next message (`Pausing for 2 minutes before next text`, or something similar, I can't remember). But sometimes the build just fails after the reboot message. This seems strange, since I've been watching the VMs and reboots usually take around 30-60 seconds, and I do have the retry mechanism on the finish provisioner. I couldn't quite find a reliable way to do this. I ran about 10-20 builds today and it's completely hit or miss.
I understand the solution proposed by @azr, but just like @robotrapta I also wanted to perform actions after the reboot. There may be a workaround, sure, but the reboot is just the natural way to go for us.
The impression I get is that Packer "crashes" (probably not the right wording, pardon me) after the reboot, even with `expect_disconnect`, and doesn't understand the next task to perform (wait to reconnect).
I could try a few more tests with debug logs on maybe.
This issue has been synced to JIRA for planning.
JIRA ID: HPR-770
A request on how to reboot came up in the discuss group https://discuss.hashicorp.com/t/how-to-reboot-vm-with-packer/46083/2
You are already using `pause_before` and `pause_after`. Try also adding `ssh_read_write_timeout = "5m"`.
e.g.
```hcl
source "amazon-ebs" "ubuntu-bionic" {
  ami_name      = "ubuntu-bionic-18.04-hvm-ebs-{{timestamp}}"
  instance_type = "t2.micro"
  region        = "us-west-2"
  source_ami_filter {
    filters = {
      name                = "ubuntu/images/*ubuntu-bionic-18.04-amd64-server-*"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    most_recent = true
    owners      = ["099720109477"]
  }
  ssh_username           = "ubuntu"
  ssh_read_write_timeout = "5m" # Allow reboots
}
```
Or simply because you need to reboot into a newer kernel, installed during provisioning, in order to be able to remove the one that is currently booted.
Any recommended way to reboot during provisioning?
The documentation actually mentions this: firstly `expect_disconnect`, meant to be used in the provisioner that reboots your machine, and secondly `start_retry_timeout`, to be used in your subsequent shell provisioner.
Agree that this is already available with use of `pause_before`, `expect_disconnect`, and `start_retry_timeout` in the `shell` provisioner.
- `pause_before` (duration) - Sleep for `duration` before execution.
- `expect_disconnect` (boolean) - Defaults to `false`. When `true`, allow the server to disconnect from Packer without throwing an error. A disconnect might happen if you restart the SSH server or reboot the host.
- `start_retry_timeout` (string) - The amount of time to attempt to start the remote process. By default this is `5m` or 5 minutes. This setting exists in order to deal with times when SSH may restart, such as a system reboot. Set this to a higher value if reboots take a longer amount of time.
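Putting those three options together, a minimal reboot sequence might be sketched like this (paths and timings are illustrative, not from the docs):

```hcl
# The rebooting step: the dropped SSH connection is expected, not an error.
provisioner "shell" {
  inline            = ["sudo reboot"]
  expect_disconnect = true
}

# The next step waits briefly, then keeps retrying to start its command
# while SSH comes back up after the reboot.
provisioner "shell" {
  pause_before        = "30s"
  start_retry_timeout = "10m"
  script              = "./after-reboot.sh" # hypothetical script
}
```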
Additionally, this is available for Windows with the `windows-restart` provisioner.
```hcl
provisioner "windows-restart" {
  pause_before          = "30s"
  restart_check_command = "powershell -command \"& {Write-Output 'restarted.'}\""
  restart_timeout       = "10m"
}
```
cc @nywilken @lbajolet-hashicorp
Thanks for the update here @tenthirtyam,
This is an old issue; we do have documentation on those options, but maybe the workflow isn't intuitive, or the documentation is lacking.
That said, this hasn't been updated for a while, and most of the updates seem to point to community resources or share examples of how the problem was fixed/circumvented.
I'm tempted to close this issue now, but I'd like to hear from others who commented on this issue before: are you still experiencing the problem? Do you have suggestions on how we can improve Packer or the docs that would've helped you solve this issue?
None of the suggestions work reliably, at least not in combination with the AWS session-manager-plugin. I've tried adding `pause_after` to the step that reboots and `pause_before` to the step following the reboot. I've tried adding an interim shell-local step. I've tried adding retries. I've tried setting `ssh_read_write_timeout` to something low like `1m`, but that timeout doesn't seem to apply until after the connection has started.
What seems to be happening is that the session-manager-plugin itself does not reliably notice that the remote end has gone away, and we cannot adjust its timeout, which is apparently 1 hour. Once it's finally timed out, it's too late.
The important line to notice is the aws session-manager-plugin disconnecting on line 17.
I was using the following settings both times:
```hcl
ssh_read_write_timeout = "3m"

provisioner "shell" {
  script            = "./stage2.setup_ami.sh"
  execute_command   = "sudo {{ .Path }} reboot"
  expect_disconnect = true
  skip_clean        = true
  pause_after       = "1m"
}

provisioner "shell" {
  script          = "./stage2.setup_ami.sh"
  execute_command = "sudo {{ .Path }} setup2"
  max_retries     = 5
}
```
I think what we need is perhaps a new option, `force_disconnect`, instead of `expect_disconnect`, at least in the case of using the aws session-manager-plugin.
Edit to add: This works reliably for me with the aws session-manager-plugin:
```hcl
provisioner "shell" {
  script            = "./stage2.setup_ami.sh"
  execute_command   = "sudo {{ .Path }} reboot"
  expect_disconnect = true
  skip_clean        = true
}

# Force kill the session-manager-plugin since it doesn't always notice the
# remote end going away. Packer will restart it. This seems to be the only
# reliable way to handle reboots.
provisioner "shell-local" {
  inline = ["pkill -g 0 session-manager-plugin"]
}

provisioner "shell" {
  pause_before = "10s"
  inline       = ["uptime"]
  max_retries  = 10
}
```
That's obviously a hack. The amazon plugin specifically needs better SSM handling, but generally, what I think is needed is a way to tell Packer that the remote machine is positively going away between steps, so that it does whatever it takes to drop and re-establish the connection.
Description
Howdy y'all! I need to restart my machine during provisioning. I'm new to Packer, coming from systems like Ansible and Chef. I've read a bunch of docs on this and am still confused. So I think this is at least a doc bug, and perhaps a full feature request.
I know this issue has been discussed in https://github.com/hashicorp/packer/issues/1983 and that there was a proposal to build a native provisioner in https://github.com/hashicorp/packer/pull/4555 - a native feature makes a ton of sense to me.
I know the recommended way to do this is with a shell provisioner. This seems to rely on the retry mechanism, treating the reboot as a kinda-expected error and then getting the system to recover from it appropriately. Maybe this is useful for other reasons, but it feels like an ugly hack. A hack would be okay if it was clearly documented and worked reliably. But it's not clearly documented - there is no example that I can find showing how to do this. My first few attempts to get it to work after reading the docs were unreliable - sometimes it failed, and sometimes it re-ran things unnecessarily. So it would be great to just tell people the standard way to do this if there isn't a built-in way to do it.
I think a good way to do it is this:
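A sketch of that pattern, assuming a shell provisioner triggers the reboot (script names are placeholders, not from this thread):

```hcl
# Reboot step: tolerate the disconnect, and give the machine time to
# actually go down before Packer tries anything else.
provisioner "shell" {
  inline            = ["sudo reboot"]
  expect_disconnect = true
  pause_after       = "30s"
}

# Post-reboot step: retry until SSH is available again.
provisioner "shell" {
  script      = "./install-cuda-software.sh" # hypothetical script
  max_retries = 3
}
```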
The `pause_after` is, I believe, important to minimize the risk of a race condition in issuing the next provisioning command? Which seems to me like a pretty strong argument for making this a native feature. If that's in fact correct, putting that example code in the docs would be awesome. Thanks!
Use Case(s)
I'm trying to install nvidia drivers to use CUDA, which generally requires a reboot.