hashicorp / packer

Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.
http://www.packer.io

Packer is not allowing make compiler to finish; Sees exit status 0 and continues #3806

Closed: vcardillo closed this issue 7 years ago

vcardillo commented 8 years ago

On the remote host, a package is being compiled with make. At seemingly random points during compilation, Packer incorrectly sees an exit code of 0 and moves on to the next provisioner. We do not understand why. I can run the script ten times, and Packer will see an exit code of 0 at ten different places, moving on to the next provisioner before compilation has finished.

The script on which the problem is occurring:

#!/bin/sh

echo "======= Starting Install of Squid!"
set -eu

WORKDIR=/tmp

sudo apt-get -y update
sudo apt-get -y install build-essential libssl-dev
sudo apt-get -y build-dep squid

cd $WORKDIR
wget http://www.squid-cache.org/Versions/v3/3.5/squid-3.5.19.tar.gz
tar xf squid-3.5.19.tar.gz

cd squid-3.5.19
echo "======= Configuring Squid"
./configure --prefix=/usr \
  --localstatedir=/var \
  --libexecdir=/usr/lib/squid3 \
  --datadir=/usr/share/squid3 \
  --sysconfdir=/etc/squid3 \
  --with-default-user=proxy \
  --with-logdir=/var/log/squid3 \
  --with-pidfile=/var/run/squid3.pid \
  --with-openssl

echo "======= Compiling Squid"
make all

echo "======= Installing Squid"
sudo make install

echo "======= Cleaning up"
sudo mkdir -p /var/log/squid3
sudo chown -R proxy:proxy /var/log/squid3
sudo mkdir -p /var/spool/squid3
sudo chown -R proxy:proxy /var/spool/squid3

cd $WORKDIR
rm -rf squid-3.5.19
rm -f squid-3.5.19.tar.gz
echo "======= Done Installing Squid!"

You will see that the "output of build" never gets past "Compiling Squid".

Output of build: https://gist.github.com/vcardillo/18f527dae3452ff94aeb79d460d18f3e#file-terminal-output-piped-via-tee

Output from Packer's debug messages (PACKER_LOG=1): https://gist.github.com/vcardillo/18f527dae3452ff94aeb79d460d18f3e#file-packer-log
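One way to separate "the script really exited" from "Packer stopped waiting" is to record the real exit status on the remote host and compare it with what Packer reports. The following is a minimal sketch, not part of the original script; the helper name `run_logged` and the log path in the usage comment are assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical helper (not part of the original script): run a command,
# capture its output in a log file on the remote host, and append the
# command's real exit status to that log.
run_logged() {
  log_file=$1
  shift
  # The && / || form records the status even when the caller uses `set -e`.
  "$@" >>"$log_file" 2>&1 && status=0 || status=$?
  echo "exit status: $status" >>"$log_file"
  return "$status"
}

# Example use inside the provisioning script (log path is an assumption):
#   run_logged /tmp/squid-build.log make all
```

If the log on the instance ends without an "exit status" line, make was still running (or was killed) when Packer moved on, which would point away from the script itself.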

vcardillo commented 8 years ago

If you look at lines 4359-4360 of the Packer log, you will notice this:

2016/08/17 23:33:22 ui:     amazon-ebs: libtool: compile:  g++ -DHAVE_CONFIG_H -I../.. -I../../include -I../../lib -I../../src -I../../include -Wall -Wpointer-arith -Wwrite-strings -Wcomments -Wshadow -Woverloaded-virtual -Werror -pipe -D_REENTRANT -g -O2 -march=native -std=c++11 -MT SourceIp.lo -MD -MP -MF .deps/SourceIp.Tpo -c SourceIp.cc  -fPIC -DPIC -o .libs/SourceIp.o
2016/08/17 23:34:12 packer: 2016/08/17 23:34:12 remote command exited with '0': chmod +x /tmp/script_3146.sh; PACKER_BUILD_NAME='amazon-ebs' PACKER_BUILDER_TYPE='amazon-ebs' /tmp/script_3146.sh

Specifically, this: 2016/08/17 23:34:12 packer: 2016/08/17 23:34:12 remote command exited with '0'

Packer thinks that compilation has finished, but it hasn't.
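For anyone checking their own logs for the same pattern, a small sketch that finds these lines may help. It assumes the debug output was saved to a file; the helper name `show_remote_exits` and the file name `packer.log` are hypothetical, and the `-B` option is available in GNU, BSD, and BusyBox grep but is not part of POSIX.

```sh
#!/bin/sh
# Hypothetical helper: print every "remote command exited with" line from a
# saved Packer debug log, with its line number and the line just before it,
# so the last provisioner output before the exit is visible.
show_remote_exits() {
  log_file=$1
  grep -n -B 1 "remote command exited with" "$log_file"
}

# Example (file name is an assumption):
#   show_remote_exits packer.log
```

In the output, matching lines are prefixed with `N:` and the preceding context lines with `N-`, where `N` is the line number in the log.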

cbednarski commented 8 years ago

Thanks for the report. I think I have seen a few other similar (possibly duplicate) reports, but there is a concise repro here, so I will try to work with this. I suspect a networking problem or concurrency error is causing Packer to stop waiting for the command to complete.

cbednarski commented 8 years ago

Can you also include the packer template, or at least the provisioner section for this so I can see how you're invoking the script?

vcardillo commented 8 years ago

Hi @cbednarski,

Thanks for your response. I've added a few things for you:

build.sh, that invokes everything: https://gist.github.com/vcardillo/18f527dae3452ff94aeb79d460d18f3e#file-build-sh

template.json: https://gist.github.com/vcardillo/18f527dae3452ff94aeb79d460d18f3e#file-template-json

And I also added install_squid.sh from my above opening comment, just so everything is in one place: https://gist.github.com/vcardillo/18f527dae3452ff94aeb79d460d18f3e#file-install_squid-sh

I think that's pretty much everything to exactly re-create what I'm seeing. If you'd like the exact AMI I'm compiling on:

Canonical Ubuntu AMI: ami-8e0b9499

Let me know how else I can help!

vcardillo commented 8 years ago

Also, as you can see, the problem persisted even in debug mode and even with -parallel=false.

mwhooker commented 7 years ago

Unfortunately, I don't have enough information to reproduce this. I need a JSON template, together with all the required files, that I can run to reproduce the problem.

I think this is probably related to SSH being disconnected. We made a bunch of fixes to that logic after this was opened, so I'd be curious to see if this is still an issue.

It might also be helpful to add "expect_disconnect": false to your shell provisioners which you don't expect to result in a disconnect.
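As a rough illustration of where that option goes, the sketch below writes a provisioner fragment to a file. It is not the reporter's template and not a complete one (there is no builders section); the helper name, the output file, and the `install_squid.sh` script path are assumptions.

```sh
#!/bin/sh
# Sketch only: write a hypothetical shell provisioner fragment to the file
# given as $1. The script path is an assumption, and a real template would
# also need a "builders" section.
write_provisioner_fragment() {
  cat >"$1" <<'EOF'
{
  "provisioners": [
    {
      "type": "shell",
      "script": "install_squid.sh",
      "expect_disconnect": false
    }
  ]
}
EOF
}

# Example (file name is an assumption):
#   write_provisioner_fragment provisioner-fragment.json
```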

Please open a new ticket with the required information if you're still seeing this.

adamstegman commented 7 years ago

Upgrading to 0.12.2 and adding "expect_disconnect": false clarified this for me by failing more obviously; now investigating this as a network issue in our office that is triggered by something packer is doing.