Closed babelop closed 7 years ago
@bparees The new issue as you requested. Build logs available in description.
@binarybabel can you check if https://github.com/openshift/source-to-image/pull/343 fixes this issue?
@rhcarvalho @mfojtik can you guys re-evaluate this and close if you think it's fixed/non-reproducible?
closing based on age, lack of info, and that we've reworked this code numerous times since this issue was reported.
@bparees I'm running into a variation of this, so requesting to re-open.
test
(this will take a long time for the first run while it builds the image)There is a pause of two minutes between incremental builds, caused by docker.holdHijackedConnection
hanging in d.redirectResponseToOutputStream
until RunContainer
times out with case <-time.After(DefaultDockerTimeout)
.
I have traced this to stdCopy.StdCopy
, where out.Write(buf[StdWriterPrefixLen : frameSize+StdWriterPrefixLen])
hangs for two minutes, then exits with the closed pipe error. The logrus output does not seem to go anywhere, I had to change it to Println
to see the error.
I'm running this on Debian Stretch.
(One of the last tests hangs indefinitely, but that's a different issue, possibly on my side.)
PS Shouldn't there be some kind of warning logged if RunContainer
exits with a time out?
@accursoft hi, what version of s2i are you running?
$ s2i version
s2i v1.1.2-38-gddb10f1
I'm trying to reproduce with:
$ s2i version
s2i v1.1.3
$ git clone https://github.com/accursoft/Haskell-Cloud.git
$ cd Haskell-Cloud
$ ./test
Sending build context to Docker daemon 13.82 kB
Step 1 : FROM accursoft/micro-debian
Trying to pull repository docker.io/accursoft/micro-debian ...
...
Looking like it's gonna take forever, but still running ghc
:smile:
fyi i just did a new release last night. which i guess @rhcarvalho already picked up :)
@bparees yes I did, I'm testing with the latest release.
@accursoft does this resemble what you're observing?
...
Step 16 : COPY s2i /opt/s2i/
---> a38370cab215
Removing intermediate container e5a94d27cc8b
Successfully built a38370cab215
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ghc-temp
---> a38370cab215
Step 2 : MAINTAINER Gideon Sireling <code@accursoft.com>
---> Running in 00103e06c964
---> 3975c9b373fd
Removing intermediate container 00103e06c964
Step 3 : LABEL io.openshift.s2i.scripts-url "image://opt/s2i" io.k8s.display-name "Haskell" io.k8s.description "GHC and cabal-install" io.openshift.tags "haskell,ghc,network,ghc-8.0.1,cabal-install-1.24.0.0,network-2.6.3.1,builder" io.openshift.expose-services "8080:http"
---> Running in 3ad2c0804727
---> fe41e4ce0010
Removing intermediate container 3ad2c0804727
Successfully built fe41e4ce0010
Untagged: ghc-temp:latest
* Testing create app ...
Running S2I version "v1.1.3"
Preparing to build test
Copying sources from "/vagrant/Haskell-Cloud/server" to "/tmp/s2i528190197/upload/src"
Image "test:latest" not available locally, pulling ...
Clean build will be performed
Running "assemble" in "test"
Downloading the latest package list from hackage.haskell.org
3 CPUs available for parallel builds
Resolving dependencies...
Configuring server-0.0...
Building server-0.0...
Installed server-0.0
* Testing enable flags ...
Running S2I version "v1.1.3"
Preparing to build test
Copying sources from "/vagrant/Haskell-Cloud/server" to "/tmp/s2i915776168/upload/src"
Existing image for tag test detected for incremental build
Saving build artifacts from image test to path /tmp/s2i915776168/upload/artifacts
Running "assemble" in "test"
3 CPUs available for parallel builds
marker: cabal_flags
b
Resolving dependencies...
Configuring server-0.0...
Building server-0.0...
Installed server-0.0
Removing previously-tagged image sha256:9662fc589e27ec609225596df921b9e46a0591b9dceeec6d62f3346067b2bced
curl: (56) Recv failure: Connection reset by peer
* enable flags FAILED
Running with bash -x
I see the curl call is not part of the build:
+ docker run --name test -d -p 8080:8080 -u 1001 test
+ sleep 0.01
+ curl -sS localhost:8080
+ grep -q Greetings
curl: (56) Recv failure: Connection reset by peer
+ echo ' * pre_build hook FAILED'
* pre_build hook FAILED
+ exit 1
So that looks like an error in the image, @accursoft can you confirm / provide another way to reproduce the s2i build
problem? Thanks!
Increasing the value of sleep
may help on slow systems. Either way, the problem should be manifest in a 2 minute pause between the first two tests, so you could just if false
out all the others and skip to the clean up at the end.
test
sets --loglevel=1
; if you change this to 2
you should see the pause immediately after unpacking the tar stream from save-artifacts
.
My docker is Docker version 1.12.2, build bb80604
from apt.dockerproject.org
.
@accursoft would you mind making the proposed edits in your code and sending it over? It's a holiday tomorrow, so I may only look again into this next week. Thanks.
@accursoft I was able to reproduce what you've described, no need to send over code changes.
Changes I've made:
diff --git a/test b/test
index 55fa2c2..af3c2f7 100755
--- a/test
+++ b/test
@@ -1,6 +1,6 @@
#!/bin/bash -eu
-s2i="s2i build --rm --incremental --loglevel=1 . accursoft/ghc-network test"
+s2i="s2i build --rm --incremental --loglevel=2 . accursoft/ghc-network test"
function check {
echo " * Testing $1 ..."
Here's what I see:
$ bash -x test
[...]
+ s2i build --rm --incremental --loglevel=2 . accursoft/ghc-network test
Running S2I version "v1.1.3"
Builder Name: Haskell
Builder Image: accursoft/ghc-network
Source: .
Output Image Tag: test
Environment:
Labels:
Incremental Build: enabled
Incremental Image Pull User:
Remove Old Build: enabled
Builder Pull Policy: if-not-present
Previous Image Pull Policy: if-not-present
Quiet: disabled
Layered Build: disabled
Docker Endpoint: unix:///var/run/docker.sock
Image sha256:fe41e4ce0010a5c5cc3bef2cd5b1fd6587c4b9a17b03d8b6d83ba21f93cbfcf1 contains io.openshift.s2i.scripts-url set to "image://opt/s2i"
Preparing to build test
Copying sources from "/vagrant/Haskell-Cloud/server" to "/tmp/s2i209367094/upload/src"
Existing image for tag test detected for incremental build
Performing source build from file:///vagrant/Haskell-Cloud/server
Saving build artifacts from image test to path /tmp/s2i209367094/upload/artifacts
Image sha256:d800c645e21292816ab8ca6b5b6222d11a76e3952f28edfde14b90584faf6ead contains io.openshift.s2i.scripts-url set to "image://opt/s2i"
Base directory for S2I scripts is 'opt/s2i'. Untarring destination is '/tmp'.
Creating container with options {Name:"s2i_test_3d9ae978" Config:&{Hostname: Domainname: User:build AttachStdin:false AttachStdout:true AttachStderr:false ExposedPorts:map[] Tty:false OpenStdin:false StdinOnce:false Env:[] Cmd:[opt/s2i/save-artifacts] Healthcheck:<nil> ArgsEscaped:false Image:test:latest Volumes:map[] WorkingDir: Entrypoint:[] NetworkDisabled:false MacAddress: OnBuild:[] Labels:map[] StopSignal: StopTimeout:<nil> Shell:[]} HostConfig:&{Binds:[] ContainerIDFile: LogConfig:{Type: Config:map[]} NetworkMode: PortBindings:map[] RestartPolicy:{Name: MaximumRetryCount:0} AutoRemove:false VolumeDriver: VolumesFrom:[] CapAdd:[] CapDrop:[] DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[] GroupAdd:[] IpcMode: Cgroup: Links:[] OomScoreAdj:0 PidMode: Privileged:false PublishAllPorts:false ReadonlyRootfs:false SecurityOpt:[] StorageOpt:map[] Tmpfs:map[] UTSMode: UsernsMode: ShmSize:0 Sysctls:map[] Runtime: ConsoleSize:[0 0] Isolation: Resources:{CPUShares:0 Memory:0 CgroupParent: BlkioWeight:0 BlkioWeightDevice:[] BlkioDeviceReadBps:[] BlkioDeviceWriteBps:[] BlkioDeviceReadIOps:[] BlkioDeviceWriteIOps:[] CPUPeriod:0 CPUQuota:0 CpusetCpus: CpusetMems: Devices:[] DiskQuota:0 KernelMemory:0 MemoryReservation:0 MemorySwap:0 MemorySwappiness:<nil> OomKillDisable:<nil> PidsLimit:0 Ulimits:[] CPUCount:0 CPUPercent:0 IOMaximumIOps:0 IOMaximumBandwidth:0} Mounts:[]}} ...
Attaching to container "a06e3a5e13a4aa6bf5a91e4088bf4d0944abffc4c0aee06eca849f26b3375323" ...
Starting container "a06e3a5e13a4aa6bf5a91e4088bf4d0944abffc4c0aee06eca849f26b3375323" ...
Done extracting tar stream
<<< long pause >>>
Running "assemble" in "test"
Using image name accursoft/ghc-network
starting the source uploading ...
Image sha256:fe41e4ce0010a5c5cc3bef2cd5b1fd6587c4b9a17b03d8b6d83ba21f93cbfcf1 contains io.openshift.s2i.scripts-url set to "image://opt/s2i"
Base directory for S2I scripts is 'opt/s2i'. Untarring destination is '/tmp'.
[...]
The pause by itself doesn't necessarily mean there's something wrong, might just be the log messages fooling us. But yes, looks suspicious and deserves investigation.
Here's a goroutine dump taken during the pause for (my) reference later:
And here is the output with --loglevel=7
just in case:
And to increase the confidence that there is a bug:
$ git grep -n '2 \* time.Minute'
pkg/docker/docker.go:68: DefaultDockerTimeout = 2 * time.Minute
I'll be happy to fix this issue :smile:
The code that causes the 2 min pause on the happy path is here: pkg/docker/docker.go#L1068-L1073
I could not write any small patch that would not introduce other problems :(
Trying to write an integration test that reproduces the problem, so that we get not just a fix, but also a regression test.
@rhcarvalho please see my earlier comment - the root cause lies much deeper.
Any progress with this?
@accursoft I haven't had time I wished to devote to this, and had hopes #625 (though in spirit unrelated) would improve the situation (for the are some good commits cleaning up synchronization issues). I re-run your code against that PR's branch and realized that the builds would actually hang forever, not the 2min we see now.
Would be really helpful to have a smaller/quicker example to reproduce the issue, ideally in the form of an integration test (something I started but then got pulled into other work).
@bparees if there is anybody with more bandwidth to work on this, feel free to reassign.
@accursoft taking a look at this - when I run ./test it eventually bugs out with:
HC [stage 1] libraries/base/dist-install/build/GHC/Unicode.o
HC [stage 1] libraries/base/dist-install/build/Text/ParserCombinators/ReadP.o
HC [stage 1] libraries/base/dist-install/build/Text/ParserCombinators/ReadPrec.o
HC [stage 1] libraries/base/dist-install/build/Text/Read/Lex.o
/usr/bin/ld: fatal error: -pie and -r are incompatible
collect2: error: ld returned 1 exit status
`gcc' failed in phase `Linker'. (Exit code: 1)
make[1]: *** [libraries/base/dist-install/build/GHC/Float.o] Error 1
libraries/base/ghc.mk:4: recipe for target 'libraries/base/dist-install/build/GHC/Float.o' failed
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
The command '/bin/sh -c /tmp/ghc.sh' returned a non-zero code: 2
Any thoughts?
@accursoft if you've got working images sitting somewhere public I can pull them if that's easiest?
https://www.mail-archive.com/debian-haskell@lists.debian.org/msg07168.html
I'll have a look at it might just need to update some dependencies.
@accursoft https://ghc.haskell.org/trac/ghc/ticket/12759 looks possible?
@jim-minter Yes, that's the culprit. I'm checking if this has been fixed in Stretch.
Am trying patching image/ghc.sh according to https://ghc.haskell.org/trac/ghc/ticket/12759#comment:5
@jim-minter @accursoft my local checkout is pointing at https://github.com/accursoft/Haskell-Cloud/commit/47b1dcc56f8b30effa9c410cd65860f4b04abdde, I did not get that error. Maybe it was introduced in one of the more recent commits.
Building the images does take a lot of time though.
The build error is fixed in ghc 8.0.2, currently in RC. I'll update the repo when it's released. You need to change 2 lines in image/ghc.sh to use the RC:
--- a/image/ghc.sh Mon Oct 31 19:24:02 2016 +0000
+++ b/image/ghc.sh Tue Nov 29 14:24:42 2016 +0000
@@ -34,7 +34,7 @@
echo "silent
show-error" >>~/.curlrc
echo "Downloading GHC ..."
-curl https://downloads.haskell.org/~ghc/8.0.1/ghc-8.0.1-src.tar.xz | tar xJ
+curl https://downloads.haskell.org/~ghc/8.0.2-rc1/ghc-8.0.1.20161117-src.tar.xz | tar xJ
cd ghc-*
#hpc, hp2ps, runghc and iserv not needed
@@ -66,4 +66,4 @@
/usr/lib/gcc/x86_64-linux-gnu/6/*.a \
/usr/lib/gcc/x86_64-linux-gnu/6/cc1 \
/usr/lib/gcc/x86_64-linux-gnu/6/lto1 \
- /usr/local/lib/ghc-8.0.1/rts/*
\ No newline at end of file
+ /usr/local/lib/ghc-*/rts/*
\ No newline at end of file
@accursoft thanks for the bug report. I now understand the issue - Debian's tar is appending 6144 '\0' bytes to the end of its tar stream, which the Go tar reader isn't consuming. Because these data are stuck in flight, the container doesn't end up closing. The fix will be for s2i to consume any remaining data in the pipe to /dev/null. It's a straightforward fix which is best done as part of or on top of #625 since that reworks this area anyway.
Update - the patch that should fix this is in PR #625.
Saving incremental artifacts seems to fail randomly on (perhaps just large) projects. Could be a regression of #171, although ulimit and filesystem hard limits doesn't seem to affect it.
I've created a deployable project to replicate the issue, available at:
https://github.com/binarybabel/sti-rails-incremental-demo Build config and logs are available in the
extra
directory.Level 5 build log illustrating the problem:
https://raw.githubusercontent.com/binarybabel/sti-rails-incremental-demo/master/extra/incremental-failure.log