Thanks for reporting this. We'll investigate.
Original comment by z...@google.com
on 4 Jan 2016 at 1:54
Let me know if you'd like the full log files; I don't think they contain much more than what I've already posted.
Original comment by michael....@gmail.com
on 4 Jan 2016 at 4:02
Michael, are you the same person who reported the issue on Stack Overflow:
http://stackoverflow.com/questions/34423857/gae-managed-vm-deploy-gets-stuck-on-docker-build ?
Either way, are you able to reproduce this consistently? If so, could you share
your Dockerfile? Feel free to email it to me directly at dlorenc@google.com
If you're able to SSH into the VM, the output of "cat /var/log/docker.log"
might be helpful.
Original comment by dlor...@google.com
on 5 Jan 2016 at 10:09
I was the one who commented on the post, but I'm not the original poster.
I'll look into pulling those files.
Michael Sander
michael.sander@gmail.com
607-227-9859
Original comment by michael....@gmail.com
on 6 Jan 2016 at 2:33
I've been having this issue too with my Node app.
In my office, some people can deploy the app just fine and others are hitting this bug, despite us all using the same version of the gcloud SDK (91.0.1), the same OS (OS X 10.11), the same version of Python (2.7.10, the Apple build), etc.
I can provide some of the info you asked for too; see below.
Dockerfile:
FROM node:4.2.3
RUN apt-get update -y && apt-get install -y -q --fix-missing \
    graphicsmagick \
    imagemagick
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY package.json /usr/src/app/
RUN npm --unsafe-perm install --production
COPY . /usr/src/app
CMD [ "npm", "start" ]
I SSHed into the VM, and the output of /var/log/docker.log just repeats these few messages (the first one a lot more frequently than the others):
time="2016-01-06T14:53:01.689552671Z" level=info msg="GET /version"
time="2016-01-06T14:53:04.881663725Z" level=info msg="GET /images/json"
time="2016-01-06T14:53:04.928656071Z" level=info msg="GET
/containers/json?all=1"
Original comment by rich...@10kcoffees.com
on 6 Jan 2016 at 4:18
Hmm, one idea: Docker stores credentials in a file called ".dockercfg". If you run "cat ~/.dockercfg", do you see any entries for "index.docker.io"?
Don't paste the contents of .dockercfg here; for registries other than GCR it can contain private credentials.
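If you'd rather not eyeball the raw JSON, a quick Python sketch like the following lists just the registry keys without printing any secrets. This is only an illustration, not part of any tool, and it assumes the legacy .dockercfg format, which is a flat JSON map of registry to credentials:
import json
import os

# Print only the registry names from ~/.dockercfg; never print the "auth" values.
path = os.path.expanduser('~/.dockercfg')
with open(path) as f:
    cfg = json.load(f)
for registry in sorted(cfg):
    print(registry)  # names only; never print cfg[registry]['auth']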
Original comment by dlor...@google.com
on 6 Jan 2016 at 5:17
I don't have an entry there for "index.docker.io", though all the other entries
have the email set to "not@val.id" (I'm guessing they're examples).
I'm not sure if this is the cause though, as I can deploy other apps without
hitting this issue.
Original comment by rich...@10kcoffees.com
on 6 Jan 2016 at 5:46
Same here. Below is the file with the auth sections redacted. I'm running on Windows, by the way, so I don't have a ~/.dockercfg. I snooped around and found it in C:\home\Michael\.dockercfg, which is a bit of a weird place to put it.
{
  "https://bucket.gcr.io": {
    "email": "not@val.id",
    "auth": "<REDACTED 1>"
  },
  "https://appengine.gcr.io": {
    "email": "not@val.id",
    "auth": "<REDACTED 1>"
  },
  "https://eu.gcr.io": {
    "email": "not@val.id",
    "auth": "<REDACTED 1>"
  },
  "https://gcr.io": {
    "email": "not@val.id",
    "auth": "<REDACTED 2>"
  },
  "https://us.gcr.io": {
    "email": "not@val.id",
    "auth": "<REDACTED 2>"
  },
  "https://asia.gcr.io": {
    "email": "not@val.id",
    "auth": "<REDACTED 1>"
  },
  "https://b.gcr.io": {
    "email": "not@val.id",
    "auth": "<REDACTED 1>"
  }
}
Original comment by michael....@gmail.com
on 6 Jan 2016 at 6:40
I am facing a similar issue as well.
gcloud preview app deploy gets stuck at "Docker Build Output..." and then exits after about 10 minutes. The VM is created, but docker ps returns nothing on the VM. The output of gcloud preview app deploy and of /var/log/docker.log is pasted below.
/var/log/docker.log
time="2016-01-25T17:35:26Z" level=info msg="+job
serveapi(unix:///var/run/docker.sock, tcp://0.0.0.0:2376)"
time="2016-01-25T17:35:26Z" level=info msg="Listening for HTTP on unix
(/var/run/docker.sock)"
time="2016-01-25T17:35:26Z" level=info msg="Listening for HTTP on tcp
(0.0.0.0:2376)"
time="2016-01-25T17:35:26Z" level=info msg="+job init_networkdriver()"
time="2016-01-25T17:35:27Z" level=info msg="-job init_networkdriver() = OK (0)"
time="2016-01-25T17:35:27Z" level=warning msg="Your kernel does not support
cgroup swap limit."
time="2016-01-25T17:35:27Z" level=info msg="Loading containers: start."
time="2016-01-25T17:35:27Z" level=info msg="Loading containers: done."
time="2016-01-25T17:35:27Z" level=info msg="docker daemon: 1.6.2 7c8fca2;
execdriver: native-0.2; graphdriver: aufs
"
time="2016-01-25T17:35:27Z" level=info msg="+job acceptconnections()"
time="2016-01-25T17:35:27Z" level=info msg="-job acceptconnections() = OK (0)"
time="2016-01-25T17:35:27Z" level=info msg="Daemon has completed
initialization"
time="2016-01-25T17:35:47Z" level=info msg="GET /version"
time="2016-01-25T17:35:47Z" level=info msg="+job version()"
time="2016-01-25T17:35:47Z" level=info msg="-job version() = OK (0)"
time="2016-01-25T17:35:48Z" level=info msg="GET /v1.18/_ping"
time="2016-01-25T17:35:48Z" level=info msg="POST
/v1.18/build?pull=0&nocache=False&q=False&t=bigtable-1152.default.
20160125t230437&forcerm=False&rm=True"
time="2016-01-25T17:35:48Z" level=info msg="+job build()"
Output of gcloud preview app deploy --verbosity=debug
DEBUG: Couldn't find 'auths' section
DEBUG: Found entry (registry=u'https://bucket.gcr.io', username=u'_token')
DEBUG: Found entry (registry=u'https://appengine.gcr.io', username=u'_token')
DEBUG: Found entry (registry=u'https://eu.gcr.io', username=u'_token')
DEBUG: Found entry (registry=u'https://gcr.io', username=u'_token')
DEBUG: Found entry (registry=u'https://us.gcr.io', username=u'_token')
DEBUG: Found entry (registry=u'https://asia.gcr.io', username=u'_token')
DEBUG: Found entry (registry=u'https://b.gcr.io', username=u'_token')
INFO: Starting new HTTPS connection (1): 104.197.210.22
DEBUG: "GET /version HTTP/1.1" 200 151
DEBUG: "GET /v1.18/_ping HTTP/1.1" 200 2
Building and pushing image for module [default]
INFO: Generating Dockerfile.
INFO: Saving [Dockerfile] to [/Users/vinay/src/yawasa-server/appengine-backend/build/staged-app].
INFO: Building docker image bigtable-1152.default.20160125t231615 from /Users/vinay/src/yawasa-server/appengine-backend/build/staged-app/Dockerfile:
--------------------------------------------- DOCKER BUILD OUTPUT ---------------------------------------------
DEBUG: Looking for auth config
DEBUG: Sending auth config (u'https://bucket.gcr.io', u'https://appengine.gcr.io', u'https://eu.gcr.io', u'https://gcr.io', u'https://us.gcr.io', u'https://asia.gcr.io', u'https://b.gcr.io')
$
Original comment by chitlan...@gmail.com
on 25 Jan 2016 at 5:51
One update: when I reduce the number of jars in the WEB-INF/lib directory, the Docker build succeeds. Perhaps a timeout is being hit somewhere that does not write any logs, so there is nothing in the logs to help.
Original comment by chitlan...@gmail.com
on 26 Jan 2016 at 6:31
Hello, I'm not quite sure if this will help in your case, but I had the same issue with the deployment hanging at the "Sending auth config" line. The problem was that we had previous Docker configuration files inside the repo, including a .dockerignore file that had Dockerfile listed in it. Once I removed Dockerfile from .dockerignore, the deployment ran through successfully.
Original comment by ja...@prioridata.com
on 26 Jan 2016 at 9:42
Hey All,
I think I figured this out (at least on my system).
TLDR: delete large files in your project directory or add them to .dockerignore (even if they're already covered by .gitignore or skip_files).
On my system, it isn't actually hanging when it says "Sending auth config"; rather, it's sending all of the files to the Docker build process. If you take a look at your network connection, you'll notice that the process is using the network pretty heavily. The line it spends all of its time on is the self._post call in the file:
google-cloud-sdk\lib\third_party\docker\docker\api\build.py
To debug this issue, I instrumented the call to self._post so it would print out its sending progress. It's slow: it takes about 2 seconds to transfer one megabyte. Even though my application code is only about 150 MB, I had a bunch of data files (~1 GB) and some test files (~500 MB) lying around. These files were in my .gitignore and are listed in the skip_files section of app.yaml. However, when you create a managed VM instance, those settings are ignored; Docker sends everything. At that upload rate, it would take well over an hour to deploy, and at some point well before that (around 12 minutes in), the connection gets dropped.
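For anyone who wants to reproduce the measurement, here is a minimal sketch of the kind of instrumentation I mean. The ProgressFile name and the wiring are my own invention; the real change goes around the self._post call in build.py:
import time

class ProgressFile(object):
    """Wraps a file-like object and logs every megabyte as it is read."""
    def __init__(self, fileobj):
        self._fileobj = fileobj
        self._sent = 0
        self._start = time.time()

    def read(self, size=-1):
        chunk = self._fileobj.read(size)
        megabytes_before = self._sent // (1024 * 1024)
        self._sent += len(chunk)
        if self._sent // (1024 * 1024) > megabytes_before:
            elapsed = time.time() - self._start
            print('DEBUG: Sent %d megabytes. Time elapsed: %.3fs'
                  % (self._sent // (1024 * 1024), elapsed))
        return chunk

# Hypothetical wiring inside build.py: hand the wrapped tar stream to the
# existing call, e.g. self._post(url, data=ProgressFile(tar_stream), ...).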
The work-around is to edit your .dockerignore file to include all of the files you do not want uploaded (an example follows below). The better solution would be for gcloud to read the skip_files configuration in your app.yaml and automatically create a .dockerignore that adheres to what is listed there.
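For illustration, a .dockerignore along these lines keeps the build context small; the paths here are hypothetical, so substitute your own large directories:
.git
node_modules
data
test/fixtures
*.tar.gz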
With all that said, I do think there are some bugs here:
1 - The tar upload is excruciatingly slow, well under 1.5 MB/s. There's something wrong here; my network is faster than that.
2 - The tar files should be gzipped before being sent. It would save significant time.
3 - There seems to be a timeout on the GCE side that shuts down the connection early.
Below is the output of my instrumentation code (abridged), just so you can see the approximate speed of the upload.
---------- DOCKER BUILD OUTPUT --------------------------------------------
DEBUG: Looking for auth config
DEBUG: Sending auth config ('https://bucket.gcr.io', 'https://appengine.gcr.io', 'https://eu.gcr.io', 'https://gcr.io', 'https://us.gcr.io', 'https://asia.gcr.io', 'https://b.gcr.io')
DEBUG: Sent 1 megabytes. Time elapsed: 0:00:03.119000
DEBUG: Sent 2 megabytes. Time elapsed: 0:00:05.110000
DEBUG: Sent 3 megabytes. Time elapsed: 0:00:07.041000
DEBUG: Sent 4 megabytes. Time elapsed: 0:00:11.313000
DEBUG: Sent 5 megabytes. Time elapsed: 0:00:16.782000
DEBUG: Sent 6 megabytes. Time elapsed: 0:00:23.067000
DEBUG: Sent 7 megabytes. Time elapsed: 0:00:30.161000
DEBUG: Sent 8 megabytes. Time elapsed: 0:00:36.966000
DEBUG: Sent 9 megabytes. Time elapsed: 0:00:43.900000
DEBUG: Sent 10 megabytes. Time elapsed: 0:00:49.451000
DEBUG: Sent 11 megabytes. Time elapsed: 0:00:55.113000
DEBUG: Sent 12 megabytes. Time elapsed: 0:01:00.673000
DEBUG: Sent 13 megabytes. Time elapsed: 0:01:06.226000
DEBUG: Sent 14 megabytes. Time elapsed: 0:01:12.447000
DEBUG: Sent 15 megabytes. Time elapsed: 0:01:17.701000
DEBUG: Sent 16 megabytes. Time elapsed: 0:01:23.270000
DEBUG: Sent 17 megabytes. Time elapsed: 0:01:29.067000
DEBUG: Sent 18 megabytes. Time elapsed: 0:01:34.120000
DEBUG: Sent 19 megabytes. Time elapsed: 0:01:39.493000
DEBUG: Sent 20 megabytes. Time elapsed: 0:01:43.724000
DEBUG: Sent 21 megabytes. Time elapsed: 0:01:48.390000
DEBUG: Sent 22 megabytes. Time elapsed: 0:01:52.796000
DEBUG: Sent 23 megabytes. Time elapsed: 0:01:57.806000
DEBUG: Sent 24 megabytes. Time elapsed: 0:02:02.722000
DEBUG: Sent 25 megabytes. Time elapsed: 0:02:06.590000
DEBUG: Sent 26 megabytes. Time elapsed: 0:02:10.925000
DEBUG: Sent 27 megabytes. Time elapsed: 0:02:15.632000
DEBUG: Sent 28 megabytes. Time elapsed: 0:02:20.260000
DEBUG: Sent 29 megabytes. Time elapsed: 0:02:25.226000
DEBUG: Sent 30 megabytes. Time elapsed: 0:02:30.349000
DEBUG: Sent 31 megabytes. Time elapsed: 0:02:34.791000
DEBUG: Sent 32 megabytes. Time elapsed: 0:02:53.237000
DEBUG: Sent 33 megabytes. Time elapsed: 0:03:17.726000
DEBUG: Sent 34 megabytes. Time elapsed: 0:03:42.369000
DEBUG: Sent 35 megabytes. Time elapsed: 0:04:06.470000
DEBUG: Sent 36 megabytes. Time elapsed: 0:04:30.843000
DEBUG: Sent 37 megabytes. Time elapsed: 0:04:55.262000
DEBUG: Sent 38 megabytes. Time elapsed: 0:05:19.619000
DEBUG: Sent 39 megabytes. Time elapsed: 0:05:44.124000
DEBUG: Sent 40 megabytes. Time elapsed: 0:06:08.610000
DEBUG: Sent 41 megabytes. Time elapsed: 0:06:33.298000
DEBUG: Sent 42 megabytes. Time elapsed: 0:06:58.064000
Original comment by michael....@gmail.com
on 9 Feb 2016 at 9:47
FYI, I submitted the following pull request to Docker, which will speed up transferring the tarball:
https://github.com/docker/docker-py/pull/930
Original comment by michael....@gmail.com
on 9 Feb 2016 at 10:26
I am having the same issue with a Node.js app. I was able to deploy without any problem when the size of all files was about 1 MB. Now I have added assets and reached 166 MB. The deployment fails 100% of the time. I have the log files and can send them for investigation.
Original comment by pie...@kywix.com
on 16 Feb 2016 at 4:09
Thanks Michael.
I can confirm that the deploy worked fine once I updated .dockerignore to exclude directories with large files, e.g. env, venv, etc.
Original comment by anuj.lut...@redbubble.com
on 12 Mar 2016 at 3:50
So, is this issue resolved? Can anyone comment on whether they are still seeing
this issue?
Original comment by pay...@google.com
on 4 Apr 2016 at 8:16
Not really resolved; we just found a work-around (i.e., make sure your .dockerignore file ignores all unnecessary items; see above for more details).
I do think there are some bugs here:
1 - The tar upload is excruciatingly slow, well under 1.5 MB/s. There's something wrong here; my network is faster than that.
2 - The tar files should be gzipped before being sent. It would save significant time.
3 - There seems to be a timeout on the GCE side that shuts down the connection early.
4 - There should be some logging output on the status of the upload; as it stands, it appears to hang.
5 - The Managed VM documentation should indicate that .dockerignore must be used to exclude files from the upload. The skip_files list in the App Engine app.yaml is not used.
Original comment by michael....@gmail.com
on 5 Apr 2016 at 1:32
Thanks for clarifying all of these related issues, @michael.sander. Do you think you could produce a reproducible example app, stripped of any sensitive code from your own organization, that can be used to observe this behaviour? Being able to put that under a microscope and look into all the related issues you've brought up would really help ensure that each of them is fairly and adequately investigated, and resolved if things are as you've surmised.
As a general note, thanks for your reporting in this thread; it's an excellent example of thorough issue reporting.
Original comment by pay...@google.com
on 5 Apr 2016 at 8:49
This is what you would need to do to reproduce it:
- Create a sample managed VM app (any app will do).
- Create many thousands of files filled with random data in the same directory. These simulate temp files, or files that you usually would not want on your managed VM app. (A sketch of this step follows below.)
- Delete any .dockerignore file, if one exists.
- Deploy.
That should reproduce the issues discussed above.
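Here is a quick Python sketch of the second step; the directory and file names are arbitrary, and roughly 5,000 files of ~200 KB each gives about 1 GB of build context:
import os

# Create ~1 GB of random junk files to bloat the Docker build context.
if not os.path.isdir('junk'):
    os.mkdir('junk')
for i in range(5000):
    with open(os.path.join('junk', 'data_%04d.bin' % i), 'wb') as f:
        f.write(os.urandom(200 * 1024))  # ~200 KB of random data per file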
Original comment by michael....@gmail.com
on 6 Apr 2016 at 12:55
Thanks @michael.sander for the concise and clear points/instructions.
Unfortunately, we cannot guarantee network bandwidth for remote builds. Though not favorable, it's not entirely clear that 1.5 MB/s is unusually or consistently slow. In testing, I could not reliably get the speed to remain that low. I have had some luck, when encountering a slow upload, with canceling and redeploying.
The remaining points, however, are interesting directions for improving the platform. For optimal treatment and traction, I would suggest a few of these be filed as their own issues[1]:
- Tar files could indeed be gzipped before being sent, to save bandwidth and time when deploying an application in the Flex environment.
- Specify in the Flexible Environment documentation[2] that skip_files is ignored and .dockerignore should be used, or add a feature that creates a .dockerignore file from the skip_files element of app.yaml (a rough sketch of that conversion follows below).
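On the second point, here is a rough sketch of what such a feature might do. This is a hypothetical helper, not part of the SDK; note that skip_files entries are Python regexes while .dockerignore expects glob-style patterns, so this naive copy is only correct for entries that happen to be valid in both forms:
import yaml  # assumes PyYAML is available

# Copy skip_files patterns from app.yaml into a .dockerignore file.
with open('app.yaml') as f:
    config = yaml.safe_load(f)

with open('.dockerignore', 'w') as out:
    for pattern in config.get('skip_files', []):
        out.write(str(pattern) + '\n')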
Given the original subject of this thread, I would recommend that the logging
of object uploads in debug mode for remote builds be the feature request of
this thread.
[1]: https://code.google.com/p/google-cloud-sdk/issues/entry?template=Defect%20report%20from%20developer
[2]: https://cloud.google.com/appengine/docs/flexible/
Original comment by ngeli...@google.com
on 8 Apr 2016 at 7:49
The change to gzip the tarball is straightforward. I already submitted a pull request to Docker to allow the tarball to be gzipped:
https://github.com/docker/docker-py/pull/930
The request was merged, but the maintainers did not turn gzipping on by default; it's a parameter that defaults to off. Accordingly, you'll need to make a change yourself, but it's only a single line of code.
In the following file:
Cloud SDK\google-cloud-sdk\lib\googlecloudsdk\api_lib\app\docker_image.py
Around line 108, you'll need to change this:
build_res = docker_client.build(
    path=self._dockerfile_dir, tag=self.tag, quiet=False, fileobj=None,
    nocache=self._nocache, rm=self._rm, pull=False)
To this:
build_res = docker_client.build(
    path=self._dockerfile_dir, tag=self.tag, quiet=False, fileobj=None,
    nocache=self._nocache, rm=self._rm, pull=False, gzip=True)
Original comment by michael....@docketalarm.com
on 9 Apr 2016 at 6:31
Thanks Michael. We'll look into making this change.
Quick question: have you tried with `gcloud config set use_cloud_build true`?
Does this setting work for you?
Original comment by z...@google.com
on 11 Apr 2016 at 6:16
Yes, that is what I've been using the entire time.
gcloud config list
Your active configuration is: [default]
[app]
promote_by_default = False
suppress_change_warning = true
use_cloud_build = True
[compute]
region = us-central1
zone = us-central1-f
[core]
[REDACTED]
Original comment by michael....@docketalarm.com
on 12 Apr 2016 at 7:28
Original issue reported on code.google.com by
michael....@gmail.com
on 4 Jan 2016 at 8:57