nodejs / docker-node

Official Docker Image for Node.js :whale: :turtle: :rocket:
https://hub.docker.com/_/node/
MIT License

npm hangs on linux/s390x containers #1973

Open hardillb opened 11 months ago

hardillb commented 11 months ago

Environment

Expected Behavior

npm install runs and packages are installed.

Current Behavior

Trying to build a container on the linux/s390x platform hangs while running npm install, with npm consuming 100% CPU.

Previous builds completed in less than 5 minutes; the current build has been running for over an hour.

We are building the https://github.com/node-red/node-red-docker container with

docker buildx build --platform linux/s390x --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .
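Since the symptom is a build that spins instead of failing, one option is to put a time limit around the command so that CI reports the hang quickly. The following is a minimal sketch, not something from the report: `run_with_limit` is a hypothetical helper name, and it assumes `timeout` from GNU coreutils is available (it exits with status 124 when the limit expires).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): run a command under a time
# limit so that a hung emulated build fails instead of spinning for hours.
# Assumes GNU coreutils `timeout`, which exits with 124 when the limit hits.
run_with_limit() {
  local limit="$1"
  shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${limit}: $*" >&2
  fi
  return "$status"
}

# Example use with the build command from the report (the 15m limit is an
# arbitrary assumption; docker buildx and QEMU must already be set up):
# run_with_limit 15m docker buildx build --platform linux/s390x \
#   --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .
```

The limit only makes the failure visible sooner; it does nothing about the underlying hang.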

Possible Solution

Steps to Reproduce

Additional Information

Same thing is happening with 14-alpine and 16-alpine tags

I'm hitting this both locally and in a GH Action, both of which use Qemu to support building for alternate architectures.

tyranron commented 11 months ago

I have a similar issue (see Dockerfile).

I wonder whether the problem of #1798 and #1829 finally snuck into 18 and earlier images.

sxa commented 11 months ago

Interesting. I've just fired up the docker image (node:16-alpine and node:18-alpine) on a real s390x system and npm seems to install without any problems. That suggests the problem is specific to qemu or to the docker version in use (mine is Docker version 24.0.5, build 24.0.5-0ubuntu1~22.04.1).

sxa commented 11 months ago

Just tried with your dockerfile - went through without problems: build18.log.gz

Command:

docker build --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 . 2>&1 | tee build18.log

hardillb commented 11 months ago

Which does appear to point to this possibly being a qemu based problem. I know my laptop got a recent set of qemu packages, but not sure what would be needed to debug this. Any pointers would be helpful

tyranron commented 11 months ago

@hardillb setup-qemu-action uses the tonistiigi/binfmt Docker image for installing QEMU binaries. I think other versions like 6.1.0 or master could be tried to "resolve" this, at least on GitHub Actions.

hardillb commented 11 months ago

master doesn't appear to fix it for me, testing 6.1.0

hardillb commented 11 months ago

no joy with qemu-v6.1.0 either so this may be a NodeJS + Qemu issue

hardillb commented 11 months ago

OK, while this appears to be limited to when running builds using qemu, this is going to be the default way 99% of CI builds run that target s390x, so I think we still need to track this down, even if it's just to raise a sensible upstream issue against qemu.

tyranron commented 11 months ago

@hardillb seems like after moby/buildkit#1516 we may omit using setup-qemu-action, because BuildKit supports QEMU emulation out-of-the-box. Moreover, judging by the tonistiigi/binfmt Docker image tags, newer versions of QEMU are released for buildkit- images only. The last one is 7.1.0.

However, for my repository the result is still the same, no matter which version is used: 6.0.0, 6.1.0, 6.2.0, 7.0.0, 7.1.0 or master.

tyranron commented 11 months ago

@hardillb in my case, the problem seems to be related to Linux only, somehow. I was able to resolve the issue just by switching to the macos-latest runner for the archs where the build gets stuck.

I will try this workaround for #1798 too, and will report the results.

hardillb commented 11 months ago

@tyranron did you get any joy using the docker.io/ prefix on the base containers?

If it is the qemu bug https://gitlab.com/qemu-project/qemu/-/issues/1729 then hopefully it gets fixed soon.

tyranron commented 11 months ago

@hardillb

did you get any joy using the docker.io/ prefix on the base containers?

These are the same images, no?

I will try this workaround for https://github.com/nodejs/docker-node/issues/1798 too, and will report the results.

Building under macos-latest runner didn't work out for Node.js 20, but for 18 it fixed my problem.

hardillb commented 11 months ago

This may not be the same as the other qemu bug as it's not calling mremap.

I ran the following command:

docker run --platform linux/s390x -it --cap-add=SYS_PTRACE -e QEMU_STRACE=true -e QEMU_LOG_FILENAME=qemu.log -v ./qemu.log:/qemu.log --rm node:18-alpine npm install node-red:3.1.0

and got the following strace:

1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62cd8) = 0 ({tv_sec = 2223689,tv_nsec = 708242416})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62db8) = 0 ({tv_sec = 2223689,tv_nsec = 708269945})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62db8) = 0 ({tv_sec = 2223689,tv_nsec = 708530498})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62eb0) = 0 ({tv_sec = 2223689,tv_nsec = 708556031})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62eb0) = 0 ({tv_sec = 2223689,tv_nsec = 708593060})
1 munmap(0x00000040101ef000,57344) = 0
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b63028) = 0 ({tv_sec = 2223689,tv_nsec = 708653152})
1 socket(PF_NETLINK,SOCK_RAW|SOCK_CLOEXEC,NETLINK_ROUTE) = 24
1 sendto(24,275007277688,20,0,0,0) = 20
1 recvfrom(24,275007277688,8192,64,0,0) = 2880

qemu.log
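To judge whether a log like this is dominated by one call, the syscalls can be tallied. The sketch below is an illustration only: `summarize_qemu_strace` is a hypothetical name, and it assumes every line has the shape seen in the excerpt above (a PID, then the syscall name immediately followed by an opening parenthesis).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): count syscalls in QEMU_STRACE
# output. Assumes lines look like "<pid> <syscall>(<args>) = <result>".
# Reads the files given as arguments, or standard input if there are none.
summarize_qemu_strace() {
  awk 'NF >= 2 && $2 ~ /\(/ {
         name = $2
         sub(/\(.*/, "", name)   # keep only the text before the first "("
         count[name]++
       }
       END {
         for (name in count) printf "%d %s\n", count[name], name
       }' "$@" |
    sort -k1,1nr -k2,2           # most frequent first, then by name
}

# Example: summarize_qemu_strace qemu.log | head
```

On the ten lines quoted above this reports clock_gettime six times and munmap, recvfrom, sendto and socket once each. A full log from a hung run is needed to see which call actually repeats.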

hardillb commented 11 months ago

This looks to be spinning trying to receive data from the network. How do we move this forward?

tyranron commented 11 months ago

Due to https://github.com/tonistiigi/binfmt/pull/120 we now have QEMU 8.0 in the tonistiigi/binfmt:master Docker image. Tried it with the node:21 Docker image, and still no luck.

felddy commented 10 months ago

I started seeing this issue on September 19th, 2023.

I created a repo to help diagnose the problem, or to detect when a fix is made upstream. It runs daily tests on two versions of node across six architectures on Debian and Alpine. It simply attempts npm -v.
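A probe of this kind can be sketched as a loop over images and platforms. This is not the code from that repo: the function names, the image list, and the 120-second limit are all assumptions made for illustration. The probe is passed in as a command name so the loop can be tried with a stub before pointing it at Docker.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names, images and the time limit are assumptions.

# Images to probe (assumed; the thread does not list the exact tags tested).
IMAGES="node:18-alpine node:20-alpine"

# Run `npm -v` in one image on one platform, giving up after 120 seconds.
probe_npm() {
  local image="$1" platform="$2"
  timeout 120 docker run --rm --platform "$platform" "$image" npm -v \
    >/dev/null 2>&1
}

# probe_matrix PROBE PLATFORM...
# Calls PROBE with (image, platform) for every combination, prints one
# PASS/FAIL line per combination, and returns non-zero if any probe failed.
probe_matrix() {
  local probe="$1"
  shift
  local image platform failures=0
  for image in $IMAGES; do
    for platform in "$@"; do
      if "$probe" "$image" "$platform"; then
        echo "PASS $image $platform"
      else
        echo "FAIL $image $platform"
        failures=$((failures + 1))
      fi
    done
  done
  [ "$failures" -eq 0 ]
}

# Example: probe_matrix probe_npm linux/amd64 linux/arm/v7 linux/s390x
```

Cleaning up a container that keeps running after the time limit is left out of this sketch.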

On Nov 7: 4 of the 12 Alpine combinations are failing.

ozbillwang commented 10 months ago

I reported a similar problem in ticket #1946.

hardillb commented 8 months ago

I've been playing with this again (as it's still a problem). I've been using AWS EC2 machines to try out a few different options.

whyour commented 8 months ago

I tried to run it on ubuntu-20.04 s390x and it works fine, but arm/v6 and arm/v7 still don't work, only alpine3.18 and nodejs18. https://github.com/whyour/qinglong/actions/runs/7375782137/job/20067750407

janvda commented 8 months ago

I tried to reproduce the problem on my macbook and it seems to be working for me: FYI this is what I get:

mac-jan:tmp jan$ git clone https://github.com/node-red/node-red-docker
Cloning into 'node-red-docker'...
remote: Enumerating objects: 3154, done.
remote: Counting objects: 100% (225/225), done.
remote: Compressing objects: 100% (107/107), done.
remote: Total 3154 (delta 133), reused 197 (delta 118), pack-reused 2929
Receiving objects: 100% (3154/3154), 823.97 KiB | 1.99 MiB/s, done.
Resolving deltas: 100% (1988/1988), done.
mac-jan:tmp jan$ ls
node-red-docker
mac-jan:tmp jan$ cd node-red-docker/
mac-jan:node-red-docker jan$ docker buildx build --platform linux/s390x --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .
[+] Building 331.6s (20/20) FINISHED                                                                                                                 docker-container:build
 => [internal] load build definition from Dockerfile.alpine                                                                                                            0.1s
 => => transferring dockerfile: 3.55kB                                                                                                                                 0.0s
 => [internal] load metadata for docker.io/library/node:18-alpine                                                                                                      4.0s
 => [auth] library/node:pull token for registry-1.docker.io                                                                                                            0.0s
 => [internal] load .dockerignore                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                        0.0s
 => [base  1/11] FROM docker.io/library/node:18-alpine@sha256:b1a0356f7d6b86c958a06949d3db3f7fb27f95f627aa6157cb98bc65c801efa2                                        18.3s
 => => resolve docker.io/library/node:18-alpine@sha256:b1a0356f7d6b86c958a06949d3db3f7fb27f95f627aa6157cb98bc65c801efa2                                                0.0s
 => => sha256:8f566b0cf37515471460f9658e9c86f83bf2350169945c1f2b328eec90ccac61 449B / 449B                                                                             2.0s
 => => sha256:743d88e4fdd9423c2822f8530204bc76ce14cdcfc62b97fe81bb5a6115485080 2.34MB / 2.34MB                                                                        18.0s
 => => sha256:9d91f34cd4b1eccf088eefcf662f235b2c0ae325b9739b7f7f7e875c25ba8643 41.11MB / 41.11MB                                                                      11.1s
 => => sha256:0fca3ee44ced87b7184bc23390283fdf10cfae0e844a25b785dd11c463815227 3.24MB / 3.24MB                                                                         3.3s
 => => extracting sha256:0fca3ee44ced87b7184bc23390283fdf10cfae0e844a25b785dd11c463815227                                                                              0.2s
 => => extracting sha256:9d91f34cd4b1eccf088eefcf662f235b2c0ae325b9739b7f7f7e875c25ba8643                                                                              2.3s
 => => extracting sha256:743d88e4fdd9423c2822f8530204bc76ce14cdcfc62b97fe81bb5a6115485080                                                                              0.1s
 => => extracting sha256:8f566b0cf37515471460f9658e9c86f83bf2350169945c1f2b328eec90ccac61                                                                              0.0s
 => [internal] load build context                                                                                                                                      0.1s
 => => transferring context: 7.81kB                                                                                                                                    0.0s
 => [base  2/11] COPY .docker/scripts/*.sh /tmp/                                                                                                                       0.0s
 => [base  3/11] COPY .docker/healthcheck.js /                                                                                                                         0.0s
 => [base  4/11] RUN set -ex &&     apk add --no-cache         bash         tzdata         iputils         curl         nano         git         openssl         open  8.1s
 => [base  5/11] WORKDIR /usr/src/node-red                                                                                                                             0.0s 
 => [base  6/11] COPY .docker/known_hosts.sh .                                                                                                                         0.0s 
 => [base  7/11] RUN ./known_hosts.sh /etc/ssh/ssh_known_hosts && rm /usr/src/node-red/known_hosts.sh                                                                 71.6s 
 => [base  8/11] RUN echo "PubkeyAcceptedKeyTypes +ssh-rsa" >> /etc/ssh/ssh_config                                                                                     0.2s 
 => [base  9/11] COPY package.json .                                                                                                                                   0.0s 
 => [base 10/11] COPY flows.json /data                                                                                                                                 0.1s 
 => [base 11/11] COPY .docker/scripts/entrypoint.sh .                                                                                                                  0.1s 
 => [build 1/1] RUN apk add --no-cache --virtual buildtools build-base linux-headers udev python3 &&     npm install --unsafe-perm --no-update-notifier --no-audit   178.4s 
 => [release 1/3] COPY --from=build /usr/src/node-red/prod_node_modules ./node_modules                                                                                 0.8s 
 => [release 2/3] RUN chown -R node-red:root /usr/src/node-red &&     /tmp/install_devtools.sh &&     rm -r /tmp/*                                                    41.5s 
 => [release 3/3] RUN npm config set cache /data/.npm --global                                                                                                         7.3s 
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load                                                                                                                                                      
mac-jan:node-red-docker jan$

FYI My macbook docker setup:

1/ I have installed lima (so I don't use docker desktop)

# install lima
brew install lima

# create default lima instance with 6GB memory using docker template
limactl start --name=default --set='.cpus = 4 | .memory = "6GiB" | .disk = "100GiB" ' template://docker

# create docker context - note that the actual unix socket path is returned by the previous command.
docker context create colima --docker "host=unix:///Users/jan/.lima/default/sock/docker.sock"

# starts the docker environment on my macbook.
limactl start

2/ I have installed Docker Buildx as follows:

# in folder /Users/jan/.docker/cli-plugins
wget https://github.com/docker/buildx/releases/download/v0.10.3/buildx-v0.10.3.darwin-amd64
mv buildx-v0.10.3.darwin-amd64 docker-buildx
chmod a+x docker-buildx

Add binfmt_misc support for additional platforms as specified in https://docs.docker.com/build/building/multi-platform/

docker run --privileged --rm tonistiigi/binfmt --install all

tyranron commented 8 months ago

With https://github.com/tonistiigi/binfmt/pull/144 (QEMU 8.1.4) and node:21 it still doesn't work for me on arm32v6, arm32v7 and s390x platforms. Tried building on both macos-latest and ubuntu-latest runners:

hardillb commented 8 months ago

Also, the important place to test this is on AMD64 hardware, as this needs to run on GH Actions with the Ubuntu runner.