rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0
5.72k stars 271 forks source link

Long wait and network errors on npm install in Rancher container #6365

Open hdodov opened 5 months ago

hdodov commented 5 months ago

Actual Behavior

Running npm ci with a large number of packages results in over 5 minutes of wait and EHOSTUNREACH or EAI_AGAIN errors.

Steps to Reproduce

  1. Clone the reproduction repo: https://github.com/hdodov/mcve-rancher-npm-hang
  2. Run npm run dev
  3. You should see one of the errors I got

Result

First attempt:

[+] Building 308.6s (8/9)                                                               docker:default
 => [payload internal] load build definition from Dockerfile                                      0.0s
 => => transferring dockerfile: 182B                                                              0.0s
 => [payload internal] load .dockerignore                                                         0.0s
 => => transferring context: 2B                                                                   0.0s
 => [payload internal] load metadata for docker.io/library/node:18-bullseye-slim                  2.6s
 => [payload internal] load build context                                                         0.0s
 => => transferring context: 72B                                                                  0.0s
 => [payload 1/5] FROM docker.io/library/node:18-bullseye-slim@sha256:f06f9ff9dcc5cfec936a7bf085  0.0s
 => CACHED [payload 2/5] WORKDIR /app                                                             0.0s
 => CACHED [payload 3/5] COPY package-lock.json .                                                 0.0s
 => CACHED [payload 4/5] COPY package.json .                                                      0.0s
 => [payload 5/5] RUN npm ci                                                                    306.0s
failed to solve: Unavailable: error reading from server: EOF

Second attempt:

[+] Building 624.4s (9/9) FINISHED                                                      docker:default
 => [payload internal] load .dockerignore                                                         0.0s
 => => transferring context: 2B                                                                   0.0s
 => [payload internal] load build definition from Dockerfile                                      0.0s
 => => transferring dockerfile: 182B                                                              0.0s
 => [payload internal] load metadata for docker.io/library/node:18-bullseye-slim                  3.4s
 => [payload 1/5] FROM docker.io/library/node:18-bullseye-slim@sha256:f06f9ff9dcc5cfec936a7bf085  0.0s
 => [payload internal] load build context                                                         0.0s
 => => transferring context: 72B                                                                  0.0s
 => CACHED [payload 2/5] WORKDIR /app                                                             0.0s
 => CACHED [payload 3/5] COPY package-lock.json .                                                 0.0s
 => CACHED [payload 4/5] COPY package.json .                                                      0.0s
 => ERROR [payload 5/5] RUN npm ci                                                              558.7s
------                                                                                                 
 > [payload 5/5] RUN npm ci:                                                                           
549.4 npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
549.9 npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
558.5 npm notice 
558.5 npm notice New minor version of npm available! 10.2.3 -> 10.3.0
558.5 npm notice Changelog: <https://github.com/npm/cli/releases/tag/v10.3.0>
558.5 npm notice Run `npm install -g npm@10.3.0` to update!
558.5 npm notice 
558.5 npm ERR! code EAI_AGAIN
558.5 npm ERR! syscall getaddrinfo
558.5 npm ERR! errno EAI_AGAIN
558.5 npm ERR! request to https://registry.npmjs.org/test-exclude/-/test-exclude-6.0.0.tgz failed, reason: getaddrinfo EAI_AGAIN registry.npmjs.org

Here's an error I got when manually entering the container shell and running npm ci:

root@fe61caa23d72:/app# npm ci
npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated @types/sharp@0.32.0: This is a stub types definition. sharp provides its own type definitions, so you do not need this installed.
npm notice 
npm notice New minor version of npm available! 10.2.3 -> 10.3.0
npm notice Changelog: https://github.com/npm/cli/releases/tag/v10.3.0
npm notice Run npm install -g npm@10.3.0 to update!
npm notice 
npm ERR! code EHOSTUNREACH
npm ERR! syscall connect
npm ERR! errno EHOSTUNREACH
npm ERR! request to https://registry.npmjs.org/async/-/async-2.6.4.tgz failed, reason: connect EHOSTUNREACH 104.16.25.34:443

This is the complete npm log of this run: npm-EHOSTUNREACH.log.

However, when I manually ran it later, it worked:

root@5e7e6a9d2f90:/app# npm ci
npm WARN deprecated @types/sharp@0.32.0: This is a stub types definition. sharp provides its own type definitions, so you do not need this installed.
npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.

added 1423 packages, and audited 1424 packages in 43s

224 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

Expected Behavior

The whole container build should take about 50 seconds, as it does with OrbStack and Docker Desktop.

Additional Information

No response

Rancher Desktop Version

1.12.1

Rancher Desktop K8s Version

deactivated

Which container engine are you using?

moby (docker cli)

What operating system are you using?

macOS

Operating System / Build Version

macOS Sonoma 14.3

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

None

Windows User Only

No response

hdodov commented 5 months ago

This might be related to #6358, which also appears to occur due to network errors in the container.

jandubois commented 5 months ago

I can reproduce this behaviour with QEMU. I tried giving the VM more resources by running with 8 CPUs and 16GB of memory, but the result was unchanged.

It seems to work fine when using VZ emulation (and virtiofs for the volume sharing), but the build tooks still over 150s instead of the expected 50s:

$ npm run dev

> dev
> docker compose up --build

[+] Building 153.2s (11/11) FINISHED                                                                   docker:rancher-desktop
 => [payload internal] load .dockerignore                                                                                0.0s
 => => transferring context: 2B                                                                                          0.0s
 => [payload internal] load build definition from Dockerfile                                                             0.0s
 => => transferring dockerfile: 219B                                                                                     0.0s
 => [payload internal] load metadata for docker.io/library/node:18-bullseye-slim                                         0.9s
 => [payload auth] library/node:pull token for registry-1.docker.io                                                      0.0s
 => [payload internal] load build context                                                                                0.0s
 => => transferring context: 146B                                                                                        0.0s
 => [payload 1/5] FROM docker.io/library/node:18-bullseye-slim@sha256:f06f9ff9dcc5cfec936a7bf08580c5d9cf0f424fbbd857d36  0.0s
 => CACHED [payload 2/5] WORKDIR /app                                                                                    0.0s
 => CACHED [payload 3/5] COPY package-lock.json .                                                                        0.0s
 => CACHED [payload 4/5] COPY package.json .                                                                             0.0s
 => [payload 5/5] RUN npm ci                                                                                           149.7s
 => [payload] exporting to image                                                                                         2.6s
 => => exporting layers                                                                                                  2.6s
 => => writing image sha256:7dfde214f314799120a5e9894e9188db49ce7130b2f5077a070eab65ffc8e0cd                             0.0s
 => => naming to docker.io/library/mcve-rancher-npm-hang-payload                                                         0.0s
[+] Running 2/2
 ✔ Network mcve-rancher-npm-hang_default      Created                                                                    0.0s
 ✔ Container mcve-rancher-npm-hang-payload-1  Created                                                                    0.0s
Attaching to payload-1
jandubois commented 5 months ago

I can reproduce this behaviour with QEMU. I tried giving the VM more resources by running with 8 CPUs and 16GB of memory, but the result was unchanged.

I forgot to mention: the Docker socket forwarding seems to have stopped while running npm run dev with QEMU, so the result is essentially just a timeout after 5min.

I could still get a shell inside the VM, and the moby engine was still running, but the image had not yet been built.

jandubois commented 5 months ago

It looks like buildkit is not properly enabled by default in Rancher Desktop; enabling it explicitly in the compose file brings the build time down to 31s (after a factory reset, but again using VZ and virtiofs):

--- docker-compose.yml
+++ docker-compose.yml
@@ -3,5 +3,7 @@ services:
   payload:
     build:
       context: ./
+      args:
+        DOCKER_BUILDKIT: 1
     ports:
       - 3000:3000

and

$ npm run dev

> dev
> docker compose up --build

[+] Building 37.7s (11/11) FINISHED                                                                    docker:rancher-desktop
 => [payload internal] load .dockerignore                                                                                0.0s
 => => transferring context: 2B                                                                                          0.0s
 => [payload internal] load build definition from Dockerfile                                                             0.0s
 => => transferring dockerfile: 219B                                                                                     0.0s
 => [payload internal] load metadata for docker.io/library/node:18-bullseye-slim                                         1.6s
 => [payload auth] library/node:pull token for registry-1.docker.io                                                      0.0s
 => [payload internal] load build context                                                                                0.0s
 => => transferring context: 1.11MB                                                                                      0.0s
 => [payload 1/5] FROM docker.io/library/node:18-bullseye-slim@sha256:f06f9ff9dcc5cfec936a7bf08580c5d9cf0f424fbbd857d36  2.3s
 => => resolve docker.io/library/node:18-bullseye-slim@sha256:f06f9ff9dcc5cfec936a7bf08580c5d9cf0f424fbbd857d3677cff575  0.0s
 => => sha256:5ae20c369ec182f5275f874054411f9bfb3f8132ff52a9c1d08d4ae15494d01c 30.06MB / 30.06MB                         0.8s
 => => sha256:01274fed796f4fcde21eb139c21923f0577bf03c6aca3de5fb6cca7997c026c7 4.19kB / 4.19kB                           0.2s
 => => sha256:66f24c6ac6abeb092db3801126aeedab6af5fb7289ad2c67fdf187e36b658e1d 38.67MB / 38.67MB                         0.7s
 => => sha256:f06f9ff9dcc5cfec936a7bf08580c5d9cf0f424fbbd857d3677cff575ff340db 1.21kB / 1.21kB                           0.0s
 => => sha256:9440185cd6578a314ca5d32b315bea09d55c2050442590ace64c121170c009e8 1.37kB / 1.37kB                           0.0s
 => => sha256:5b233b1f1a3b8a53bebaf24ca9d537508f23e4a1f5bf6d77e5aa334e8b42a7fc 7.64kB / 7.64kB                           0.0s
 => => sha256:e8d32e99fce4825fc4fa6badbf7514943681c2fc1bcdab3f10c0b3047541122b 2.70MB / 2.70MB                           0.5s
 => => sha256:1a8818020ea28916649820eb93d259bce8ee9113797d5d2dbbbc6378612fabe1 448B / 448B                               0.6s
 => => extracting sha256:5ae20c369ec182f5275f874054411f9bfb3f8132ff52a9c1d08d4ae15494d01c                                0.6s
 => => extracting sha256:01274fed796f4fcde21eb139c21923f0577bf03c6aca3de5fb6cca7997c026c7                                0.0s
 => => extracting sha256:66f24c6ac6abeb092db3801126aeedab6af5fb7289ad2c67fdf187e36b658e1d                                0.8s
 => => extracting sha256:e8d32e99fce4825fc4fa6badbf7514943681c2fc1bcdab3f10c0b3047541122b                                0.1s
 => => extracting sha256:1a8818020ea28916649820eb93d259bce8ee9113797d5d2dbbbc6378612fabe1                                0.0s
 => [payload 2/5] WORKDIR /app                                                                                           0.1s
 => [payload 3/5] COPY package-lock.json .                                                                               0.0s
 => [payload 4/5] COPY package.json .                                                                                    0.0s
 => [payload 5/5] RUN npm ci                                                                                            31.0s
 => [payload] exporting to image                                                                                         2.7s
 => => exporting layers                                                                                                  2.7s
 => => writing image sha256:7e939011152059397661a7baef53465e78b9132d2f0593b2852cecbee51da2cf                             0.0s
 => => naming to docker.io/library/mcve-rancher-npm-hang-payload                                                         0.0s
[+] Running 2/2
 ✔ Network mcve-rancher-npm-hang_default      Created                                                                    0.0s
 ✔ Container mcve-rancher-npm-hang-payload-1  Created                                                                    0.0s
Attaching to payload-1
^CGracefully stopping... (press Ctrl+C again to force)
jandubois commented 5 months ago

I did one more test with VZ emulation, but still using reverse-sshfs for the volume sharing. It worked the same way as using virtiofs, so the problem seems to be due to QEMU.

In summary there are 2 issues:

The workaround for now is to use VZ emulation and configure buildkit explicitly.

jandubois commented 5 months ago

Let's use #1541 to deal with the buildkit configuration and keep this issue for dealing with the broken docker.sock when using QEMU.

jandubois commented 5 months ago

It looks like buildkit is not properly enabled by default in Rancher Desktop;

This turned out to be wrong. The reason that buildkit was not being used was because I still had a custom builder installed from investigating a different issue, and that builder was also made the default:

$ docker buildx ls
NAME/NODE         DRIVER/ENDPOINT  STATUS  BUILDKIT        PLATFORMS
amd64builder *    docker-container
  amd64builder0   rancher-desktop  running v0.12.5         linux/arm64, linux/amd64, linux/amd64/v2
default           docker
  default         default          running v0.10.6+d52b2d5 linux/arm64, linux/amd64, linux/amd64/v2
rancher-desktop   docker
  rancher-desktop rancher-desktop  running v0.10.6+d52b2d5 linux/arm64, linux/amd64, linux/amd64/v2

The builder config is not deleted by a factory reset (I've created #6408 to change this), so it affected my testing for this issue as well by running the build inside a container.

After deleting the custom builder, the docker compose up --build worked normally and compled the npm ci command in under 40s.

Deleting the builder did not help with the QEMU issue though; it would still lose the socket connection and eventually time out.

thecopy commented 5 months ago

I am having the same issue. Using QEMU the my Docker crashes. Using VZ at least install finishes but it is taking a very long time. Running with npm install --loglevel verbose i can see that the HTTP GET calls are taking progresivelly longer ang longer (eventually each GET is more than 1 minute).

Eddcp commented 4 months ago

I'm having this issue using Mac Os M1 Sonoma 14.1.1. Rancher Desktop using QEMU virtualization used to work 😢

containeers commented 3 months ago

I'm having the same issue on Mac OS 14.4.1 with intel core. Earlier docker build with npm install used to work.

rchaudha commented 3 months ago

Is there any update here? We too are seeing users running into this issue where with Docker Desktop the builds would be 1/10 the time.

jandubois commented 3 months ago

Is there any update here?

No update, unfortunately. Nobody has had time to work on it yet.

nino-magazinovic commented 2 months ago

@jandubois Downgrading Rancher Desktop does not seem to resolve the issue. Actually after a factory reset, I encountered the same problem, which critically blocked my work and had to switch to Docker Desktop.

Is Rancher Desktop linked to a specific tested version of QEMU, or does it download the latest version during a factory reset?

Rancher desktop 1.13.1, Intel Mac i7, MacOs Monterey 12.7.4

jleigh100 commented 2 months ago

I was having this issue with running a container which used node 18. A workaround I have found is upgrading the node version in the container.

I have tested with Node version 16, 18, 20, 21 and 22 - I get the issue on node 18 and 21, and don't get the issue on 16, 20 and 22.

Rancher Desktop 1.13.1, 2021 M1 MacOs Sonoma 14.4.1

aamontree commented 2 weeks ago

I'm having a similar issue with Node 18; although, it is not consistently reproducible.

RUN npm install --legacy-peer-deps --verbose --force

I'm also seeing a potentially related issue during my Docker build where pip install --upgrade pip is hanging.

image

Using VZ for the Virtual Machine type instead of QEMU (with Rosetta support enabled) as well as building without Docker BuildKit DOCKER_BUILDKIT=0 are two workarounds that I have found to be effective.

Rancher Desktop Version

1.14.2

Rancher Desktop K8s Version

deactivated

Which container engine are you using?

moby (docker cli)

What operating system are you using?

macOS

Operating System / Build Version

macOS Sonoma 14.5

What CPU architecture are you using?

ARM64 (M1)

Linux only: what package format did you use to install Rancher Desktop?

None

Windows User Only

No response