Open ryancurrah opened 1 year ago
Attached are the logs we captured when we reproduced the issue.
I can't reproduce this on macOS 13.1 on M1 either. I've done a factory reset, rebooted the host, did another factory reset, and the command always worked fine.
I've looked at the logs, and can't spot anything in there either.
On the "reproducible laptop" does this also happen after a factory reset? Or after rebooting the host?
Are there any errors in any of the networking logs at ~/Library/Application Support/rancher-desktop/lima/_networks?
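Something like this should surface obvious failures in those logs (the exact file names vary between installs, so treat it as a sketch):
# search the Lima networking logs recursively for error-ish lines
grep -riE 'error|fail' ~/Library/Application\ Support/rancher-desktop/lima/_networks/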
I am getting our IT team to send me an M1 Macbook so I can try to reproduce this issue. Another dev reported the same issue this morning. Not sure what they were doing to cause it though.
On the "reproducible laptop" it happens even after a factory reset, reboot, and fresh re-install.
The dev with the reproducible laptop needs to get some work done, so they have uninstalled it for now. ~I am going to get our devs to post here when they get a freezing issue~. Meanwhile, I will try to get that laptop and reproduce it.
Thank you so much; this will be really helpful, as I've been unable to repro this myself.
Maybe also take a look at any anti-malware technology installed on your machines; maybe that is interfering with the virtualization code?
I have the same problem. I have tried a factory reset, reinstall, reboot everything, but rancher still hangs.
My colleagues who have the same anti-virus software installed did not have the problem.
Hi, I'm able to reproduce this frequently on my M1 running Monterey 12.6.1 / RD 1.7.0 / k8s 1.25.4 / Traefik disabled. What logs can I provide from ~/Library/Logs/rancher-desktop to help debug this? Currently the RD UI shows Kubernetes is running, but kubectl commands time out with Unable to connect to the server: net/http: TLS handshake timeout
Tried quitting Rancher Desktop and restarting a couple of times, but same problem. I could restart the laptop and the problem might go away; I may need to do that to avoid being blocked with my work and/or look at minikube (which doesn't have a nice UI). But I'm happy to provide logs and keep the laptop in this reproducible state for the next 24 hours or so if it helps.
Tailed logs from the time it started to the time it stopped working:
1. steve.log
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for rbac.authorization.k8s.io/v1, Kind=RoleBinding"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for apiregistration.k8s.io/v1, Kind=APIService"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for /v1, Kind=Pod"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for apps/v1, Kind=Deployment"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for events.k8s.io/v1, Kind=Event"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for /v1, Kind=PodTemplate"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for apps/v1, Kind=StatefulSet"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for batch/v1, Kind=CronJob"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for acme.cert-manager.io/v1, Kind=Order"
…
….. first sign of trouble ….
….
2023-01-16T19:10:04.881Z: stderr: time="2023-01-16T11:10:04-08:00" level=error msg="Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]"
2023-01-16T19:13:01.329Z: stderr: W0116 11:13:01.327098 46860 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *summary.SummarizedObject ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0116 11:13:01.327114 46860 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *summary.SummarizedObject ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
….
…. many of these …..
….
W0116 11:13:01.328829 46860 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *summary.SummarizedObject ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0116 11:13:01.328880 46860 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *summary.SummarizedObject ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
….
…. TLS handshake timeouts. Roughly after this point, kubectl stops working …..
….
2023-01-16T19:13:12.133Z: stderr: W0116 11:13:12.132748 46860 reflector.go:325] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: failed to list *summary.SummarizedObject: Get "https://127.0.0.1:6443/apis/cert-manager.io/v1/certificates?resourceVersion=160294": net/http: TLS handshake timeout
W0116 11:13:12.132851 46860 reflector.go:325] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: failed to list *summary.SummarizedObject: Get "https://127.0.0.1:6443/apis/node.k8s.io/v1/runtimeclasses?resourceVersion=160231": net/http: TLS handshake timeout
I0116 11:13:12.132905 46860 trace.go:205] Trace[631373749]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168 (16-Jan-2023 11:13:02.130) (total time: 10002ms):
Trace[631373749]: ---"Objects listed" error:Get "https://127.0.0.1:6443/apis/node.k8s.io/v1/runtimeclasses?resourceVersion=160231": net/http: TLS handshake timeout 10002ms (11:13:12.132)
Trace[631373749]: [10.002143209s] [10.002143209s] END
2. k3s.log
E0117 04:26:35.226050 4290 reflector.go:140] k8s.io/client-go@v1.25.4-k3s1/tools/cache/reflector.go:169: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: the server could not find the requested resource
W0117 04:26:36.046392 4290 reflector.go:424] k8s.io/client-go@v1.25.4-k3s1/tools/cache/reflector.go:169: failed to list *v1.PartialObjectMetadata: the server could not find the requested resource
E0117 04:26:36.046516 4290 reflector.go:140] k8s.io/client-go@v1.25.4-k3s1/tools/cache/reflector.go:169: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: the server could not find the requested resource
{"level":"warn","ts":"2023-01-17T04:26:36.183Z","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x400167d880/kine.sock","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
E0117 04:26:36.183408 4290 controller.go:187] failed to update lease, error: Put "https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/lima-rancher-desktop?timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0117 04:26:36.183651 4290 writers.go:118] apiserver was unable to write a JSON response: http: Handler timeout
E0117 04:26:36.185775 4290 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
I0117 04:26:36.185091 4290 trace.go:205] Trace[333656479]: "GuaranteedUpdate etcd3" audit-id:0a94d052-49c1-40c2-a1f3-8bdacccbd6e9,key:/leases/kube-node-lease/lima-rancher-desktop,type:*coordination.Lease (17-Jan-2023 04:26:26.184) (total time: 10000ms):
Trace[333656479]: ---"Txn call finished" err:context deadline exceeded 9999ms (04:26:36.185)
Trace[333656479]: [10.000193713s] [10.000193713s] END
E0117 04:26:36.197602 4290 finisher.go:175] FinishRequest: post-timeout activity - time-elapsed: 13.941958ms, panicked: false, err: context deadline exceeded, panic-reason: <nil>
E0117 04:26:36.196928 4290 writers.go:131] apiserver was unable to write a fallback JSON response: http: Handler timeout
I0117 04:26:36.199085 4290 trace.go:205] Trace[1183966381]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/lima-rancher-desktop,user-agent:k3s/v1.25.4+k3s1 (linux/arm64) kubernetes/0dc6333,audit-id:0a94d052-49c1-40c2-a1f3-8bdacccbd6e9,client:127.0.0.1,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/2.0 (17-Jan-2023 04:26:26.183) (total time: 10015ms):
Trace[1183966381]: ---"Write to database call finished" len:509,err:Timeout: request did not complete within requested timeout - context deadline exceeded 9998ms (04:26:36.183)
Trace[1183966381]: [10.015928213s] [10.015928213s] END
E0117 04:26:36.199699 4290 timeout.go:141] post-timeout activity - time-elapsed: 16.136125ms, PUT "/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/lima-rancher-desktop" result: <nil>
Note: we have been able to avoid this hanging issue by switching to the 9p mount type in Lima. I'm not sure if it completely fixes it or just makes it occur less often; time will tell from our users. But my suggestion to others affected by this is to try the 9p mount. One caveat, though: the 9p mount does not support symlinks in volumes.
@ryancurrah how do you enable 9p? I read about it here, i.e.:
On macOS an alternative file sharing mechanism using 9p instead of reverse-sshfs has been implemented. It is disabled by default. Talk to us on Slack if you want to help us testing it.
But I wasn't able to find the specifics on how to enable it.
I have the same problem.
In detail, a co-worker and I upgraded macOS to 13.0 and the issue started appearing. We upgraded to 13.1; his machine recovered, but mine did not.
I finally recovered by switching mountType to 9p.
Docker containers ran normally with pure Lima installed via Homebrew, even though its mountType is null.
@lakamsani edit this file and add a top-level mountType entry:
~/Library/Application Support/rancher-desktop/lima/_config/override.yaml
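As later comments in this thread confirm, the file just needs a single top-level key, e.g.:
# ~/Library/Application Support/rancher-desktop/lima/_config/override.yaml
---
mountType: 9p
(Presumably Rancher Desktop / the Lima VM has to be restarted for the override to take effect.)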
I ran into the same issue too, when doing a "pnpm install" in a Docker container after mounting a custom workdir into Lima, on my macOS 13.1 (Intel). So I think this is not related to Intel or M1. I can reproduce this issue exactly, every time, using the same steps. I also checked the logs under Rancher Desktop; no errors seem to be logged.
For me, the hang only occurs when using the default mountType (which should be null, from ~/Library/Application Support/rancher-desktop/lima/0/lima.yaml) and running some npm install command inside a Docker container with a -v custom volume mount. I also wrote a Dockerfile to do almost the same thing to test, but the problem disappeared. Finally I changed the Lima mountType to 9p and everything seems to be OK now.
This happened after upgrading to Ventura 13.2, coming from 12.x. I never ran into this problem on 12.x.
I'm running into the same issue. I'm doing a massive amount of file activity along with network inside a container. The IO gets hung, and then docker ps becomes unresponsive. I try to quit the desktop, which hangs; to get it to quit properly:
ps auxww |grep rancher | grep ssh |awk '{print $2}' | xargs kill
On restart, qemu looks like it comes up properly, but the docker socket is still unresponsive. A second quit and restart works fine. I guess I'll try the 9p thing. I don't have an override.yaml, so I'm assuming it should look like:
---
mountType: 9p
Answered my own question:
cat ~/"Library/Application Support/rancher-desktop/lima/_config/override.yaml"
---
mountType: 9p
ps auxww |grep rancher | grep ssh shows nothing now while using disk io
Hello, experiencing same issue, but on intel CPU and macOS Ventura....FYI
I should have clarified that I'm on Intel as well. The 9p change made a huge difference.
Unfortunately the 9p mount caused other issues, so it's unusable for me.
update: upgraded to Ventura 13.2 and don't have the "freezing" problem anymore without any override...
Met the same hang problem on 13.2 on an Intel Mac: docker freezing, can't quit Rancher Desktop.
In a terminal, do a ps and grep for rancher. You will see a bunch of ssh sessions; kill them off and Rancher will become responsive. Once I made the change to 9p, all these hang issues went away.
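For reference, the one-liner from the earlier comment does exactly that:
# find the ssh sessions spawned for the rancher mounts and kill them
ps auxww | grep rancher | grep ssh | awk '{print $2}' | xargs kill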
Thanks, after adding a new override.yaml it works for me!
cat ~/Library/Application\ Support/rancher-desktop/lima/_config/override.yaml
---
mountType: 9p
I have been experiencing a similar problem on and off for the past month or two. Was originally discussing in the rancher-desktop slack channel, but after finding this issue I believe it's the same as what I'm experiencing.
I find the bug to be easily reproducible in my case:
Rancher Desktop: 1.8.1
macOS: Ventura 13.1
Container runtime: dockerd (moby) [I have not tested recently with containerd/nerdctl - will try this]
Rancher Kubernetes: disabled (doesn't matter; I've seen this issue with k8s enabled as well)
I get the same behavior as described above: existing containers freeze and virtually all commands hang (docker ps, docker image ls, rdctl shell; nothing works except simple stuff like docker version).
Here is what I can note about reproducing the problem (at least in my case): a container run with docker run -it and a few env vars passed in (probably not relevant).
About the suggested workaround: the mountType: 9p workaround did successfully prevent the container runtime from hanging; however, it caused my terraform provider to fatally crash (every time), so this method is unusable for me.
Same here: Rancher Version: 1.9.1, Ventura 13.4.1 (c)
Likewise, Rancher Desktop randomly freezes for me, more often than not after I leave it running without use for a while, and neither nerdctl nor rdctl commands will respond until I restart the application (tearing down the VM, etc.).
I'm currently on Rancher Desktop 1.9.1 & on macOS Ventura 13.5.1, running on Apple silicon (M2 Pro). I don't have Kubernetes enabled, and I'm using the containerd runtime, with VZ emulation (Rosetta support enabled) & virtiofs mounting (I did have other types of problems before when using 9p, mostly related to user mappings & permissions, so I'd like to avoid going back to that, and reverse-sshfs was unbearably slow!).
Let me know if you'd like me to gather any information when RD hangs, for debugging purposes. Thanks!
Same issue here. Exactly same environment as @juanpalaciosascend (but M1 pro)
Same for me, factory reset did fix it for me though.
A factory reset fixes it because it probably reverts to QEMU, reverse-sshfs, etc., but if you re-apply the settings mentioned (VZ, virtiofs, ...), the problem will probably come back.
Since switching back to the dockerd (moby) runtime, away from containerd, I've seen most of the problems I've been experiencing go away... I want to say entirely, but it might still be a little too early for that.
All other settings (e.g. VZ framework, Rosetta support enabled, virtiofs volumes, Kubernetes disabled, etc.) remain the same, so that leads me to believe the problem that's causing Rancher Desktop to freeze revolves around the use of containerd.
Same here
Rancher 1.10.0 M1 Ventura 13.5.2
same issue (1.10.0 / 13.5.2 / M1 Pro)
same issue here 1.10.0/ m1 pro/ sonoma 14.0
same issue 1.10.0/ 13.5.1 / m1 pro
same issue Ventura 13.6 / M1 Pro / 1.10.0 / VZ. However, I hit the same problems in lima/colima, so the problem is not in Rancher itself
same issue 13.4.1 / M1 Pro / 1.11.0
same issue with Ventura 13.5.2
rancher desktop 1.11.1, M1 Ventura 13.6.3
The process kill suggested in https://github.com/rancher-sandbox/rancher-desktop/issues/3777#issuecomment-1428673201 helps: a newly started Rancher Desktop does not hang for some time (a few days).
After the Rancher Desktop start, this is the disk usage, so the hang is probably not due to some limit being exceeded.
docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 23 2 21.46GB 19.95GB (92%)
Containers 4 0 986B 986B (100%)
Local Volumes 4 2 0B 0B
Build Cache 1178 0 18.44GB 18.44GB
It seems like tracing docker with dtruss is not feasible without disabling SIP (system integrity protection) https://www.deepanseeralan.com/tech/fun-with-dtruss-macOS/
sudo dtruss $(which docker) system df
Password:
dtrace: system integrity protection is on, some features will not be available
dtrace: failed to execute /Users/user/.rd/bin/docker: Operation not permitted
cp -v $(which docker) /tmp
/Users/user/.rd/bin/docker -> /tmp/docker
codesign --remove-signature /tmp/docker
codesign --display --verbose=4 /tmp/docker
/tmp/docker: code object is not signed at all
sudo dtruss /tmp/docker system df
Password:
dtrace: system integrity protection is on, some features will not be available
dtrace: failed to execute /tmp/docker: Could not create symbolicator for task
Maybe someone manages to trace docker to get some more information about the hang; otherwise I'm afraid we won't make progress on this issue.
I've had Rancher Desktop 1.12.0 installed since yesterday and haven't encountered the issue again (on MacOS Ventura 13.6.3).
With 1.11.1, I was encountering this issue pretty much immediately when using VSCode dev containers and the only "fix" was setting the mountType to 9p, which broke dev containers in other ways and made them equally unusable.
I'm experiencing the hanging issue with 1.12.1.
Still an issue with rancher-desktop 1.12.2.
An additional piece of information: hanging possibly happens more often when emulating amd64 using export DOCKER_DEFAULT_PLATFORM=linux/amd64
Rancher 1.12.3, macOS Sonoma 14.3.1, and this is still hanging.
I have already tried several configurations, such as VZ emulation with Rosetta support enabled and virtiofs volumes, but no luck...
Any luck on this? Experienced it in Sonoma.
I am experiencing this as well. OS: Sonoma, Apple silicon. Rancher: 1.13.1
By the way, I fixed mine by setting emulation to VZ in Sonoma. (forgot to post it 😅)
@vaniiiiiiii 's fix worked for me as well.
VZ worked for me. M1 Sonoma
Let's keep this issue open for a little longer; if it doesn't work with QEMU then it is still a bug.
I still experience this, though more intermittently due to my attempted workarounds. It feels like a memory issue because the repro is hard to predict. After a fresh restart of Rancher (i.e. rdctl shutdown, rdctl start) it seems to work fine, but after some indeterminate amount of time, it will hang again.
Rancher 1.15.1
Default Hardware config: 5 GB, 2 CPUs
Emulation: VZ w/ Rosetta enabled
Container Engine: dockerd (moby)
M3 Mac, Sonoma 14.6.1
I regularly pull multi-layer images like https://hub.docker.com/r/rstudio/rstudio-workbench with the --platform linux/amd64 flag. I thought I found a workaround by adding "max-concurrent-downloads": 1 to /etc/docker/daemon.json via rdctl shell, but that eventually failed as well.
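For reference, a minimal sketch of what /etc/docker/daemon.json looks like with that setting (edited inside the VM via rdctl shell; dockerd presumably has to be restarted to pick it up):
{
  "max-concurrent-downloads": 1
}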
I wrote this script to pull several images and then prune them.
#!/bin/bash
for version in 1.4.1717-3 \
2021.09.0-351.pro6 \
2021.09.1-372.pro1 \
2021.09.2-382.pro1 \
2022.02.0-443.pro2 \
2022.02.1-461.pro1 \
2022.02.2-485.pro2 \
2022.02.3-492.pro3 \
2022.07.0-548.pro5 \
2022.07.1-554.pro3 \
bionic-2022.07.2 \
bionic-2022.12.0 \
bionic-2023.03.0 \
jammy-2023.03.2 \
jammy-2023.03.1 \
jammy-2023.06.2 \
jammy-2023.06.1 \
jammy-2023.06.0 \
jammy-2023.09.1 \
jammy-2023.09.0 \
jammy-2023.12.1 \
jammy-2023.12.0 \
jammy-2024.04.2 \
jammy-2024.04.1 \
jammy-2024.04.0
do
docker pull --platform linux/amd64 rstudio/rstudio-workbench:$version
done
docker image prune -af
(If this script succeeds, try waiting several hours before re-running. I had to wait a day for it to repro.)
When the script fails, it will display the following:
Cannot connect to the Docker daemon at unix:///Users/jay/.rd/docker.sock. Is the docker daemon running?
Subsequent docker commands will fail with the same message.
To recover Rancher, run rdctl shutdown, wait for it to quit entirely, then run rdctl start (or open it via /Applications).
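In script form, the recovery sequence described above is just:
rdctl shutdown
# wait until the VM and the UI have exited completely, then
rdctl start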
(Note: Activity Monitor will show "Virtual Machine Service for limactl.ventura" has consumed all the memory allotted to it in the Hardware Configuration.)
The above Cannot connect to the Docker daemon at unix:///Users/user/.rd/docker.sock. Is the docker daemon running? may be a different issue. This one is about any docker command hanging. In my case even the "Quit Rancher Desktop" UI option was unresponsive - it would not quit Rancher Desktop; I waited more than 5 minutes.
On my side, I've not experienced hanging since March (with QEMU), and realized I'm still using Rancher Desktop 1.12.3, with a regularly up-to-date macOS Sonoma (now 14.6.1). There were other problems with docker build, like Debian amd64 emulation being very slow (the next build taking 20 minutes compared to less than a minute on native aarch64), which made me eventually increase Virtual Machine -> Hardware -> Memory (GB) to 16.
Since then, the hanging has not reappeared yet, and the build takes only a few minutes. I kept the other options as before: 4 CPUs, Volumes -> Mount Type -> reverse-sshfs, Emulation -> Virtual Machine Type -> QEMU, and Container Engine -> dockerd (moby).
Actual Behavior
When running a docker command it will hang forever. Any subsequent commands to docker in another shell hang as well. Rebooting the laptop is required as Rancher Desktop becomes unusable.
Steps to Reproduce
One dev on an M1 Mac running Ventura 13.1 can reproduce this issue consistently by building a Dockerfile in docker. We, however, are unable to reproduce the same issue on our laptops consistently. One of our team members reproducing it is using an M1 Mac as well.
1. Create a Dockerfile
2. Build the Dockerfile in docker (see the hypothetical sketch below)
Result
The terminal just hangs.
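A purely hypothetical sketch of these steps (the actual Dockerfile contents were not shared in this issue, so this is only illustrative):
# hypothetical repro - the real Dockerfile was not posted here
cat > Dockerfile <<'EOF'
FROM alpine:3.17
RUN apk add --no-cache curl
EOF
docker build -t hang-repro .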
Expected Behavior
Docker commands not to hang.
Additional Information
Our developers started using Rancher Desktop in November 2022. It was working well; no hanging issues were reported. Once people started updating to Ventura at the beginning of the month (January), they started reporting these issues. We have one developer who is able to consistently reproduce the issue; some of us can only reproduce it intermittently. It seems to be most reproducible on M1 Macs, though. We were also able to reproduce it with our security tools disabled.
We enabled debug logging from the Rancher Desktop Troubleshooting page and looked at all the logs, lima and rancher, and did not see any glaring errors or warnings.
If there is anything else we can provide to help with this, let me know.
Rancher Desktop Version
1.7.0
Rancher Desktop K8s Version
Disabled
Which container engine are you using?
moby (docker cli)
What operating system are you using?
macOS
Operating System / Build Version
Ventura 13.1
What CPU architecture are you using?
arm64 (Apple Silicon)
Linux only: what package format did you use to install Rancher Desktop?
None
Windows User Only
No response