lando / lando

A development tool for all your projects that is fast, easy, powerful and liberating
https://lando.dev
GNU General Public License v3.0

Hyperkit runs endlessly at max CPU due to user-perm-helpers.sh #2775

Open westonruter opened 3 years ago

westonruter commented 3 years ago

This may be related to #2103, or may be its underlying cause, but I wanted to open a new issue since that one has been open and stale for a while.

I have a Lando config based on the wordpress recipe. On macOS, with the latest version of Lando (v3.0.24), as soon as I run lando start I see CPU usage for com.docker.hyperkit at 300%+. In older versions (specifically v3.0.7, per https://github.com/lando/lando/issues/2103#issuecomment-721591928) I would see a 200% spike that settled down after about 20 minutes. But this 300% CPU usage is persistent. I did some digging to find out what was going on. Here's what I found.

I ran docker stats which resulted in:

CONTAINER ID   NAME                                           CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
097b965da18f   wordpressdev_appserver_nginx_1                 13.19%    81.08MiB / 1.942GiB   4.08%     13.7kB / 17.6kB   2.23MB / 311kB    10
f48619032642   wordpressdev_database_1                        13.65%    162MiB / 1.942GiB     8.15%     715kB / 410kB     623kB / 33.8MB    33
7b4c088c0d7c   wordpressdev_appserver_1                       12.18%    107.4MiB / 1.942GiB   5.40%     663kB / 1.47MB    17.8MB / 246kB    6
79efe142eb83   landoproxyhyperion5000gandalfedition_proxy_1   0.04%     13.86MiB / 1.942GiB   0.70%     2.01MB / 51.4kB   6.76MB / 8.19kB   13

Note that the three containers for the Lando config are each consuming ~13% CPU when they are supposedly idle. I then ran docker exec {CONTAINER} ps -aux and found that each of the three containers had a process that was accumulating CPU time:

9513                root                0:30                find /app -not -user www-data -execdir chown www-data:www-data {} +
10154               root                0:34                find /app -not -user www-data -execdir chown www-data:www-data {} +
9292                root                0:34                find /app -not -user www-data -execdir chown www-data:www-data {} +
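The manual check above can be sketched as a small helper. This is a minimal sketch, not part of Lando: the function name list_sweep_processes is hypothetical, and it assumes the project's containers share a name prefix (wordpressdev_ in the docker stats output above) and that each image ships a ps binary.

```shell
# Sketch: print any lingering permission-sweep `find` processes per container.
# list_sweep_processes is a hypothetical helper name, not a Lando command.
list_sweep_processes() {
  local prefix="$1" container
  # `--filter name=` matches on a substring; `--format` prints only the names.
  docker ps --filter "name=${prefix}" --format '{{.Names}}' |
    while read -r container; do
      echo "== ${container}"
      # Same ps invocation as above; grep runs on the host and keeps only
      # the chown sweep. `|| true` tolerates containers with no match.
      docker exec "$container" ps -aux </dev/null |
        grep 'find /app -not -user' || true
    done
}

# Usage (not run here):
#   list_sweep_processes wordpressdev_
```

Containers that print only their "==" header line have no sweep running.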

I then went into each container and kill'ed each of these find processes. As soon as I did so, CPU usage dropped dramatically:
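Killing the processes one container at a time could be scripted roughly as follows. This is a sketch under assumptions: kill_sweep_processes is a hypothetical name, the containers are assumed to share a name prefix, and pkill (from procps) is assumed to be available inside each image, which may not hold for every Lando service.

```shell
# Sketch: kill the permission-sweep `find` in every container with a given
# name prefix. kill_sweep_processes is a hypothetical helper name.
kill_sweep_processes() {
  local prefix="$1" container
  docker ps --filter "name=${prefix}" --format '{{.Names}}' |
    while read -r container; do
      # The sweep runs as root (see the ps output above), so exec as root.
      # Assumption: pkill exists in the image; -f matches the full command line.
      if docker exec --user root "$container" \
           pkill -f 'find /app -not -user' </dev/null; then
        echo "killed sweep in ${container}"
      else
        # Also reached when pkill is missing from the image.
        echo "nothing killed in ${container}"
      fi
    done
}

# Usage (not run here):
#   kill_sweep_processes wordpressdev_
```

Interrupting the sweep leaves some files unowned by www-data, so writes from the web server may fail for those paths.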

[Screenshot, 2020-12-25 15:12:57: CPU usage after the find processes were killed]

The WordPress site continued to work as expected even with these processes killed, though I'm guessing it would start to misbehave once new files are created.

Note that the directory tree for this Lando configuration contains quite a number of files, including several node_modules and Composer vendor directories. In all, there are 1,482,718 files and directories, which add up to 18GB. It's apparently (and understandably) taking too long for find to finish, so long that it never seems to end.
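For anyone wanting to gauge how much work the sweep has to do on their own project, here is a minimal sketch. The function names are hypothetical; the second one counts only the entries the find command shown earlier would actually hand to chown.

```shell
# Sketch: measure the size of the tree the sweep has to walk.
# Both function names are hypothetical helpers.

# Count every file and directory under $1, including $1 itself.
# NUL-delimited so file names containing newlines are counted once.
count_entries() {
  find "$1" -print0 | tr -dc '\0' | wc -c | tr -d ' '
}

# Count entries under $1 that are NOT owned by user $2 (name or numeric uid),
# i.e. the ones `find ... -not -user ... -execdir chown` would touch.
count_foreign_entries() {
  find "$1" -not -user "$2" -print0 | tr -dc '\0' | wc -c | tr -d ' '
}

# Usage (not run here):
#   count_entries /path/to/project
#   count_foreign_entries /path/to/project www-data
#   du -sh /path/to/project    # total size on disk
```

Even when the second count is zero, find still has to stat every entry, which is where the time goes on a slow file-sharing layer.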

I believe the code in question here is this line from perm_sweep() in plugins/lando-core/scripts/user-perm-helpers.sh:

https://github.com/lando/lando/blob/8487959ad943024fb8ac4099152edc295520eee1/plugins/lando-core/scripts/user-perm-helpers.sh#L91

What can be done to prevent this from happening?

westonruter commented 3 years ago

I checked in v3.0.7 and I'm seeing the same thing, except the wordpressdev_database_1 container is at 0% CPU:

CONTAINER ID        NAME                                           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
a7dedc18ca00        wordpressdev_appserver_nginx_1                 8.76%               133.5MiB / 1.944GiB   6.71%               72.8kB / 54.1kB     0B / 0B             9
9cd5b9bc82c2        wordpressdev_database_1                        0.08%               119.1MiB / 1.944GiB   5.98%               20.2MB / 588kB      0B / 0B             34
49624e279c9b        wordpressdev_appserver_1                       19.23%              202MiB / 1.944GiB     10.15%              869kB / 21MB        0B / 0B             6
4565382417ce        landoproxyhyperion5000gandalfedition_proxy_1   0.00%               14.99MiB / 1.944GiB   0.75%               1.96MB / 66.4kB     0B / 0B             13

When I examine the processes of the wordpressdev_appserver_nginx_1 and wordpressdev_appserver_1 containers, the same find command is consuming CPU (and causing the CPU fan to kick into high gear):

find /app -not -user www-data -execdir chown www-data:www-data {} +

The com.docker.hyperkit process here is running at ~200%, whereas in the latest Lando it was running at ~300%. That's apparently because the database container now runs this find command, whereas previously it did not.

pirog commented 3 years ago

@westonruter thanks for this. We’ve also noticed this as a resource problem, though not at the levels you report.

We have a plan to improve it, although it’s probably not going to happen until the not-yet-officially-announced Lando 4, since our solution requires some deep changes to the fundamentals. That said, we will have a 4.x alpha in a few months :)

westonruter commented 3 years ago

I think I may have a workable workaround in the meantime. I can just move the wp-content directory out of the Lando root directory before running lando start, and then once the CPU settles down, the wp-content directory can be moved back, since by that time perm_sweep() will have finished. I've tried it a couple of times and it seems to work. I can do lando start and the CPU finishes churning after just a minute in v3.0.7 rather than half an hour. I'll re-check this in the latest Lando once Docker Desktop has been updated to fix a separate issue (https://github.com/docker/for-mac/issues/5164) in which com.docker.backend takes up 100% CPU as well.
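A minimal sketch of that move-out, start, move-back sequence follows. It is an illustration only: the function name start_with_dir_parked is hypothetical, and the fixed settle time (60 seconds by default, based on the "about a minute" observed above) stands in for actually detecting that the sweep has finished.

```shell
# Sketch: park a heavy directory outside the project while `lando start`
# runs, then restore it. start_with_dir_parked is a hypothetical helper.
start_with_dir_parked() {
  local project_root heavy_dir="$2" settle_seconds="${3:-60}"
  local parked status=0
  project_root="$(cd "$1" && pwd)" || return 1

  # Park next to the project root so the move stays on the same filesystem
  # (a rename, not a copy of many gigabytes).
  parked="$(mktemp -d "${project_root}.parked.XXXXXX")" || return 1
  mv "${project_root}/${heavy_dir}" "${parked}/" || return 1

  # Start Lando without the heavy directory in place.
  (cd "$project_root" && lando start) || status=$?

  # Assumption: a fixed wait is enough for the permission sweep to finish.
  sleep "$settle_seconds"

  # Restore the directory whether or not the start succeeded.
  mv "${parked}/$(basename "$heavy_dir")" "${project_root}/${heavy_dir}"
  rmdir "$parked"
  return "$status"
}

# Usage (not run here):
#   start_with_dir_parked /path/to/wordpressdev wp-content 60
```

The site will be missing its content while the directory is parked, so this only suits the startup window.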

heymarkreeves commented 3 years ago

Referencing the linked Docker Desktop issue, I'm seeing com.docker.backend consistently running at 99% or 100% of the CPU on macOS Catalina. I did try disabling gRPC FUSE as was recommended in that thread, and that resolved the CPU usage issue, but even after running lando rebuild I was having issues with edits to files not syncing or not being recognized. Back to watching the CPU run hot and hoping the Docker fix promptly makes it into a Lando update.

CheckeredFlag commented 3 years ago

I don't see how this is related to Lando. Even after doing a lando poweroff, when docker ps shows no containers running, high CPU usage remains. Restarting Docker from the dropdown menu doesn't help; only quitting Docker and restarting clears things up...for a while. BTW, I'm seeing the problem with com.docker.hyperkit, not com.docker.backend.

This new issue is interesting.

heymarkreeves commented 3 years ago

After restarting my mac, I never saw com.docker.backend again and things have been running OK. I only chimed in on this issue as Lando advises keeping your version of Docker at the version they specify, and if there were fixes on the Docker side, we'd want to see the Docker version updated in the Lando installer.

farmerpaul commented 3 years ago

I find this to be an annoyingly persistent issue with the version of Docker that's bundled with Lando on MacOS, as I use Lando almost exclusively to manage a variety of WordPress projects. The workaround of turning off "Use gRPC FUSE for file sharing" is not an option for me, since that causes files to no longer sync when I make changes to code, making my container almost useless for development. Restarting Docker Desktop, to allow the CPU usage to calm down, does the trick for a while, but only until it starts happening again for whatever reason.

I've just ordered a development server that'll be running either Windows or Linux, which I'll be migrating my use of Lando onto, and since this appears to be a MacOS-only issue, perhaps that may be my ultimate workaround…

techieshark commented 3 years ago

I wonder if this line should be skippable via config, as a workaround here? (For those of us who have multiple gigs of files in the wp-content/uploads directory for example.)

https://github.com/lando/lando/blob/8487959ad943024fb8ac4099152edc295520eee1/plugins/lando-core/scripts/user-perm-helpers.sh#L91
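To make the suggestion concrete, here is a sketch of what a skippable sweep could look like. Everything in it is hypothetical: Lando has no SKIP_PERM_SWEEP variable and no guarded_perm_sweep function. Only the find command itself is taken from the process listing earlier in this thread.

```shell
# HYPOTHETICAL sketch: neither SKIP_PERM_SWEEP nor guarded_perm_sweep
# exists in Lando. It only illustrates an opt-out around the sweep.
guarded_perm_sweep() {
  local user="$1" group="$2" dir="$3"

  # Opt-out switch (hypothetical): skip the sweep entirely when set to 1.
  if [ "${SKIP_PERM_SWEEP:-0}" = "1" ]; then
    echo "perm sweep skipped for ${dir}"
    return 0
  fi

  # Same shape as the find command reported in this issue: chown every
  # entry not already owned by the target user.
  find "$dir" -not -user "$user" -execdir chown "${user}:${group}" {} +
}

# Usage (not run here):
#   SKIP_PERM_SWEEP=1 guarded_perm_sweep www-data www-data /app
```

Skipping the sweep trades the CPU cost for the risk that some files are not writable by the web server user.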

"I can just move the wp-content directory out Lando root directory before running lando start, and then once the CPU settles down, the wp-content directory can be moved back"

@westonruter did you do these steps manually or develop some automated trick for that?

westonruter commented 3 years ago

@techieshark Here's the script I use: https://gist.github.com/westonruter/37e82bab7fa558e90b61fc832a711725

In my Lando environment (wordpressdev), I have a bin directory in the project root, so I put this script there. Then when I want to start I do ./bin/lando.sh start or to rebuild I do ./bin/lando.sh rebuild. Not elegant, but it works.

pirog commented 3 years ago

We've been actively planning Lando 4, and it will be moving to a permissions mapping system that no longer requires this script. I recognize that's a medium-to-long-term solution, but I wanted to let you all know it's something we plan on addressing.

johnrom commented 1 year ago

Any workarounds on this? Migrating from our old dev environment but this issue has the whole office's cpu fans spinning.

pirog commented 1 year ago

We will be releasing a preview of Lando 4 services in the next month or so that should resolve this