pygmystack / pygmy

the pygmy stack is a container stack for local development
MIT License

Investigation of stack trace: too many open files #442

Open fubarhouse opened 1 year ago

fubarhouse commented 1 year ago

Leaving this one here for investigation later.

goroutine 1 [running]:
github.com/docker/docker/client.(*Client).getAPIPath(0x0, {0x0, 0x0}, {0x100d83502, 0x10}, 0x14000839de8)
    /home/runner/go/pkg/mod/github.com/docker/docker@v23.0.1+incompatible/client/client.go:188 +0x28
github.com/docker/docker/client.(*Client).sendRequest(0x0, {0x0, 0x0}, {0x100d7e09d, 0x3}, {0x100d83502, 0x10}, 0x14000839de8, {0x0, 0x0}, ...)
    /home/runner/go/pkg/mod/github.com/docker/docker@v23.0.1+incompatible/client/request.go:116 +0x5c
github.com/docker/docker/client.(*Client).get(...)
    /home/runner/go/pkg/mod/github.com/docker/docker@v23.0.1+incompatible/client/request.go:36
github.com/docker/docker/client.(*Client).ContainerList(0x0, {0x0, 0x0}, {0x0, 0x1, 0x0, {0x0, 0x0}, {0x0, 0x0}, ...})
    /home/runner/go/pkg/mod/github.com/docker/docker@v23.0.1+incompatible/client/container_list.go:48 +0x570
github.com/pygmystack/pygmy/service/interface/docker.DockerContainerList()
    /home/runner/work/pygmy/pygmy/service/interface/docker/docker.go:53 +0xd0
github.com/pygmystack/pygmy/service/interface.(*Service).GetRunning(0x1400083ac68)
    /home/runner/work/pygmy/pygmy/service/interface/interface.go:161 +0x3c
github.com/pygmystack/pygmy/service/interface.(*Service).GetFieldString(0x1400083ac68, {0x100d7e57a, 0x4})
    /home/runner/work/pygmy/pygmy/service/interface/field.go:33 +0xac
github.com/pygmystack/pygmy/service/library.Up({{0x1400030d500, 0x1, 0x1}, {0x0, 0x0}, 0x0, {0x0, 0x0, 0x0}, 0x0, ...})
    /home/runner/work/pygmy/pygmy/service/library/up.go:138 +0x9c8
github.com/pygmystack/pygmy/cmd.glob..func10(0x101319500, {0x1013549c8, 0x0, 0x0})
    /home/runner/work/pygmy/pygmy/cmd/up.go:69 +0x180
github.com/spf13/cobra.(*Command).execute(0x101319500, {0x1013549c8, 0x0, 0x0})
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856 +0x640
github.com/spf13/cobra.(*Command).ExecuteC(0x10131ab80)
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960 +0x404
github.com/spf13/cobra.(*Command).Execute(...)
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897
github.com/pygmystack/pygmy/cmd.Execute()
    /home/runner/work/pygmy/pygmy/cmd/root.go:58 +0x30
main.main()
    /home/runner/work/pygmy/pygmy/main.go:28 +0x20
fubarhouse commented 1 year ago

The initial investigation appears to indicate an upstream issue in Docker. I will merge the Docker-related PRs and then re-evaluate.

iijiang commented 1 year ago

I got the same issue on an M2 Max. Is there any solution?

christopher-hopper commented 1 year ago

Does look like a Docker bug rather than a Pygmy bug. Sharing details here to help track. Will look for issues in the Docker queues.

System information

Hardware details

Apple MacBook Pro

Operating System details

Apple macOS

Software details

Docker Desktop for Mac

Error details

I find this error only occurs for me on a second start, or restart, of pygmy. If I cold boot into macOS the error doesn't appear on the first use of pygmy. It occurs for me on a restart of pygmy or if I stop and start pygmy.

Steps to reproduce:

  1. Cold boot into macOS
  2. Open a terminal
  3. Start pygmy as normal with pygmy up (no errors)
  4. Restart pygmy with pygmy restart or pygmy down; pygmy up;

The restart then fails with:

fcntl /Users/chopper/.docker/contexts/meta: too many open files
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x71 pc=0x104fa5a58]

goroutine 1 [running]:
github.com/docker/docker/client.(*Client).getAPIPath(0x0, {0x0, 0x0}, {0x105113502, 0x10}, 0x14000a896f8)
    /home/runner/go/pkg/mod/github.com/docker/docker@v23.0.1+incompatible/client/client.go:188 +0x28
github.com/docker/docker/client.(*Client).sendRequest(0x0, {0x0, 0x0}, {0x10510e09d, 0x3}, {0x105113502, 0x10}, 0x14000a896f8, {0x0, 0x0}, ...)
    /home/runner/go/pkg/mod/github.com/docker/docker@v23.0.1+incompatible/client/request.go:116 +0x5c
github.com/docker/docker/client.(*Client).get(...)
    /home/runner/go/pkg/mod/github.com/docker/docker@v23.0.1+incompatible/client/request.go:36
github.com/docker/docker/client.(*Client).ContainerList(0x0, {0x0, 0x0}, {0x0, 0x1, 0x0, {0x0, 0x0}, {0x0, 0x0}, ...})
    /home/runner/go/pkg/mod/github.com/docker/docker@v23.0.1+incompatible/client/container_list.go:48 +0x570
github.com/pygmystack/pygmy/service/interface/docker.DockerContainerList()
    /home/runner/work/pygmy/pygmy/service/interface/docker/docker.go:53 +0xd0
github.com/pygmystack/pygmy/service/interface.(*Service).GetRunning(0x14000a89f58)
    /home/runner/work/pygmy/pygmy/service/interface/interface.go:161 +0x3c
github.com/pygmystack/pygmy/service/interface.(*Service).GetFieldString(0x14000a89f58, {0x10510f7e3, 0x7})
    /home/runner/work/pygmy/pygmy/service/interface/field.go:33 +0xac
github.com/pygmystack/pygmy/service/library.SshKeyAdd({{0x1400029d500, 0x1, 0x1}, {0x1051138f2, 0x10}, 0x1400030b6e0, {0x14000302000, 0x5, 0x5}, 0x14000396ff0, ...}, ...)
    /home/runner/work/pygmy/pygmy/service/library/sshkeyadd.go:30 +0x1e8
github.com/pygmystack/pygmy/service/library.Up({{0x1400029d500, 0x1, 0x1}, {0x0, 0x0}, 0x0, {0x0, 0x0, 0x0}, 0x0, ...})
    /home/runner/work/pygmy/pygmy/service/library/up.go:131 +0x128c
github.com/pygmystack/pygmy/cmd.glob..func10(0x1056a9500, {0x1056e49c8, 0x0, 0x0})
    /home/runner/work/pygmy/pygmy/cmd/up.go:69 +0x180
github.com/spf13/cobra.(*Command).execute(0x1056a9500, {0x1056e49c8, 0x0, 0x0})
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856 +0x640
github.com/spf13/cobra.(*Command).ExecuteC(0x1056aab80)
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960 +0x404
github.com/spf13/cobra.(*Command).Execute(...)
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897
github.com/pygmystack/pygmy/cmd.Execute()
    /home/runner/work/pygmy/pygmy/cmd/root.go:58 +0x30
main.main()
    /home/runner/work/pygmy/pygmy/main.go:28 +0x20
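
In the trace above the receiver passed to getAPIPath is 0x0, so the Docker client pointer is nil, and the panic comes straight after the "too many open files" message. One plausible reading, which is an assumption and not confirmed against pygmy's source, is that creating the client failed and the nil client was used anyway. The sketch below shows that failure mode and a guard against it. It uses hypothetical stand-ins (apiClient, newAPIClient, listPath) and not the real Docker client API.

```go
package main

import (
	"errors"
	"fmt"
)

// apiClient is a hypothetical stand-in for a Docker API client.
type apiClient struct {
	version string
}

// apiPath reads a field through its receiver, so calling it on a nil
// *apiClient panics with a nil pointer dereference.
func (c *apiClient) apiPath(p string) string {
	return "/v" + c.version + p
}

// newAPIClient is a hypothetical constructor. With fail set it returns an
// error and a nil client, mimicking a setup step that ran out of descriptors.
func newAPIClient(fail bool) (*apiClient, error) {
	if fail {
		return nil, errors.New("too many open files")
	}
	return &apiClient{version: "1.42"}, nil // placeholder version string
}

// listPath checks the constructor error before touching the client, so a
// failed setup surfaces as an error and never as a panic.
func listPath(fail bool) (string, error) {
	c, err := newAPIClient(fail)
	if err != nil {
		return "", fmt.Errorf("creating client: %w", err)
	}
	return c.apiPath("/containers/json"), nil
}

func main() {
	if _, err := listPath(true); err != nil {
		fmt.Println("error:", err)
	}
	if p, err := listPath(false); err == nil {
		fmt.Println(p)
	}
}
```

With a guard like this the user would see the underlying "too many open files" error and no stack trace. The descriptor limit itself would still need raising.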
fubarhouse commented 1 year ago

I'd be very interested in anything you find @christopher-hopper .

I don't have M1/M2 hardware to test on directly, but since opening this issue I have merged a dependency update that fixes a problem which might be related. It might be helpful to compile and run the current code; I'd be happy to cut a release if that resolves it.

christopher-hopper commented 1 year ago

Okay, found the cause and a solution to this error.

On macOS Darwin (and BSD and Linux) there are limits placed on the number of file descriptors that can be opened at once. The error we're experiencing occurs when the soft limit on max open files is reached by the command running in the shell.

You can see the max open file descriptors limit in your shell by running:

ulimit -aS

On macOS Monterey the default soft limit is 256. That is evidently not enough for Docker when we use pygmy and docker compose.
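
The same soft and hard values can be read from inside a Go program. This is a minimal sketch, assuming Linux or macOS, where the fields of syscall.Rlimit are uint64. The function name openFileLimits is made up for this example and is not part of pygmy or Docker.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimits returns the soft and hard RLIMIT_NOFILE values of the
// current process, i.e. the per-process cap on open file descriptors.
func openFileLimits() (uint64, uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return rl.Cur, rl.Max, nil
}

func main() {
	soft, hard, err := openFileLimits()
	if err != nil {
		fmt.Println("could not read limit:", err)
		return
	}
	fmt.Printf("open files: soft=%d hard=%d\n", soft, hard)
}
```

A process inherits these limits from the shell that started it, which is why the value in the terminal matters for pygmy.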

Steps to fix on macOS

To increase the max open files limit on macOS, so that it persists, you need to create a LaunchDaemon property list (plist) file. The limit set in the plist file for the LaunchDaemon will then be used by all new running shells.

  1. As root, create a file at /Library/LaunchDaemons/limit.maxfiles.plist

    sudo touch /Library/LaunchDaemons/limit.maxfiles.plist
  2. Open the file in an editor

    sudo vim /Library/LaunchDaemons/limit.maxfiles.plist 
  3. Copy paste this LaunchDaemon property list into the file
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
    "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
        <key>Label</key>
        <string>limit.maxfiles</string>
        <key>ProgramArguments</key>
        <array>
            <string>launchctl</string>
            <string>limit</string>
            <string>maxfiles</string>
            <string>61440</string>
            <string>524288</string>
        </array>
        <key>RunAtLoad</key>
        <true/>
        <key>ServiceIPC</key>
        <false/>
    </dict>
    </plist>
  4. Save the file and confirm it is owned by root:wheel
  5. Load the LaunchDaemon service with the new maxfiles limit values

    sudo launchctl load -w /Library/LaunchDaemons/limit.maxfiles.plist
  6. Confirm the LaunchDaemon limits are now set

    launchctl limit
  7. Close and reopen your terminal shell
  8. Confirm the new limits have persisted in your shell

    ulimit -aS

The above file sets the system LaunchDaemon maxfiles limit to 61,440 (soft limit) and 524,288 (hard limit). This hard limit is equivalent to unlimited on macOS Darwin, which is the default.

Now when we run our pygmy up or pygmy restart commands we no longer hit the "Too many open files" errors.

fubarhouse commented 1 year ago

Unfortunate that there's no globally scoped solution to this... thanks for the run-down. I suppose this is just one of the nuances of macOS... 😥

iijiang commented 1 year ago

This error comes and goes; somehow, once I restart Docker Desktop, everything works. Here are my current limits. My default file descriptor limit should be enough, though.

ulimit -sS

-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8176
-c: core file size (blocks) 0
-v: address space (kbytes) unlimited
-l: locked-in-memory size (kbytes) unlimited
-u: processes 5333
-n: file descriptors 1048575

simesy commented 1 year ago

Solved this by a restart, but next time I'll try running this first: ulimit -n 10240, before rerunning in the same session, or running in a new shell window. I have never hit the 256 limit with anything else I run, and I have a lot of apps open; this has only ever happened with this one tiny app (it seems).
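
For comparison, a program can make the same per-session adjustment for itself at startup. This is a sketch only, assuming Linux or macOS (where syscall.Rlimit fields are uint64). The helper raiseOpenFileLimit is hypothetical and not something pygmy is known to do. The value 10240 is simply the figure from the comment above.

```go
package main

import (
	"fmt"
	"syscall"
)

// raiseOpenFileLimit raises the soft RLIMIT_NOFILE to want, capped at the
// hard limit, and returns the soft limit in effect afterwards. It never
// lowers an existing limit.
func raiseOpenFileLimit(want uint64) (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	if rl.Cur >= want {
		return rl.Cur, nil // already high enough
	}
	if want > rl.Max {
		want = rl.Max // an unprivileged process cannot exceed the hard limit
	}
	rl.Cur = want
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	return rl.Cur, nil
}

func main() {
	soft, err := raiseOpenFileLimit(10240)
	if err != nil {
		fmt.Println("could not raise limit:", err)
		return
	}
	fmt.Println("soft open-file limit:", soft)
}
```

Like ulimit -n in a shell, this only affects the calling process and its children, so it does not change limits for Docker Desktop or other running applications.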

simesy commented 1 year ago

Happened to me today on Sonoma. I then just stopped and started pygmy with no issue.