kurtosis-tech / kurtosis

A platform for packaging and launching ephemeral backend stacks with a focus on approachability for the average developer.
https://docs.kurtosistech.com/
Apache License 2.0
353 stars 52 forks source link

Logs aggregator dies/is stopped #1832

Open mieubrisse opened 10 months ago

mieubrisse commented 10 months ago

What's your CLI version?

0.85.29

Description & steps to reproduce

I tried a kurtosis run today, and got this:

INFO[2023-11-19T13:17:16-03:00] No Kurtosis engine was found; attempting to start one...
INFO[2023-11-19T13:17:16-03:00] Starting the centralized logs components...
INFO[2023-11-19T13:17:16-03:00] Centralized logs components started.
INFO[2023-11-19T13:17:17-03:00] Successfully started Kurtosis engine
INFO[2023-11-19T13:17:17-03:00] Creating a new enclave for Starlark to run inside...
Error:  An error occurred running command 'run'
  Caused by: An error occurred calling the run function for command 'run'
  Caused by: An error occurred getting the enclave context for enclave ''
  Caused by: Unable to create new enclave with name ''
  Caused by: An error occurred creating an enclave with name ''
  Caused by: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: INTERNAL_ERROR

I dumped the engine logs, and found:

2023-11-19 13:17:17 WARN[2023-11-19T16:17:17Z][docker_kurtosis_backend.go:CreateLogsCollectorForEnclave] Logs aggregator exists but is not running. Instead container status is 'STOPPED'. This is unexpected as docker should have restarted the container automatically. 
2023-11-19 13:17:17 WARN[2023-11-19T16:17:17Z][docker_kurtosis_backend.go:CreateLogsCollectorForEnclave] This can be fixed by restarting the engine using `kurtosis engine restart` and attempting to create the enclave again. 
2023-11-19 13:17:17 2023/11/19 16:17:17 http2: panic serving 172.17.0.1:36748: Propagate must be provided with a cause
2023-11-19 13:17:17 goroutine 24 [running]:
2023-11-19 13:17:17 golang.org/x/net/http2.(*serverConn).runHandler.func1()
2023-11-19 13:17:17     /home/circleci/go/pkg/mod/golang.org/x/net@v0.17.0/http2/server.go:2361 +0x140
2023-11-19 13:17:17 panic({0x135bb60, 0x1a0e0c0})
2023-11-19 13:17:17     /usr/local/go/src/runtime/panic.go:884 +0x20c
2023-11-19 13:17:17 github.com/kurtosis-tech/stacktrace.Propagate(...)
2023-11-19 13:17:17     /home/circleci/go/pkg/mod/github.com/kurtosis-tech/stacktrace@v0.0.0-20211028211901-1c67a77b5409/stacktrace.go:85
2023-11-19 13:17:17 github.com/kurtosis-tech/kurtosis/container-engine-lib/lib/backend_impls/docker/docker_kurtosis_backend.(*DockerKurtosisBackend).CreateLogsCollectorForEnclave(0x4000098d80, {0x1a30d50, 0x4000556100}, {0x40004950e0, 0x20}, 0x2?, 0x0?)
2023-11-19 13:17:17     /home/circleci/project/container-engine-lib/lib/backend_impls/docker/docker_kurtosis_backend/docker_kurtosis_backend.go:424 +0x238
2023-11-19 13:17:17 github.com/kurtosis-tech/kurtosis/container-engine-lib/lib/backend_impls/docker/docker_kurtosis_backend.(*DockerKurtosisBackend).CreateEnclave(0x4000098d80, {0x1a30d50, 0x4000556100}, {0x40004950e0, 0x20}, {0x40001455b0, 0xe})
2023-11-19 13:17:17     /home/circleci/project/container-engine-lib/lib/backend_impls/docker/docker_kurtosis_backend/docker_kurtosis_backend_enclave_functions.go:152 +0xc4c
2023-11-19 13:17:17 github.com/kurtosis-tech/kurtosis/container-engine-lib/lib/backend_impls/metrics_reporting.(*MetricsReportingKurtosisBackend).CreateEnclave(0x0?, {0x1a30d50?, 0x4000556100?}, {0x40004950e0, 0x20}, {0x40001455b0?, 0x40004bac60?})
2023-11-19 13:17:17     /home/circleci/project/container-engine-lib/lib/backend_impls/metrics_reporting/metrics_reporting_kurtosis_backend.go:114 +0x3c
2023-11-19 13:17:17 github.com/kurtosis-tech/kurtosis/engine/server/engine/enclave_manager.(*EnclaveCreator).CreateEnclave(0x40001c2800, {0x1a30d50, 0x4000556100}, {0x0, 0x0}, 0x0?, {0x40001455b0, 0xe}, {0x4000331096, 0x2}, ...)
2023-11-19 13:17:17     /home/circleci/project/engine/server/engine/enclave_manager/enclave_creator.go:57 +0xc4
2023-11-19 13:17:17 github.com/kurtosis-tech/kurtosis/engine/server/engine/enclave_manager.(*EnclaveManager).CreateEnclave(0x4000149900, {0x1a30d50, 0x4000556100}, {0x4000331089, 0x7}, {0x0, 0x0}, 0x0?, {0x0, 0x0}, ...)
2023-11-19 13:17:17     /home/circleci/project/engine/server/engine/enclave_manager/enclave_manager.go:180 +0x380
2023-11-19 13:17:17 github.com/kurtosis-tech/kurtosis/engine/server/engine/server.(*EngineConnectServerService).CreateEnclave(0x4000132620, {0x1a30d50, 0x4000556100}, 0x22ef8?)
2023-11-19 13:17:17     /home/circleci/project/engine/server/engine/server/engine_connect_server_service.go:103 +0x1c0
2023-11-19 13:17:17 connectrpc.com/connect.NewUnaryHandler[...].func1({0x1a35830, 0x4000132e70})
2023-11-19 13:17:17     /home/circleci/go/pkg/mod/connectrpc.com/connect@v1.11.1/handler.go:52 +0x14c
2023-11-19 13:17:17 connectrpc.com/connect.NewUnaryHandler[...].func2({0xffff831585f8, 0x400054a040})
2023-11-19 13:17:17     /home/circleci/go/pkg/mod/connectrpc.com/connect@v1.11.1/handler.go:81 +0x294
2023-11-19 13:17:17 connectrpc.com/connect.(*Handler).ServeHTTP(0x40001c1080, {0x1a24040, 0x4000550018}, 0x400055a200)
2023-11-19 13:17:17     /home/circleci/go/pkg/mod/connectrpc.com/connect@v1.11.1/handler.go:239 +0x628
2023-11-19 13:17:17 github.com/kurtosis-tech/kurtosis/api/golang/engine/kurtosis_engine_rpc_api_bindings/kurtosis_engine_rpc_api_bindingsconnect.NewEngineServiceHandler.func1({0x1a24040, 0x4000550018}, 0x400055a200)
2023-11-19 13:17:17     /home/circleci/project/api/golang/engine/kurtosis_engine_rpc_api_bindings/kurtosis_engine_rpc_api_bindingsconnect/engine_service.connect.go:269 +0x220
2023-11-19 13:17:17 net/http.HandlerFunc.ServeHTTP(0x4000501860?, {0x1a24040?, 0x4000550018?}, 0x0?)
2023-11-19 13:17:17     /usr/local/go/src/net/http/server.go:2109 +0x38
2023-11-19 13:17:17 net/http.(*ServeMux).ServeHTTP(0x40005019a8?, {0x1a24040, 0x4000550018}, 0x400055a200)
2023-11-19 13:17:17     /usr/local/go/src/net/http/server.go:2487 +0x140
2023-11-19 13:17:17 golang.org/x/net/http2.(*serverConn).runHandler(0x40001f00a0?, 0x1a30df8?, 0x40004a4150?, 0x18694a0?)
2023-11-19 13:17:17     /home/circleci/go/pkg/mod/golang.org/x/net@v0.17.0/http2/server.go:2368 +0xc4
2023-11-19 13:17:17 created by golang.org/x/net/http2.(*serverConn).scheduleHandler
2023-11-19 13:17:17     /home/circleci/go/pkg/mod/golang.org/x/net@v0.17.0/http2/server.go:2303 +0x22c

I checked, and my logs aggregator was indeed stopped, though I'm not sure why:

Screen Shot 2023-11-19 at 13 19 54

I also have a logs-collector somehow hanging around, even though I don't have any enclaves running:

$ kurtosis enclave ls
UUID   Name   Status   Creation Time

This is the output of docker container inspect kurtosis-logs-aggregator:

[
    {
        "Id": "ebaa0ce2ae972c613cfe03403cae2040ef4f317968ca5b2faacd3c3ff23f961a",
        "Created": "2023-11-17T08:54:59.858197634Z",
        "Path": "/bin/sh",
        "Args": [
            "-c",
            "printf '\n[sources.\"fluent_bit\"]\ntype = \"fluent\"\naddress = \"0.0.0.0:9714\"\n\n[sinks.uuid_file]\ntype = \"file\"\ninputs = [\"fluent_bit\"]\npath = \"/var/log/kurtosis/%%Y/%%V/{{ enclave_uuid }}/{{ service_uuid }}.json\"\t\nencoding.codec = \"json\"\nbuffer.when_full = \"block\"\n' \u003e /etc/vector/vector.toml \u0026\u0026 /usr/bin/vector -c=/etc/vector/vector.toml"
        ],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 137,
            "Error": "",
            "StartedAt": "2023-11-17T08:55:00.116585384Z",
            "FinishedAt": "2023-11-18T15:36:40.796399588Z"
        },
        "Image": "sha256:ec83552fc89eacc0af3a4d1576523a74f8b31b5931aa560b90423dcd3c6e1355",
        "ResolvConfPath": "/var/lib/docker/containers/ebaa0ce2ae972c613cfe03403cae2040ef4f317968ca5b2faacd3c3ff23f961a/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/ebaa0ce2ae972c613cfe03403cae2040ef4f317968ca5b2faacd3c3ff23f961a/hostname",
        "HostsPath": "/var/lib/docker/containers/ebaa0ce2ae972c613cfe03403cae2040ef4f317968ca5b2faacd3c3ff23f961a/hosts",
        "LogPath": "/var/lib/docker/containers/ebaa0ce2ae972c613cfe03403cae2040ef4f317968ca5b2faacd3c3ff23f961a/ebaa0ce2ae972c613cfe03403cae2040ef4f317968ca5b2faacd3c3ff23f961a-json.log",
        "Name": "/kurtosis-logs-aggregator",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "kurtosis-logs-storage:/var/log/kurtosis/"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "default",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "on-failure",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "ConsoleSize": [
                0,
                0
            ],
            "CapAdd": [],
            "CapDrop": null,
            "CgroupnsMode": "private",
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": [],
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": null,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ],
            "Init": false
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/eeaa3853b171c6f86b7b1fa3c1893bb5d21e9ca5ee72fc9dd4d6bf7ab9a1f84a-init/diff:/var/lib/docker/overlay2/73a73c54f4733e3093b694a5477e7d1cc8ec08b38ec33ffc6cadaff24554b3f4/diff:/var/lib/docker/overlay2/0649dd28395f8fcd43911f7fe784e1424786f892c5d3ec21cd33d75b51b491e5/diff:/var/lib/docker/overlay2/c1f399053d8ca7a31e3b1156ab6b8115ded5d353a375c601927295de44fe64ae/diff:/var/lib/docker/overlay2/bcb7c67adf3f86995fddd2b45f36b34a2bc43595fe3bb7e48c34f96c4d4bd077/diff:/var/lib/docker/overlay2/fcd0f8d8397e8d2d3653911c289eda57bee2a2bd7222539e2da5a76a06a4a808/diff:/var/lib/docker/overlay2/9c880fe16116762d9d91885e8c6490bb21f350849b88079a22e2f587108762b5/diff:/var/lib/docker/overlay2/04b6c1d99cf7535684fc7333499d7adbc5e8d72401086835fc1c6715f0cc013d/diff",
                "MergedDir": "/var/lib/docker/overlay2/eeaa3853b171c6f86b7b1fa3c1893bb5d21e9ca5ee72fc9dd4d6bf7ab9a1f84a/merged",
                "UpperDir": "/var/lib/docker/overlay2/eeaa3853b171c6f86b7b1fa3c1893bb5d21e9ca5ee72fc9dd4d6bf7ab9a1f84a/diff",
                "WorkDir": "/var/lib/docker/overlay2/eeaa3853b171c6f86b7b1fa3c1893bb5d21e9ca5ee72fc9dd4d6bf7ab9a1f84a/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "volume",
                "Name": "kurtosis-logs-storage",
                "Source": "/var/lib/docker/volumes/kurtosis-logs-storage/_data",
                "Destination": "/var/log/kurtosis",
                "Driver": "local",
                "Mode": "z",
                "RW": true,
                "Propagation": ""
            }
        ],
        "Config": {
            "Hostname": "ebaa0ce2ae97",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": true,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "-c",
                "printf '\n[sources.\"fluent_bit\"]\ntype = \"fluent\"\naddress = \"0.0.0.0:9714\"\n\n[sinks.uuid_file]\ntype = \"file\"\ninputs = [\"fluent_bit\"]\npath = \"/var/log/kurtosis/%%Y/%%V/{{ enclave_uuid }}/{{ service_uuid }}.json\"\t\nencoding.codec = \"json\"\nbuffer.when_full = \"block\"\n' \u003e /etc/vector/vector.toml \u0026\u0026 /usr/bin/vector -c=/etc/vector/vector.toml"
            ],
            "Image": "timberio/vector:0.31.0-debian",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": [
                "/bin/sh"
            ],
            "OnBuild": null,
            "Labels": {
                "com.kurtosistech.app-id": "kurtosis",
                "com.kurtosistech.container-type": "kurtosis-logs-aggregator"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "90b24aa1e00d5a26d6cf689794d149f8732b2ae474a7382f496eb65a8f79283d",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/90b24aa1e00d",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "8662e507c238504d517d3e5496325a3bdaea893a404df2b8d29602fdf219eea8",
                    "EndpointID": "",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "",
                    "DriverOpts": null
                }
            }
        }
    }
]

And the container logs:

2023-11-17 05:55:00 2023-11-17T08:55:00.121833Z  INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=info,rdkafka=info,buffers=info,lapin=info,kube=info"
2023-11-17 05:55:00 2023-11-17T08:55:00.123136Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector.toml"]
2023-11-17 05:55:00 2023-11-17T08:55:00.125273Z  INFO vector::topology::running: Running healthchecks.
2023-11-17 05:55:00 2023-11-17T08:55:00.125364Z  INFO vector: Vector has started. debug="false" version="0.31.0" arch="aarch64" revision="0f13b22 2023-07-06 13:52:34.591204470"
2023-11-17 05:55:00 2023-11-17T08:55:00.125370Z  INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
2023-11-17 05:55:00 2023-11-17T08:55:00.125380Z  INFO vector::topology::builder: Healthcheck passed.
2023-11-17 05:55:00 2023-11-17T08:55:00.125612Z  INFO source{component_kind="source" component_id=fluent_bit component_type=fluent component_name=fluent_bit}: vector::sources::util::net::tcp: Listening. addr=0.0.0.0:9714

Desired behavior

I never get this error; Kurtosis correctly manages the logs-aggregator

What is the severity of this bug?

Critical; I am blocked and Kurtosis is unusable for me because of this bug.

What area of the product does this pertain to?

CLI: the Command Line Interface

mieubrisse commented 10 months ago

Confirmed that it's not a problem of switching contexts; I just switched to my cloud context and back again and the logs-aggregator was started along with the engine

Screen Shot 2023-11-19 at 13 25 18

(though these silly logs-collectors are still hanging around..)

tedim52 commented 10 months ago

status codes of the exited containers are 137. According to chat gpt: An exit status of 137 typically indicates that a process was terminated by a signal. In the context of Docker, specifically, this often happens when a container is killed due to exceeding its allocated memory limit.

I wonder if this is caused by the logs collector resource leak.

tedim52 commented 10 months ago

Update - this likely wasn't being caused by the log collector memory leak. Going to look deeper into memory limits on the logs aggregator.