kurtosis-tech / kurtosis

A platform for packaging and launching ephemeral backend stacks with a focus on approachability for the average developer.
https://docs.kurtosistech.com/
Apache License 2.0

no notification or warning when Docker engine is running low on disk space #1672

Open leeederek opened 11 months ago

leeederek commented 11 months ago

Background & motivation

When using Kurtosis locally, my systems tend to be quite large (due to the complexity of distributed systems), and when I unexpectedly hit resource limits, I get hard-to-debug errors like:

ERROR[10-27|23:31:21.451] Low disk space. Gracefully shutting down Geth to prevent database corruption. available=420.03MiB path=/execution-data/geth

and another example:

There was an error validating Starlark code
Error while validating instruction add_service(name="cl-04-lighthouse-geth", config=ServiceConfig(image="sigp/lighthouse:latest", ports={"http": PortSpec(number=4000, transport_protocol="TCP", application_protocol="http"), "metrics": PortSpec(number=5054, transport_protocol="TCP", application_protocol="http"), "tcp-discovery": PortSpec(number=9000, transport_protocol="TCP", application_protocol=""), "udp-discovery": PortSpec(number=9000, transport_protocol="UDP", application_protocol="")}, files={"/data": "babbling-sun"}, cmd=["lighthouse", "beacon_node", "--debug-level=info", "--datadir=/consensus-data", "--testnet-dir=/data/data/custom_config_data", "--disable-enr-auto-update", "--enr-address=KURTOSIS_IP_ADDR_PLACEHOLDER", "--enr-udp-port=9000", "--enr-tcp-port=9000", "--listen-address=0.0.0.0", "--port=9000", "--http", "--http-address=0.0.0.0", "--http-port=4000", "--http-allow-sync-stalled", "--slots-per-restore-point=32", "--disable-packet-filter", "--execution-endpoints=http://{{kurtosis:4c53bff1f9bf497297064e619364a8a4:ip_address.runtime_value}}:8551", "--jwt-secrets=/data/data/jwt/jwtsecret", "--suggested-fee-recipient=0x878705ba3f8Bc32FCf7F4CAa1A35E72AF65CF766", "--subscribe-all-subnets", "--metrics", "--metrics-address=0.0.0.0", "--metrics-allow-origin=*", "--metrics-port=5054", "--boot-nodes={{kurtosis:436de81082be48ef82fd313a31ee8b09:extract.enr.runtime_value}},{{kurtosis:80e2828c03dc4c3db8fe2508b685bda0:extract.enr.runtime_value}},{{kurtosis:e9526959d5a2452d80bf2c91e652c2e8:extract.enr.runtime_value}}", "--trusted-peers={{kurtosis:436de81082be48ef82fd313a31ee8b09:extract.peer_id.runtime_value}},{{kurtosis:80e2828c03dc4c3db8fe2508b685bda0:extract.peer_id.runtime_value}},{{kurtosis:e9526959d5a2452d80bf2c91e652c2e8:extract.peer_id.runtime_value}}"], env_vars={"RUST_BACKTRACE": "full"}, private_ip_address_placeholder="KURTOSIS_IP_ADDR_PLACEHOLDER", max_cpu=1000, min_cpu=50, max_memory=1024, min_memory=256, 
ready_conditions=ReadyCondition(recipe=GetHttpRequestRecipe(port_id="http", endpoint="/eth/v1/node/health"), field="code", assertion="IN", target_value=[200, 206], timeout="15m"))). The instruction can be found at github.com/kurtosis-tech/ethereum-package/src/cl/lighthouse/lighthouse_launcher.star[149:38]
        Caused by: service 'cl-04-lighthouse-geth' requires '256' megabytes of memory but based on our calculation we will only have '174' megabytes available at the time we start the service

This has happened a few times to me and to other users!


Desired behaviour

Kurtosis warns the user in the terminal (and local EM UI) when the Docker engine is approaching the limit of its available disk space.

Something like:

Docker is nearly out of disk space, which may cause deployments to fail! (95% of capacity)

Implementation-wise, I would love for this to be a one-off check that runs each time I run a package, at validation/interpretation time (before execution).
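To make the request concrete, here is a minimal sketch of such a one-off check in Python. It is not how Kurtosis does or would implement this. The function names, the 90% threshold, and the `/var/lib/docker` default path are all assumptions for illustration. That path is Docker's default data root on Linux; with Docker Desktop the data lives inside a VM, so a host-side check like this would not see the real limit, and a real implementation would probably need to ask the Docker engine itself.

```python
import shutil
from typing import Optional

# Hypothetical threshold; the issue does not specify one.
WARN_THRESHOLD_PERCENT = 90


def percent_used(total_bytes: int, free_bytes: int) -> int:
    """Return the share of capacity in use, rounded down to a whole percent."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (total_bytes - free_bytes) * 100 // total_bytes


def disk_space_warning(
    total_bytes: int,
    free_bytes: int,
    threshold_percent: int = WARN_THRESHOLD_PERCENT,
) -> Optional[str]:
    """Return a warning when usage is at or above the threshold, else None."""
    used = percent_used(total_bytes, free_bytes)
    if used < threshold_percent:
        return None
    return (
        "Docker is nearly out of disk space, which may cause "
        f"deployments to fail! ({used}% of capacity)"
    )


def check_before_run(data_root: str = "/var/lib/docker") -> Optional[str]:
    """One-off check meant to run once, before execution starts.

    Assumption: data_root sits on the filesystem Docker writes to.
    Space not available to the current user (e.g. reserved blocks)
    is counted as used, which errs on the side of warning early.
    """
    usage = shutil.disk_usage(data_root)
    return disk_space_warning(usage.total, usage.free)
```

A caller would run `check_before_run()` once before starting a package and print the returned string if it is not `None`. Keeping the threshold logic in `disk_space_warning` separate from the filesystem call makes it easy to swap in numbers reported by the Docker engine instead.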

How important is this to you?

Nice to have; this feature would make using Kurtosis more enjoyable.

What area of the product does this pertain to?

CLI: the Command Line Interface

leeederek commented 11 months ago

On Nov 2, we talked to the Nethermind team, and they told us that they have experienced this issue multiple times and have built a guard rail around it in their wrapper. It sounds like people are silently hit with disk usage issues and attribute them to Kurtosis instability.