redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

rpk: aio tuner should take into account /proc/sys/fs/aio-nr when setting /proc/sys/fs/aio-max-nr #4004

Open esteban opened 2 years ago

esteban commented 2 years ago

Version & Environment

Redpanda version: v21.11.9

What went wrong?

On hosts where /proc/sys/fs/aio-nr has been previously tuned via sysctl, running rpk redpanda tune sets /proc/sys/fs/aio-max-nr to 1048576. In most cases this is not a problem, but on hosts where /proc/sys/fs/aio-nr is already at that value, or very close to it, Redpanda aborts with the following message:

rpk[19642]: ERROR 2022-03-14 02:18:12,717 [shard 47] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application

What should have happened instead?

rpk redpanda tune should take into account the value of /proc/sys/fs/aio-nr and offset /proc/sys/fs/aio-max-nr by the same amount, so that the right number of AIO slots can still be allocated.
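A minimal sketch of the offset proposed above, assuming the tuner keeps 1048576 as the number of free slots it wants to guarantee. The function names and the `wanted_free` parameter are hypothetical and do not come from rpk's source; this only illustrates the arithmetic, not how rpk implements its tuner.

```python
from pathlib import Path

# Value the issue reports rpk writing to aio-max-nr today.
DEFAULT_WANTED_FREE = 1048576


def target_aio_max_nr(aio_nr: int, aio_max_nr: int,
                      wanted_free: int = DEFAULT_WANTED_FREE) -> int:
    """Return a limit that leaves `wanted_free` slots on top of those in use.

    An existing limit that is already high enough is never lowered.
    """
    return max(aio_max_nr, aio_nr + wanted_free)


def read_counter(name: str, root: Path = Path("/proc/sys/fs")) -> int:
    """Read an integer counter such as aio-nr or aio-max-nr (Linux only)."""
    return int((root / name).read_text().strip())
```

On a Linux host, `target_aio_max_nr(read_counter("aio-nr"), read_counter("aio-max-nr"))` would give the value to write, instead of a fixed 1048576.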

JIRA Link: CORE-859

JapuDCret commented 1 year ago

I have a similar issue with the Redpanda container (observed in redpanda:v22.3.10 and redpanda:v22.3.11) in my Testcontainers setup.

When resources are a little scarce, I get

libc++abi: terminating with uncaught exception of type std::runtime_error:
    Could not setup Async I/O: Resource temporarily unavailable.
    The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr.
    Try increasing that number or reducing the amount of logical CPUs available for your application

and

WARN  2023-01-20 13:01:00,316 seastar - Requested AIO slots too large,
please increase request capacity in /proc/sys/fs/aio-max-nr. available:54510 requested:88208

Unfortunately, one cannot guarantee that this many AIO slots are available in every Testcontainers execution.

The worst thing about this is that the container does not report itself as unhealthy.
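Since the container does not flag the failure itself, one option is to scan its log output for the Seastar warning quoted above and compute the shortfall. This is a rough sketch: the field names are taken from the log lines in this thread, while `parse_aio_warning` and `shortfall` are hypothetical helpers, not part of Redpanda or Testcontainers.

```python
import re

# Matches the "name:number" pairs that appear at the end of the warning.
_FIELD = re.compile(r"\b(configured|available|requested):(\d+)")


def parse_aio_warning(message: str) -> dict[str, int]:
    """Extract the numeric fields from a 'Requested AIO slots too large' line."""
    return {name: int(value) for name, value in _FIELD.findall(message)}


def shortfall(fields: dict[str, int]) -> int:
    """Return how many more free slots are needed (0 if there are enough)."""
    return max(0, fields["requested"] - fields["available"])


warning = (
    "Requested AIO slots too large, please increase request capacity in "
    "/proc/sys/fs/aio-max-nr. available:54510 requested:88208"
)
fields = parse_aio_warning(warning)
print(fields, "missing:", shortfall(fields))
```

For the warning above, the request exceeds the available slots by 33698.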

fracasula commented 5 months ago

Same here. I'm using the attached docker-compose.yaml file, which I got from the Redpanda quickstart page. I'm running it on my laptop, which isn't doing anything else. It has a 12th-gen i9-12900HK (14 cores) and 64 GB of DDR5 memory.

DEBUG 2024-01-26 16:43:53,804 seastar - smp::count: 20
DEBUG 2024-01-26 16:43:53,804 seastar - latency_goal: 0.00075
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU0 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU1 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU2 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU3 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU4 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU5 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU6 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU7 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU8 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU9 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU10 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU11 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU12 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU13 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU14 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU15 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU16 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU17 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU18 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU19 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Auto-configure 1 IO groups
WARN  2024-01-26 16:43:53,826 seastar - Requested AIO slots too large, please increase request capacity in /proc/sys/fs/aio-max-nr. configured:65536 available:16 requested:220520
Could not initialize seastar: std::runtime_error (Could not setup Async I/O: Not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application)
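The numbers in this log can be broken down per shard. With smp::count at 20 and 220520 slots requested, each shard accounts for 11026 slots. The sketch below assumes the request scales linearly with the shard count, which is inferred from the logs in this thread rather than taken from Seastar documentation. The helper names are hypothetical.

```python
def slots_per_shard(requested: int, shards: int) -> int:
    """Estimate the per-shard request, rounding up to stay conservative."""
    return -(-requested // shards)  # ceiling division


def max_shards(available: int, per_shard: int) -> int:
    """Return how many shards could start with `available` free AIO slots."""
    return available // per_shard


per_shard = slots_per_shard(220520, 20)
print("slots per shard:", per_shard)
print("shards that fit in 16 free slots:", max_shards(16, per_shard))
print("shards that fit in 65536 free slots:", max_shards(65536, per_shard))
```

Under that assumption, not even one shard fits in the 16 free slots reported here, so the remaining options are the ones the error message names: raise /proc/sys/fs/aio-max-nr or run with fewer logical CPUs (for example, through Seastar's --smp option).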

docker-compose.zip

fracasula commented 5 months ago

In my case I had to increase the threshold by doing:

echo 1048576 > /proc/sys/fs/aio-max-nr

Now all 3 brokers are able to run:

sudo sysctl -a | grep -i aio
fs.aio-max-nr = 1048576
fs.aio-nr = 65520
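To check how much headroom is left after a change like this, the two values can be subtracted. A small sketch that parses the sysctl output shown above; `parse_sysctl` and `aio_headroom` are hypothetical helper names.

```python
def parse_sysctl(text: str) -> dict[str, int]:
    """Parse 'key = value' lines from sysctl output, keeping integer values."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            values[key.strip()] = int(value)
    return values


def aio_headroom(values: dict[str, int]) -> int:
    """Return the number of AIO slots still free system-wide."""
    return values["fs.aio-max-nr"] - values["fs.aio-nr"]


output = """\
fs.aio-max-nr = 1048576
fs.aio-nr = 65520
"""
print("free AIO slots:", aio_headroom(parse_sysctl(output)))
```

With the values above, 983056 slots remain free. The new limit lasts only until the next reboot unless it is also persisted, for example in a file under /etc/sysctl.d.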