std::system_error (error system:24, could not create inotify instance: Too many open files)
It turns out that Seastar's TLS implementation internally uses inotify to automatically recognize when the certificate files have changed. The error message is misleading - for inotify_init() an EMFILE errno (which gets printed as "Too many open files") does not refer the limit on number of open files, but to the separate per-user limit on the number of inotify instances. This number is configured in /proc/sys/fs/inotify/max_user_instances and is often fairly low - e.g., on my Fedora 40 it defaults to just 128. It appears that Seastar creates an inotify instance for each shard, and the ScyllaDB test framework ran many tests in parallel, and the result was running out of inotify instances.
The ScyllaDB testers easily solved this problem by increasing the /proc/sys/fs/inotify/max_user_instances, but I'm worried that this problem can hit other Seastar users as well, who won't be aware that Seastar TLS even uses inotify, and that /proc/sys/fs/inotify/max_user_instances is so low. I want to propose that we consider two options, perhaps even both:
Use fewer inotify instances - perhaps even just once per per process instead of one per shard (e.g., have just shard 0 be responsible for the very rare inotify work). On modern many-shard machines, this will make a big difference.
Be forgiving towards failing inotify_init() in TLS: Make it a logged warning (don't use the phrase "Too many open files" in the logged message) - not an exception. When inotify_init() failed, either the certificate reloading would not work at all - or if we consider it an important feature, we can implement something similar without inotify (e.g., once a minute check if the file changed).
Recently some ScyllaDB test runs failed (https://github.com/scylladb/scylladb/issues/21199) with the error:
It turns out that Seastar's TLS implementation internally uses inotify to automatically recognize when the certificate files have changed. The error message is misleading - for inotify_init() an EMFILE errno (which gets printed as "Too many open files") does not refer the limit on number of open files, but to the separate per-user limit on the number of inotify instances. This number is configured in
/proc/sys/fs/inotify/max_user_instances
and is often fairly low - e.g., on my Fedora 40 it defaults to just 128. It appears that Seastar creates an inotify instance for each shard, and the ScyllaDB test framework ran many tests in parallel, and the result was running out of inotify instances.The ScyllaDB testers easily solved this problem by increasing the
/proc/sys/fs/inotify/max_user_instances
, but I'm worried that this problem can hit other Seastar users as well, who won't be aware that Seastar TLS even uses inotify, and that/proc/sys/fs/inotify/max_user_instances
is so low. I want to propose that we consider two options, perhaps even both:CC @elcallio