Open jcsp opened 3 years ago
Thoughts on mitigation:
Noah:
I agree we should behave better. We can also be significantly better OS citizens. We keep an fd open for every segment, and on restarts everything gets rolled, so it can really add up quickly in different situations. I’d really like to support lazy open for older segments. Even for a scan-the-world we’d only need one or two fds per client.
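A minimal sketch of what lazy open could look like on top of Seastar, using a hypothetical `lazy_segment_handle` type (this is not Redpanda's actual segment code): the path stays resident, but the fd is only acquired on first use and can be released again when the segment goes cold.

```cpp
// Hypothetical sketch only -- not Redpanda's segment implementation.
#include <seastar/core/file.hh>
#include <seastar/core/future.hh>
#include <seastar/core/seastar.hh>
#include <seastar/core/sstring.hh>
#include <optional>

class lazy_segment_handle {
public:
    explicit lazy_segment_handle(seastar::sstring path)
      : _path(std::move(path)) {}

    // Open on demand. A real implementation would also guard against
    // concurrent opens and register with an fd-reclaim policy.
    seastar::future<seastar::file> get() {
        if (_file) {
            return seastar::make_ready_future<seastar::file>(*_file);
        }
        return seastar::open_file_dma(_path, seastar::open_flags::ro)
          .then([this](seastar::file f) {
              _file = f;
              return f;
          });
    }

    // Drop the fd when the segment has gone cold; metadata stays resident.
    seastar::future<> release() {
        if (!_file) {
            return seastar::make_ready_future<>();
        }
        auto f = *_file;
        _file.reset();
        return f.close();
    }

private:
    seastar::sstring _path;
    std::optional<seastar::file> _file;
};
```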
Pasting more stack for search-ability:
ERROR 2021-10-17 08:02:25,033 [shard 1] assert - Assert failure: (../../../src/v/storage/log_reader.h:141) '!_iterator.reader' log reader destroyed with live reader
ERROR 2021-10-17 08:02:25,033 [shard 7] storage - Error writing record batch std::__1::__fs::filesystem::filesystem_error (error system:24, filesystem error: open failed: Too many open files [/var/lib/redpanda/data/kafka/first-topic/13_11/0-1-v1.log])
ERROR 2021-10-17 08:02:25,033 [shard 5] storage - Error writing record batch std::__1::__fs::filesystem::filesystem_error (error system:24, filesystem error: open failed: Too many open files [/var/lib/redpanda/data/kafka/first-topic/123_11/0-1-v1.log])
INFO 2021-10-17 08:02:25,033 [shard 2] storage - disk_log_appender.cc:102 - Could not append batch: std::__1::__fs::filesystem::filesystem_error (error system:24, filesystem error: open failed: Too many open files [/var/lib/redpanda/data/kafka/first-topic/56_11/0-1-v1.base_index]) - {offset_idx:{0}, active_segment:no, _bytes_left_in_segment:0, _base_offset:{0}, _last_offset:{-9223372036854775808}, _last_term:{1}, _byte_size:0}
WARN 2021-10-17 08:02:25,033 [shard 7] raft - [group_id:14, {kafka/first-topic/13}] consensus.cc:1591 - Error occurred while appending log entries - std::__1::__fs::filesystem::filesystem_error (error system:24, filesystem error: open failed: Too many open files [/var/lib/redpanda/data/kafka/first-topic/13_11/0-1-v1.log])
INFO 2021-10-17 08:02:25,033 [shard 0] storage - disk_log_appender.cc:102 - Could not append batch: std::__1::__fs::filesystem::filesystem_error (error system:24, filesystem error: open failed: Too many open files [/var/lib/redpanda/data/kafka/first-topic/102_11/0-1-v1.base_index]) - {offset_idx:{0}, active_segment:no, _bytes_left_in_segment:0, _base_offset:{0}, _last_offset:{-9223372036854775808}, _last_term:{1}, _byte_size:0}
ERROR 2021-10-17 08:02:25,033 [shard 1] assert - Backtrace below:
0x3231464 0x29ecf51 0x2a08aa8 0x2b6eedd 0x2fab1b4 0x2fae677 0x2ffe4bd 0x2f5c19f /opt/redpanda/lib/libpthread.so.0+0x9298 /opt/redpanda/lib/libc.so.6+0x1006a2
--------
seastar::continuation<seastar::internal::promise_base_with_type<void>, storage::internal::rebuild_compaction_index(model::record_batch_reader, std::__1::__fs::filesystem::path, storage::compaction_config)::$_22, seastar::future<void> seastar::future<storage::compacted_index_writer>::then_impl_nrvo<storage::internal::rebuild_compaction_index(model::record_batch_reader, std::__1::__fs::filesystem::path, storage::compaction_config)::$_22, seastar::future<void> >(storage::internal::rebuild_compaction_index(model::record_batch_reader, std::__1::__fs::filesystem::path, storage::compaction_config)::$_22&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, storage::internal::rebuild_compaction_index(model::record_batch_reader, std::__1::__fs::filesystem::path, storage::compaction_config)::$_22&, seastar::future_state<storage::compacted_index_writer>&&), storage::compacted_index_writer>
INFO 2021-10-17 08:02:25,034 [shard 6] storage - segment.cc:628 - Creating new segment /var/lib/redpanda/data/kafka/first-topic/28_11/0-1-v1.log
I'll grab this for now since I'm looking at #2166
Segment index reader file descriptors for all non-head-of-log segments could probably be reclaimed in LRU order based on some limit. For example, we could probably query the nofile limit on bootup to seed this value.
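A rough sketch of that idea, with a hypothetical `index_fd_cache` class (not actual Redpanda code): seed a cap from `RLIMIT_NOFILE` at startup and evict the least recently used index fd once the cap is exceeded.

```cpp
// Illustrative only. The 1/4 split of the nofile limit is an arbitrary
// placeholder; a real policy would leave headroom for sockets, active
// log segments, compaction, etc.
#include <sys/resource.h>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class index_fd_cache {
public:
    index_fd_cache() {
        rlimit rl{};
        // Query the process nofile limit on startup to seed the cap.
        if (::getrlimit(RLIMIT_NOFILE, &rl) == 0) {
            _max_open = rl.rlim_cur / 4;
        }
    }

    // Record a use of this index fd; evict the coldest entry if over the cap.
    void touch(const std::string& path, int fd) {
        if (auto it = _pos.find(path); it != _pos.end()) {
            _lru.erase(it->second);
        }
        _lru.push_front({path, fd});
        _pos[path] = _lru.begin();
        while (_lru.size() > _max_open) {
            auto& victim = _lru.back();
            // A real implementation would close victim.second here,
            // coordinating with any reader still holding the fd.
            _pos.erase(victim.first);
            _lru.pop_back();
        }
    }

private:
    size_t _max_open = 1024; // fallback if getrlimit fails
    std::list<std::pair<std::string, int>> _lru; // most recent at front
    std::unordered_map<std::string,
        std::list<std::pair<std::string, int>>::iterator> _pos;
};
```

The hard part, which the comment above only gestures at, is coordinating eviction with readers that still hold the descriptor; the cache itself is the easy half.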
@VadimPlh ran into this doing some testing with 100 topics of 3000 partitions each. First started seeing RPC timeouts:
[shard 3] rpc - transport.cc:236 - Request timeout, correlation id: <id>
and
[group_id:32604, {kafka/random_test_3/520}] vote_stm.cc:262 - unable to replicate configuration as a leader, stepping down
then an assert failure, as expected:
2021-12-17T16:07:21.502228977Z stderr F ERROR 2021-12-17 16:07:21,501 [shard 7] assert - Assert failure: (../../../src/v/cluster/persisted_stm.cc:272) 'false' Can't load snapshot from '"/var/lib/redpanda/data/kafka/random_test_85/617_537/tx.snapshot"'. Got error: std::__1::__fs::filesystem::filesystem_error (error system:24, filesystem error: open failed: Too many open files [/var/lib/redpanda/data/kafka/random_test_85/617_537/tx.snapshot])
was nofile limit set high enough?
@dotnwat I should have mentioned that; the likely cause was nofile limit being too low. Just adding log messages for documentation.
The main issue (in my case) is that I hit it in my BYOC cluster in our cloud, so I am not sure I can change the limit. BTW, we have a thread for that in the cloud channel.
Internal ticket for the cloud config is https://app.shortcut.com/vectorized/story/6531/byoc-nodes-have-insufficient-file-handle-limit-for-redpanda-s-max-partition-count
Some of the work to remove the 3x open files per segment is being done in #4022, and that specific issue is the subject of #4057.
Like all programs, Redpanda is subject to system limits on how many file handles we can have.
Currently we handle this as an IO error and log something like:
ERROR 2021-09-15 14:30:25,567 [shard 0] storage - Error writing record batch std::__1::__fs::filesystem::filesystem_error (error system:24, filesystem error: open failed: Too many open files [/var/lib/redpanda/data/kafka/topic-etnm/12_18/1-16-v1.log])
What the user sees is undefined -- they're likely to see that certain topics become unresponsive as their log segments roll over and they can't open new ones. It's certainly not going to be externally obvious what is going on until the user searches for log errors.
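One way the failure could be made more obvious, sketched with a hypothetical `report_write_error` helper rather than the real error path: EMFILE (error 24) surfaces as a `std::filesystem::filesystem_error`, so it can be matched on its error code and logged with a targeted hint instead of a generic IO failure.

```cpp
// Illustrative sketch only, not the actual Redpanda error handling.
#include <filesystem>
#include <iostream>
#include <system_error>

void report_write_error(const std::filesystem::filesystem_error& e) {
    if (e.code() == std::errc::too_many_files_open) {
        // errno 24 / EMFILE: the per-process fd limit, not a bad disk.
        std::cerr << "hit the open-file limit while writing "
                  << e.path1()
                  << "; raise the nofile ulimit or reduce partition count\n";
        return;
    }
    std::cerr << "error writing record batch: " << e.what() << "\n";
}
```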
JIRA Link: CORE-734