Open travisdowns opened 1 year ago
Changed to sev/low since it only happens when debug logging is on.
Version & Environment
Redpanda version: 23.1.1
What went wrong?
When an out-of-memory situation occurs, we try to log memory diagnostics (see, for example, https://github.com/redpanda-data/core-internal/issues/99), but this can itself fail if the logging path allocates memory and that allocation also fails.
As a result, the diagnostics are never output, and the backtrace is harder to read because it contains many extra frames from the attempt to print the diagnostics.
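The failure mode can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions: every name in it (sim_allocate, sim_do_dump_diagnostics, sim_maybe_dump_diagnostics, report_survives_oom) is hypothetical and merely mirrors the shape of an outer and an inner dump routine; none of it is Seastar or Redpanda code.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// All names below are hypothetical stand-ins, not Seastar or Redpanda APIs.
static bool g_allocations_fail = false;  // simulates an exhausted allocator
static bool g_debug_logging = false;     // simulates the logger's verbosity
static bool g_report_emitted = false;    // did the diagnostics get out?

// Simulated allocation: throws as operator new would under real OOM.
void sim_allocate(std::size_t bytes) {
    assert(bytes > 0);
    if (g_allocations_fail) {
        throw std::bad_alloc();
    }
}

// Inner routine: emits the report without allocating.
void sim_do_dump_diagnostics() {
    g_report_emitted = true;
}

// Outer routine: at debug level it first logs a backtrace, and formatting
// that message is assumed to allocate.
void sim_maybe_dump_diagnostics() {
    if (g_debug_logging) {
        sim_allocate(128);
    }
    sim_do_dump_diagnostics();
}

// Runs the outer routine while allocations fail and reports whether the
// diagnostics were emitted for the given logging configuration.
bool report_survives_oom(bool debug_logging) {
    g_allocations_fail = true;
    g_debug_logging = debug_logging;
    g_report_emitted = false;
    try {
        sim_maybe_dump_diagnostics();
    } catch (const std::bad_alloc&) {
        // Nested allocation failure: the report is lost.
    }
    g_allocations_fail = false;
    return g_report_emitted;
}
```

In this model, calling report_survives_oom(false) emits the report, while report_survives_oom(true) loses it. A test that calls only the inner routine would never reach the allocating step.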
What should have happened instead?
The diagnostics should be printed if at all possible.
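One possible direction, sketched here under assumptions, is to keep the reporting path free of heap allocation by formatting into storage reserved ahead of time and guarding against re-entry. The names format_oom_report, emit_oom_report, g_report_buffer and report_capacity are hypothetical and do not describe how Seastar implements its diagnostics. std::snprintf is typically allocation-free for integer conversions, but the C++ standard does not guarantee that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical sketch, not Seastar's implementation.
constexpr std::size_t report_capacity = 512;
static char g_report_buffer[report_capacity];  // reserved before any OOM
static bool g_dump_in_progress = false;        // re-entrancy guard

// Formats a report into the preallocated buffer and returns the number of
// bytes stored, or 0 if a dump is already running or formatting failed.
std::size_t format_oom_report(std::size_t requested_bytes,
                              std::size_t free_bytes) {
    if (g_dump_in_progress) {
        return 0;  // a failure while dumping must not start another dump
    }
    g_dump_in_progress = true;
    int n = std::snprintf(g_report_buffer, report_capacity,
                          "OOM: failed to allocate %zu bytes, %zu bytes free\n",
                          requested_bytes, free_bytes);
    g_dump_in_progress = false;
    if (n < 0) {
        return 0;
    }
    // snprintf returns the untruncated length; clamp to what was stored.
    std::size_t len = static_cast<std::size_t>(n);
    return len < report_capacity ? len : report_capacity - 1;
}

// Writes the buffered report to stderr.
void emit_oom_report(std::size_t len) {
    assert(len < report_capacity);
    std::fwrite(g_report_buffer, 1, len, stderr);
}
```

A caller would run format_oom_report(requested, available) from the allocation-failure path and pass the returned length to emit_oom_report. The static buffer trades a small fixed memory cost for a report that does not depend on the allocator that has just failed.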
See this series which has a fix and test for a similar issue:
https://github.com/scylladb/seastar/commit/0e4139ed6e905c1f347e3f4caf90e227d2795b06
However, the test misses this case because it exercises not the outer maybe_dump_memory_diagnostics function but the lower-level do_dump_memory_diagnostics, which does not include the backtrace logging that performs the allocation here. Furthermore, the issue only occurs if debug logging is on, since at less verbose logging levels the message in question is not printed.
Additional information
Please attach any relevant logs, backtraces, or metric charts.
Example of a backtrace showing recursive OOM:
JIRA Link: CORE-1212