redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com
9.69k stars 591 forks source link

Avoid some oversized allocs at high partition density #24193

Closed StephanDollberg closed 2 days ago

StephanDollberg commented 4 days ago

Avoid some oversized allocs in both the cluster and storage subsystem at high partition density.

Backports Required

Release Notes

dotnwat commented 3 days ago

It looks like there is a bisectability issue. Reproduce with:

git fetch upstream
gh pr checkout -f 24193
git rebase 496639122f98049c5437fc796125a454743b7b8b

git checkout HEAD~1
bazel build //...

The build error should look something like


INFO: Invocation ID: f12c04a1-b6ab-40fd-bdca-b16a670032f9
INFO: Analyzed 1469 targets (34 packages loaded, 1941 targets configured).
ERROR: /home/nwatkins/src/redpanda-bisect-check/redpanda/src/v/storage/BUILD:271:20: Compiling src/v/storage/log_manager.cc failed: (Exit 1): cc_wrapper.sh failed: error executing CppCompile command (from target //src/v/storage:storage) external/toolchains_llvm~~llvm~llvm_18_toolchain/bin/cc_wrapper.sh -U_FORTIFY_SOURCE '--target=x86_64-unknown-linux-gnu' -U_FORTIFY_SOURCE -fstack-protector -fno-omit-frame-pointer -fcolor-diagnostics ... (remaining 834 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
src/v/storage/log_manager.cc:197:14: error: no matching function for call to 'async_clear'
  197 |     co_await ssx::async_clear(_logs);
      |              ^~~~~~~~~~~~~~~~
bazel-out/k8-fastbuild/bin/src/v/ssx/_virtual_includes/async_clear/ssx/async-clear.h:36:1: note: candidate template ignored: could not match 'table' against 'flat_hash_map'
   36 | async_clear(chunked_hash_map<Key, Value, Hash, EqualTo>& container) {
      | ^
1 error generated.
StephanDollberg commented 3 days ago

It looks like there is a bisectability issue. Reproduce with:

Yes there is a circular dependency between the first commit and the two following which can't be avoided unless:

Now I do think bisectability is massively overrated:

Human-bisect (aka just looking at the commit logs) is so much faster as often it's fairly obvious in which subsystem something went wrong. Because of that I have never actually git-bisected anything and will always trade piecemeal commits that are easier to review at PR time and in the git history over fully automated git-bisectability.

There is a related issue where bisectability causes pain. Assume you find a bug/something broken in the code or in production. You'll first want to write a test to confirm that you can reproduce the issue and see that it fails. Then you go about fixing it. Git-bisectability forces you commit the test last as otherwise previous commits don't bisect cleanly. This results in some git rebase reordering dance whenever you want to work on the test in its failing form.

But I know people have different opinions on this so happy to merge all commits into one if that is preferred.

dotnwat commented 3 days ago

Yeh I don't think we should twist ourselves into a knot to ensure bisectability, but IME it's usually a simple fix. It may not be simple in this PR, I was just letting you know.

You'd rather want to bisect on merge commits which more properly represent logical pieces of work

Yeh, and I think that Git does try to do this before descending into the merge.

piyushredpanda commented 2 days ago

/backport v24.3.x