Closed russell-liu closed 12 months ago
I'd wager that we create too many intermediate objects (along with arrow) that causes us to use too much space. Why it would be inconsistent is a mystery to me.
I did some test locally and I think rust debug build is taking the most space and it seems to be expected (https://users.rust-lang.org/t/why-are-debug-symbols-so-huge/81117, https://users.rust-lang.org/t/running-out-of-disk-space/10641/3). I think if we run the rust build in release mode, we will be able to fix this.
Yeah, rust uses a lot of space. Why were we ever building in debug mode?
I have no idea. I think @benjaminwinger set this up on self-hosted CI and @russell-liu simply copied the scripts to GitHub-hosted CI.
The inconsistency seems be due to that GitHub-hosted runners may be dispatched to different types of Azure VMs: https://github.com/actions/runner-images/issues/2840#issuecomment-791177163
Sometimes, image can have ~15-20 GB of free space and this volume can be changed without notice. We only guarantee that images contain at least 14 GB of free space.
Yeah, rust uses a lot of space. Why were we ever building in debug mode?
Debug mode is often faster for rust. It also has sometimes ended up catching some debug-only issues like assertions which otherwise get ignored, though I think what we need to do there is rewrite the ASAN job to fail on any test failure, rather than just when there is output in the asan log.
Does ASAN run in debug mode? IIRC, no
Why is debug mode often faster? Faster to build or run?
Faster to build. Release builds are usually much slower by comparison for rust than they are for C++. Though I'm not sure what the difference would be like for kuzu off-hand.
Evidently you changed it recently, but prior to 9ebc7f1cf6624f288a the asan job was in debug mode. We need something in CI that's testing in debug mode; otherwise we should remove the asserts or turn them into runtime exceptions which occur in release mode too.
Ah. I didn't realize that my changes changed the behaviour of asan. This explains a lot.
I don't really think assertions impose much of a performance penalty, and would welcome a change to make them always on. We already have KU_ASSERT
.
After a bunch more discussion, my plan is to make rust build in release mode and enable asserts in CI by using a FORCE_ASSERT
environment variable. To do this, every assert will be replaced by KU_ASSERT, which will be enabled if !defined(NDEBUG) || defined(FORCE_ASSERT)
, so, it will be enabled if we force it on, or we are in debug mode.
Examples where error is shown in log: https://github.com/kuzudb/kuzu/actions/runs/6489330683/job/17627731842#step:17:566 https://github.com/kuzudb/kuzu/actions/runs/6489330683/job/17627731490#step:17:352 Sometimes the error is only shown in summary: https://github.com/kuzudb/kuzu/actions/runs/6489330683