CI on GitHub-hosted dockerized platforms sometimes fail due to "No space left on device" error

kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.

https://kuzudb.com/

MIT License

1.38k stars 97 forks source link

CI on GitHub-hosted dockerized platforms sometimes fail due to "No space left on device" error #2196

Closed russell-liu closed 12 months ago

russell-liu commented 1 year ago

Examples where error is shown in log: https://github.com/kuzudb/kuzu/actions/runs/6489330683/job/17627731842#step:17:566 https://github.com/kuzudb/kuzu/actions/runs/6489330683/job/17627731490#step:17:352 Sometimes the error is only shown in summary: https://github.com/kuzudb/kuzu/actions/runs/6489330683

Riolku commented 1 year ago

I'd wager that we create too many intermediate objects (along with arrow) that causes us to use too much space. Why it would be inconsistent is a mystery to me.

mewim commented 1 year ago

I did some test locally and I think rust debug build is taking the most space and it seems to be expected (https://users.rust-lang.org/t/why-are-debug-symbols-so-huge/81117, https://users.rust-lang.org/t/running-out-of-disk-space/10641/3). I think if we run the rust build in release mode, we will be able to fix this.

Riolku commented 1 year ago

Yeah, rust uses a lot of space. Why were we ever building in debug mode?

mewim commented 1 year ago

I have no idea. I think @benjaminwinger set this up on self-hosted CI and @russell-liu simply copied the scripts to GitHub-hosted CI.

mewim commented 1 year ago

The inconsistency seems be due to that GitHub-hosted runners may be dispatched to different types of Azure VMs: https://github.com/actions/runner-images/issues/2840#issuecomment-791177163

Sometimes, image can have ~15-20 GB of free space and this volume can be changed without notice. We only guarantee that images contain at least 14 GB of free space.

benjaminwinger commented 1 year ago

Yeah, rust uses a lot of space. Why were we ever building in debug mode?

Debug mode is often faster for rust. It also has sometimes ended up catching some debug-only issues like assertions which otherwise get ignored, though I think what we need to do there is rewrite the ASAN job to fail on any test failure, rather than just when there is output in the asan log.

Riolku commented 1 year ago

Does ASAN run in debug mode? IIRC, no

Riolku commented 1 year ago

Why is debug mode often faster? Faster to build or run?

benjaminwinger commented 1 year ago

Faster to build. Release builds are usually much slower by comparison for rust than they are for C++. Though I'm not sure what the difference would be like for kuzu off-hand.

Evidently you changed it recently, but prior to 9ebc7f1cf6624f288a the asan job was in debug mode. We need something in CI that's testing in debug mode; otherwise we should remove the asserts or turn them into runtime exceptions which occur in release mode too.

Riolku commented 1 year ago

Ah. I didn't realize that my changes changed the behaviour of asan. This explains a lot.

I don't really think assertions impose much of a performance penalty, and would welcome a change to make them always on. We already have KU_ASSERT.

Riolku commented 1 year ago

After a bunch more discussion, my plan is to make rust build in release mode and enable asserts in CI by using a FORCE_ASSERT environment variable. To do this, every assert will be replaced by KU_ASSERT, which will be enabled if !defined(NDEBUG) || defined(FORCE_ASSERT), so, it will be enabled if we force it on, or we are in debug mode.