rust-lang / polonius

Defines the Rust borrow checker.
Apache License 2.0
1.32k stars 74 forks source link

Attempting to compile `unicode-general-category` #181

Open HollayHorvath opened 2 years ago

HollayHorvath commented 2 years ago

Hey folks,

I have tried to compile the unicode-general-category crate with polonius, but I wasn't able to do so in an acceptable timeframe (I killed the process after 1h, it was using around 40Gb of ram).

I do know that polonius is under development and this issue will not be solved for a while, but since the compile times are so different (2.5s vs at over 1h/unknown) I felt it justifies to file an issue about this.

ecstatic-morse commented 2 years ago

Thanks @HollayHorvath.

unicode-general-category generates a constant initializer (CATEGORY_BLOCKS) which contains a single basic block that initializes many thousands of locals one after another. Our current implementation of move/init analysis generates a single fact for each initialized variable at each CFG node (MIR statement). ~100k nodes * ~10k locals = ~1G facts.

This is a known issue, and I have a solution in mind. However, it will be difficult to be competitive with Rust's implementation due to limitations in the Datalog engines we're using (Soufflé and Datafrog).

ecstatic-morse commented 2 years ago

I described a pure datalog solution to this type of problem in #182, although I'm not sure whether we'll actually go that way. It might be better to integrate rustcs dataflow library into Polonius, perhaps by allowing constraints that are backed by dataflow cursors. That said, I think we might want to explore basic-block aware variants of core borrowck relations (subset, origin_contains_loan_at, etc.), and doing it for move/inits is a good first step.

Note that, in the Naive and Optimized variants, we would still need liveness data for each variable at all points, which has the same problems as initializedness. Luckily, the LocationInsensitive invariant should be able to rule out enough potential borrow errors that it won't be a problem for many examples, including this one. Pathological cases will still exist, though.