llvm / clangir

A new (MLIR based) high-level IR for clang.
https://clangir.org
Other
386 stars 98 forks source link

Can we have a scoped representation for cleanups? #1123

Open smeenai opened 1 week ago

smeenai commented 1 week ago

CIR currently attaches cleanup blocks to calls. For example, from https://godbolt.org/z/f4se8MPG7, the following code:

struct S { ~S(); };
void f();
void g() {
    S s;
    f();
    f();
}

produces IR with a separate cleanup block for each call:

cir.try synthetic cleanup {
  cir.call exception @_Z1fv() : () -> () cleanup {
    cir.call @_ZN1SD1Ev(%0) : (!cir.ptr<!ty_S>) -> ()
    cir.yield
  }
  cir.yield
} catch [#cir.unwind {
  cir.resume
}]
cir.try synthetic cleanup {
  cir.call exception @_Z1fv() : () -> () cleanup {
    cir.call @_ZN1SD1Ev(%0) : (!cir.ptr<!ty_S>) -> ()
    cir.yield
  }
  cir.yield
} catch [#cir.unwind {
  cir.resume
}]

This leads to duplicate landing pads (#1052). From what I understand though, cleanups should always form a stack (and I believe Clang uses a stack to store them as well), and I'm wondering if we can take advantage of that and use scopes to represent cleanups instead. The example above would become something like (I'm omitting the catch clause here just for simplicity, but it would also be nice if we could omit no-op catch clauses in the actual IR as well):

cir.try synthetic cleanup {
  cir.call exception @_Z1fv()
  cir.call exception @_Z1fv()
  cir.yield
} cleanup {
  cir.call @_ZN1SD1Ev(%0) : (!cir.ptr<!ty_S>) -> ()
  cir.yield
}

It's more interesting when the cleanups actually stack. CIR currently fails on that (https://godbolt.org/z/Gz7rETEbY), but an example would be:

struct S { ~S(); };
struct T { ~T(); };
void f();
void g() {
    S s;
    f();
    T t;
    f();
}

I'm envisioning it being represented using something like the following:

cir.try synthetic cleanup {
  cir.call exception @_Z1fv()
  cir.try synthetic cleanup {
    cir.call exception @_Z1fv()
    cir.yield
  } cleanup {
    cir.call @_ZN1TD1Ev(%1): (!cir.ptr<!ty_T>) -> ()
    cir.yield
  }
  cir.yield
} cleanup {
  cir.call @_ZN1SD1Ev(%0) : (!cir.ptr<!ty_S>) -> ()
  cir.yield
}

The idea is that each cir.try should correspond to a single call site (as in an entry in the call site table in the LSDA) and landing pad. ٓA cir.yield inside a cleanup cascades to an outer cleanup if there's any or else resumes unwinding. This would translate pretty well to the LLVM generated by Clang (https://godbolt.org/z/Tnd7r9vKo).

The interaction with catch blocks is another interesting design consideration. For example (this also fails with CIR today):

struct S { ~S(); };
struct T { ~T(); };
void f();
void g() {
    try {
        S s;
        f();
        T t;
        f();
    } catch (int) {
        f();
    }
}

I'm picturing as (the catch syntax is just for simplicity, not something I'm actually suggesting):

cir.try cleanup {
  cir.call exception @_Z1fv()
  cir.try synthetic cleanup {
    cir.call exception @_Z1fv()
    cir.yield
  } cleanup {
    cir.call @_ZN1TD1Ev(%1): (!cir.ptr<!ty_T>) -> ()
    cir.yield
  }
  cir.yield
} cleanup {
  cir.call @_ZN1SD1Ev(%0) : (!cir.ptr<!ty_S>) -> ()
  cir.yield
} catch (int) {
  cir.call exception @_Z1fv()
  cir.yield
}

This is the same concept as before: each try corresponds to a single call site and landing pad, and cleanups cascade as before. The landing pad for the inner try in LLVM needs to also catch the int, but it delegates the actual catching work to the outer landing pad (see https://godbolt.org/z/Y1Gxr3McG). We could either explicit add a catch region to that try to indicate this or just have it be implicit based on nesting. Note that this generalizes across more nesting too, e.g. in https://godbolt.org/z/5xW9jxWor, the inner landing pad catches both int and bool, but it delegates to the outer landing pad for int.

Another interesting case to consider is the ordering of cleanups and catches. Consider:

struct S { ~S(); };
void f();
void g() {
    S s1;
    try {
        S s2;
        f();
    } catch (int) {
        f();
    }
}

s2 needs to be destroyed before the catch logic, whereas s1 needs to be destroyed by the landing pad only if the catch isn't entered. https://godbolt.org/z/fnzTGPvr8 shows how Clang handles this (including the additional complication of also destroying s1 if the call to f inside the catch throws). I can think of two ways to represent this. If we want to keep our property of each try corresponding to a single site and landing pad, we can have the ordering of cleanup and catch regions determine the order in which they run, i.e.

cir.try synthetic cleanup {
  cir.call exception @_Z1fv()
  cir.yield
} cleanup {
  cir.call @_ZN1SD1Ev(%s2): (!cir.ptr<!ty_T>) -> ()
  cir.yield
} catch (int) {
  cir.try synthetic cleanup {
    cir.call exception @_Z1fv()
    cir.yield
  } cleanup {
    cir.yield
  }
} cleanup {
  cir.call @_ZN1SD1Ev(%s1) : (!cir.ptr<!ty_S>) -> ()
  cir.yield
}

A cir.yield inside a cleanup would delegate to any subsequent cleanups on the outer try. Alternatively, we could abandon the one try = one call site and landing pad property, and instead have each try correspond to either a cleanup region or a catch region and rely on nesting to convey the correct sequence:

cir.try synthetic cleanup {
  cir.try {
    cir.try synthetic cleanup {
      cir.call exception @_Z1fv()
      cir.yield
    } cleanup {
      cir.call @_ZN1SD1Ev(%s2): (!cir.ptr<!ty_T>) -> ()
      cir.yield
    }
  } catch (int) {
    cir.try synthetic cleanup {
      cir.call exception @_Z1fv()
      cir.yield
    } cleanup {
      cir.yield
    }
  }
} cleanup {
  cir.call @_ZN1SD1Ev(%s1): (!cir.ptr<!ty_T>) -> ()
  cir.yield
}

I can see pros and cons for both. Either way though the general point is that cleanups become scoped instead of attached to individual calls, which should let us generate better code for them (including potentially outlining cleanup sequences so they can be shared by the normal and exceptional paths in the future for code size purposes). @bcardosolopes, what do you think are the pros and cons of this compared to the current representation?

bcardosolopes commented 1 week ago

This leads to duplicate landing pads (https://github.com/llvm/clangir/issues/1052).

I haven't provided more context there yet, but my POV is that the cleanups are really part of a call that throws. In the lifetime checker I'd like to be able to evaluate in one go all side-effects a call could have, for example.

In the LLVM OG cases I've seen so far, sometimes the landing pad isn't shared and the cleanups are threaded through via branches, which we don't and can't represent at this level (they are not always nice stacks, there are few examples in the tests). The approach I'm considering, and I should have probably written that already, is to tag the cleanups that are the same with a special operation cir.cleanup_tag <id_attr> (because we know that accurately during CIRGen) and just de-dup them during CFG flatten.

Either way though the general point is that cleanups become scoped instead of attached to individual calls, which should let us generate better code for them

Generating better code should happen at lowering with a combo of two things: (1) de-deduping cleanups and (2) merging landing pads when possible, it's really tricky to do that on-the-fly during CIRGen if we want to keep structured control flow.

I'm a bit skeptical that this nice structural control flow you are suggesting is going to work, that doesn't match the experience I've had so far, but I could totally be missing something. Can you look at the existing tests to double check they would cover your existing proposals? I haven't had the time to analyze in depth what you just suggested, I'll do that sometime next week.

My suggestion: I'll fix the remaining issues in https://github.com/llvm/clangir/issues/1052, and the ones you just added in this PR, this will allow me to double check if the current approach is sound with the new examples, and also refresh my head to look closely to your proposals - we can discuss the outcome of that and brainstorm on the future direction. WDYT?

bcardosolopes commented 1 week ago

btw, thanks for the comprehensive writeup!

smeenai commented 1 week ago

My suggestion: I'll fix the remaining issues in #1052, and the ones you just added in this PR, this will allow me to double check if the current approach is sound with the new examples, and also refresh my head to look closely to your proposals - we can discuss the outcome of that and brainstorm on the future direction. WDYT?

Sounds good to me. I'll also examine the test cases and see if this is feasible or not (in particular possible interactions with goto could complicate things, as usual).

smeenai commented 1 week ago

I've spent a few days trying to understand how cleanups work in CodeGen and CIRGen today (which has given me a much deeper appreciation of the work that's gone into both). I'm gonna summarize my understanding below, so that we can correct any potential misunderstandings right away instead of proceeding on potentially flawed assumptions.

Whenever an expression that requires a cleanup (e.g. construction of a variable with a non-trivial destructor) is encountered, both CodeGen and CIRGen push an EHCleanupScope onto the EHStack. These scopes work lazily; when a call is encountered within the scope, getInvokeDest is called, which calls into emitLandingPad, which generates the landing pad setup (the catch and cleanup clauses). The actual cleanup code, for both regular and exceptional cleanups, is emitted by PopCleanupBlock, which is called when exiting a scope (or finishing a function).

Right now, CIRGen associates a cir.try synthetic with each call inside a cleanup scope (which also necessitates some landing pad patching). My suggestion is to associate the cir.try synthetic with the scope instead (using the standard EH dispatch block caching mechanism), and have all calls inside the scope share this try. I'm using the simplified version of conditional-cleanup.cpp in https://godbolt.org/z/rWx8Wzdc7, though this should generalize to the full version too:

typedef __SIZE_TYPE__ size_t;

// Declare the reserved global placement new.
void *operator new(size_t, void*);

namespace test7 {
  struct A { A(); ~A(); };
  struct B {
    static void *operator new(size_t size) noexcept;
    B(const A&, B*);
    ~B();
  };

  B *test() {
    return new B(A(), nullptr);
  }
}

Right now, the CIR looks like:

Current CIR ``` cir.func @_ZN5test74testEv() -> !cir.ptr extra(#fn_attr1) { %0 = cir.alloca !cir.ptr, !cir.ptr>, ["__retval"] {alignment = 8 : i64} cir.scope { %2 = cir.alloca !cir.bool, !cir.ptr, ["cleanup.cond"] {alignment = 1 : i64} %3 = cir.alloca !ty_test73A3AA, !cir.ptr, ["ref.tmp0"] {alignment = 1 : i64} %4 = cir.alloca !cir.bool, !cir.ptr, ["cleanup.cond"] {alignment = 1 : i64} %5 = cir.const #false %6 = cir.const #true %7 = cir.const #false %8 = cir.const #true %9 = cir.const #cir.int<1> : !u64i %10 = cir.call @_ZN5test71BnwEm(%9) : (!u64i) -> !cir.ptr extra(#fn_attr) %11 = cir.const #cir.ptr : !cir.ptr %12 = cir.cmp(ne, %10, %11) : !cir.ptr, !cir.bool %13 = cir.cast(bitcast, %10 : !cir.ptr), !cir.ptr cir.store align(1) %7, %2 : !cir.bool, !cir.ptr cir.store align(1) %5, %4 : !cir.bool, !cir.ptr cir.if %12 { cir.store %8, %2 : !cir.bool, !cir.ptr cir.try synthetic cleanup { cir.call exception @_ZN5test71AC1Ev(%3) : (!cir.ptr) -> () cleanup { %17 = cir.load %2 : !cir.ptr, !cir.bool cir.if %17 { cir.call @_ZdlPvm(%10, %9) : (!cir.ptr, !u64i) -> () extra(#fn_attr) } cir.yield } cir.yield } catch [#cir.unwind { cir.resume }] cir.store %6, %4 : !cir.bool, !cir.ptr %15 = cir.const #cir.ptr : !cir.ptr cir.try synthetic cleanup { cir.call exception @_ZN5test71BC1ERKNS_1AEPS0_(%13, %3, %15) : (!cir.ptr, !cir.ptr, !cir.ptr) -> () cleanup { %17 = cir.load %4 : !cir.ptr, !cir.bool cir.if %17 { cir.call @_ZN5test71AD1Ev(%3) : (!cir.ptr) -> () extra(#fn_attr) } %18 = cir.load %2 : !cir.ptr, !cir.bool cir.if %18 { cir.call @_ZdlPvm(%10, %9) : (!cir.ptr, !u64i) -> () extra(#fn_attr) } cir.yield } cir.yield } catch [#cir.unwind { cir.resume }] %16 = cir.const #false cir.store %16, %2 : !cir.bool, !cir.ptr } cir.store %13, %0 : !cir.ptr, !cir.ptr> %14 = cir.load %4 : !cir.ptr, !cir.bool cir.if %14 { cir.call @_ZN5test71AD1Ev(%3) : (!cir.ptr) -> () extra(#fn_attr) } } %1 = cir.load %0 : !cir.ptr>, !cir.ptr cir.return %1 : !cir.ptr } ```

We have two cleanup scopes, one for B::operator new and one for A(). With the tries associated with the scopes instead of the calls, this would look like:

  cir.func @_ZN5test74testEv() -> !cir.ptr<!ty_test73A3AB> extra(#fn_attr1) {
    %0 = cir.alloca !cir.ptr<!ty_test73A3AB>, !cir.ptr<!cir.ptr<!ty_test73A3AB>>, ["__retval"] {alignment = 8 : i64}
    cir.scope {
      %2 = cir.alloca !cir.bool, !cir.ptr<!cir.bool>, ["cleanup.cond"] {alignment = 1 : i64}
      %3 = cir.alloca !ty_test73A3AA, !cir.ptr<!ty_test73A3AA>, ["ref.tmp0"] {alignment = 1 : i64}
      %9 = cir.const #cir.int<1> : !u64i
      %10 = cir.call @_ZN5test71BnwEm(%9) : (!u64i) -> !cir.ptr<!void> extra(#fn_attr)
      %11 = cir.const #cir.ptr<null> : !cir.ptr<!void>
      %12 = cir.cmp(ne, %10, %11) : !cir.ptr<!void>, !cir.bool
      %13 = cir.cast(bitcast, %10 : !cir.ptr<!void>), !cir.ptr<!ty_test73A3AB>
      cir.if %12 {
        cir.try synthetic {
          cir.call exception @_ZN5test71AC1Ev(%3) : (!cir.ptr<!ty_test73A3AA>) -> () cleanup
          %15 = cir.const #cir.ptr<null> : !cir.ptr<!ty_test73A3AB>
          cir.try synthetic {
            cir.call exception @_ZN5test71BC1ERKNS_1AEPS0_(%13, %3, %15) : (!cir.ptr<!ty_test73A3AB>, !cir.ptr<!ty_test73A3AA>, !cir.ptr<!ty_test73A3AB>) -> () cleanup
            cir.yield
          } cleanup {
            cir.call @_ZN5test71AD1Ev(%3) : (!cir.ptr<!ty_test73A3AA>) -> () extra(#fn_attr)
            cir.yield
          }
          cir.call @_ZN5test71AD1Ev(%3) : (!cir.ptr<!ty_test73A3AA>) -> () extra(#fn_attr)
          cir.yield
        } cleanup {
          cir.call @_ZdlPvm(%10, %9) : (!cir.ptr<!void>, !u64i) -> () extra(#fn_attr)
          cir.yield
        }
      }
      cir.store %13, %0 : !cir.ptr<!ty_test73A3AB>, !cir.ptr<!cir.ptr<!ty_test73A3AB>>
    }
    %1 = cir.load %0 : !cir.ptr<!cir.ptr<!ty_test73A3AB>>, !cir.ptr<!ty_test73A3AB>
    cir.return %1 : !cir.ptr<!ty_test73A3AB>
  }

This scoping of cleanups eliminates all the conditional cleanup checks, and I think we should be able to arrange for it, since both CodeGen and CIRGen enter the delete cleanup after the conditional. The current CIR code shouldn't need the conditional cleanups either, since it's also nested inside the if. (I might be missing something but I don't really understand why CodeGen needs it either, and optimizations do eliminate it.)

I'm not sure if we'd be able to emit the cir.try synthetic lazily (when we encounter a call which requires it) with this scheme, because of code like the following:

void f(bool b) {
  has_destructor d;
  if (b)
    g();
  g();
}

Here, when we encounter the first call which necessitates the try, we've already started the if, so for lazy emission, we'd have to associate an insertion point with each cleanup scope, insert the try there, and then shift all subsequent code inside the try, and I have no idea if that's feasible. I don't think emitting the try eagerly (i.e. we just emit the try whenever we enter a cleanup scope) is a huge deal though; it should be easy to clean up unnecessary ones (a try is unneeded if there's no potentially throwing calls inside its body).

Nested cleanups are also interesting. For code like the following (which CIR currently errors on, so I can't compare directly):

struct S { ~S(); };
void f();
void g() {
    S s1;
    S s2;
    f();
}

My idea would be to initially emit the following:

  cir.func @_Z1gv() extra(#fn_attr1) {
    %0 = cir.alloca !ty_S, !cir.ptr<!ty_S>, ["s1"] {alignment = 1 : i64}
    %1 = cir.alloca !ty_S, !cir.ptr<!ty_S>, ["s2"] {alignment = 1 : i64}
    cir.try synthetic {
      cir.try synthetic {
        cir.call exception @_Z1fv() : () -> () cleanup
        cir.yield
      } cleanup {
        cir.call @_ZN1SD1Ev(%1) : (!cir.ptr<!ty_S>) -> () extra(#fn_attr)
        cir.yield
      }
      cir.call @_ZN1SD1Ev(%1) : (!cir.ptr<!ty_S>) -> () extra(#fn_attr)
      cir.yield
    } cleanup {
      cir.call @_ZN1SD1Ev(%0) : (!cir.ptr<!ty_S>) -> () extra(#fn_attr)
      cir.yield
    }
    cir.call @_ZN1SD1Ev(%0) : (!cir.ptr<!ty_S>) -> () extra(#fn_attr)
    cir.return
  }

We'd ideally want to simplify this to a single landing pad, the same way CodeGen is able to do with simplifyCleanupEntry. I think the logic we'd want is roughly "if a cir.try contains nothing except another cir.try and its cleanups, the two can be folded", but the "and its cleanups" part complicates things.

A more radical idea would be for the try block to have a region for normal cleanups as well as exceptional cleanups, and perhaps a shared region for both if they're the same. This could potentially simplify our implementation of branch-afters. For example, for the following code (which doesn't currently compile with exceptions, and generates incorrect code without exceptions, which I'll file a separate bug for):

struct S { ~S(); };
bool f();
void g() {
    for (;;) {
        S s;
        if (f())
            break;
    }
}

A potential translation would be (with finally being pseudo-syntax for a cleanup block executed for both the normal and EH edges):

  cir.func @_Z1gv() extra(#fn_attr) {
    %0 = cir.alloca !ty_S, !cir.ptr<!ty_S>, ["s"] {alignment = 1 : i64}
    cir.scope {
      cir.for : cond {
        %0 = cir.const #true
        cir.condition(%0)
      } body {
        cir.try {
          %0 = cir.call @_Z1fv() : () -> !cir.bool
          cir.if %0 {
            cir.break
          }
          cir.yield
        } finally {
          cir.call @_ZN1SD1Ev(%0) : (!cir.ptr<!ty_S>) -> ()
          cir.yield
        }
      } step {
        cir.yield
      }
    }
    cir.return
  }

The idea would be that the cleanup is executed by both the cir.break and the cir.yield before they continue onto their regular destinations. This is a very early thought though, and I'm not sure how well it would work in practice yet (and in particular how it'd interact with -fno-exceptions).

bcardosolopes commented 7 hours ago

Overall seems like your approach is simplifiying the output CIR significantly, great!

We'd ideally want to simplify this to a single landing pad, the same way CodeGen is able to do with simplifyCleanupEntry.

Yes, we need that information from CIRGen, my take was to tag them (like I mentioned before) but if we can capture the same information as part of CIR, feels like a good situation to be.

I think the logic we'd want is roughly "if a cir.try contains nothing except another cir.try and its cleanups, the two can be folded", but the "and its cleanups" part complicates things.

Perhaps we'll need to tag them even with your approach? Still feels to me CIRGen is the only place that has the reliable info if cleanup blocks are shareable (a CIR only pass would have to do matching of the blocks somehow, which could be expensive).

The idea would be that the cleanup is executed by both the cir.break and the cir.yield before they continue onto their regular destinations. This is a very early thought though

It's a neat idea!