hylo-lang / hylo

The Hylo programming language
https://www.hylo-lang.org
Apache License 2.0
1.22k stars 56 forks source link

Soundness bug in inout accessor #1599

Open seanbaxter opened 1 week ago

seanbaxter commented 1 week ago
public type Foo : Deinitializable {

  public memberwise init

  public subscript(_ arg: let Int): Int {
    inout {
      yield arg; // Why is this allowed?
    }
  }
}

public fun main() {
  var a = Foo()

  let i = 101;

  // Gain a mutable reference to i.
  inout d = &a[i]

  // Read through i:
  print(i)

  // Mutate through d:
  &d = 102;

  // Exclusivity violation:
  // we can name i while there's a live mutable reference (d) to it.
  print(i)
}

Is this an exclusivity violation? I can load from i while modifying it through an alias d. If I make the subscript parameter inout and call the subscript like &a[&i] the compiler gives the "illegal mutable access" error on the first print statement, as expected.

Can I ask what the lifetime semantics are on the reference returned through the subscript ramp function? What are the constraints on the caller and callee that trigger "illegal mutable access" and "illegal immutable access" errors? Since you're returning a reference from a function, I would think, as with Rust, it's putting constraints on the arguments of the function call. I'm guessing it's constraining the lifetime on the return reference to the same lifetime that constrains all the argument references.

dabrahams commented 1 week ago

Is this an exclusivity violation?

Yes, that's a bug. You shouldn't be able to project a let-bound parameter as inout.

Incidentally, you don't need Foo; you can declare freestanding named subscripts like this one.

kyouko-taiga commented 1 week ago

Thanks for the issue. That is an interesting use case.

I'm guessing it's constraining the lifetime on the return reference to the same lifetime that constrains all the argument references.

You're guessing right. The constraints placed on the arguments of a subscript at call site:

The bug that you've demonstrated comes from the yield arg in the subscript implementation, which shouldn't be allowed. I believe the access checking in main did its job.

seanbaxter commented 6 days ago

Why are subscripts/projections implemented with coroutines rather than with normal functions?

The "Borrow checking Hylo" paper explains the "inversion of control" for a function that passes a reference to internal state to the lambda through an inout/let parameter. There's no live analysis needed there--the lifetime of the parameter outlives the function call, and exclusivity enforcement at the point of the call is enough to prevent invalidation of parameters inside the call. That's cool. But then the goes on:

Hylo offers syntactic support for defining access through inversion of control... The yielded value of a subscript--unlike a function return value--is logically bound to its sources, with lifetime and independence guarantees upheld just as with local let or inout bindings.

The coroutine doesn't do inversion of control. For a coroutine with one yield, it's split into a ramp function that returns the yielded value and a continuation function that resumes after the suspend point. Lifetime safety is upheld by constraints that relate the ramp's returned reference/projected value to the subscript arguments, as you confirmed. It's the same system as for a reference-returning normal function. The coroutine doesn't provide logical lifetime safety like the lambda-style inversion of control does.

If you simplify it down, this model uses borrow checking for reference-returning functions, but where the lifetime constraints are fixed function (according to the constraint rules you bulleted), rather than programmable with lifetime parameterizations as in Rust. Turning subscripts into normal functions massively simplifies codegen and maintains separation between the caller and callee. With the coroutine approach, the caller has to know the subscript's definition in order to allocate its coroutine frame on the stack. Why use coroutines for returning references instead of much simpler normal functions?

kyouko-taiga commented 6 days ago

Functions returning references don't give us as much power as subscripts. In particular, a function cannot be used to project ephemeral values or, more generally, guarantee that some logic will run after the projection ends.

For example, we can't write this subscript with a function that returns references:

subscript bit(_ p: Int, of n: inout Int): Bool {
  let { yield (n & (1 << p)) != 0 }
  inout {
    var b = (n & (1 << p)) != 0
    yield &b
    &n = if b { n | (1 << p) } else { n & ~(1 << p) }
  }
}

public fun main() {
  var bits = 2
  inout b = bit[1, of: &bits]
  &b = false
  print(bits) // 0
}

But I'd also like to point out the fact that whether or not a subscript is implemented as a co-routine is mostly irrelevant to the user. The important part is that a subscript implements inversion of control. We claim that this model is easier to use than one with references because fundamentally Ref<T> is not T, thereby requiring extra steps to reason about the former's operations (e.g. dereferencing).

If the compiler can figure out how to compile it as a "normal function", then it will. If that must be guaranteed in a particular case (e.g., for performance) we can decorate the subscript declaration.

dabrahams commented 6 days ago

I think what @kyouko-taiga didn't say is that there are cases with overlapping but non-nested lifetimes that we haven't figured out the details of how to compile as ordinary functions. That's the reason we currently have to use yield-once coroutines, IIRC. That may actually result in more efficient code, but we haven't done the tests. If it does, that would be a reason to use the coroutine mechanism even if we don't have to.

seanbaxter commented 6 days ago

Overlapping but non-nested lifetimes are handled by the constraint solver. The coroutine frame is extraneous. If the coroutine actually effected an inversion of control you wouldn't need the borrow checker. Your lifetime safety strategy for preventing UAF on the yielded reference from the coroutine ramp is the same as it would be for preventing UAF on the returned reference from a normal function.

@kyouko-taiga why do you say the subscript implements inversion of control? The client calls a ramp function and gets a reference and continues execution in its own stack frame, just like it would with a normal function. By contrast, the lambda callback is an inversion of control.

I'm arguing this point because this lifetime safety model is tantalizingly close to one that's easy to describe and implement in C++: Rust-style borrow checking but with fixed-function lifetime parameterizations. The model as it is currently described involves two exotic parts: inversion of control (which it doesn't actually do) and coroutines (which are irrelevant to lifetime safety, hard to reason about, and very flaky with respect to codegen #1474). It's those two unnecessary pieces that prevented me from understanding how this model of lifetime safety actually works.

dabrahams commented 6 days ago

Coroutines are only a convenient implementation technique and are not part of the semantic model. You can do the whole thing with the lambda style, and ultimately we're going to generate the code whichever way is most efficient. It is also wrong to conclude, from Hylo's open issue, that LLVM's coroutine primitives are flaky with respect to codegen. Swift uses coroutines to do its codegen for subscripts with no issues, so it's certainly possible to make them reliable.

Overlapping but non-nested lifetimes are handled by the constraint solver

They are only typechecked by the constraint solver. They make figuring out how to generate the right lambda more complicated and we haven't got code for that yet.

If you think what a subscript does is just like what a normal function that returns a reference does, you haven't understood it yet.

I know you resist the idea of watching my talk to understand this stuff, but if you'll take just 5 minutes and watch starting here and ending before slide 109, it will explain to you why we say it's inversion of control. The last part of those 5 minutes also shows how inversion of control allows us to project ephemeral values, which cannot be expressed with a function that returns a reference.

seanbaxter commented 6 days ago

I know you resist the idea of watching my talk to understand this stuff

Really arrogant stuff Dave.

dabrahams commented 6 days ago

I don't know why you are saying that. I drew that conclusion from your posts on Reddit (like this one), and if I've misunderstood those I apologize. Certainly no malice was intended.

FWIW, we're doing our best to be helpful. From our end, the impression is that you've been antagonistic and have been making incorrect definitive pronouncements about Hylo without actually having understood it. Maybe we can all take this as an opportunity to try to reset the relationship.

seanbaxter commented 6 days ago

Ok. Let's reset. Adding these comments after a soundness hole report should show I'm taking the work seriously and am thorough in my research.

I implemented inout and let bindings like you have. My goal is to try memory safety without lifetime parameters. But I had to change the inout/let bindings to use T% safe references (i.e. borrows without lifetime arguments), for a number of reasons. For one, they need to act like pointers, not like lvalues, in order to accommodate the const_cast for interior mutability. Things break down when you pretend they're values rather than references.

I'm starting to allow use of safe references in structs, because I'm in need of types with reference semantics in order to use things like RefCell and Mutex. Doing this without lifetime parameters is proving difficult. The Rustnomicon is the only guide for solving shared ownership problems with compile-time safety checks, so naturally this second spin at memory safety becomes ever more Rust-like. I've been hammering on something that started close to Hylo, but have been changing to get support for necessary things like shared mutable state.

The subscript design is the most conspicuous difference and it's hard to wave away. You saw how the _modify accessor addressed a runtime lifetime safety problem and adopted that for a compile-time lifetime safety problem. I think that creates the illusion of a big difference between "borrow checking without lifetime parameters" and the Hylo model.

Why was _modify introduced? Swift uses COW to enforce exclusivity and that has performance hazards. If you have dictionary of arrays, and you want to append into an element array, using the get accessor on the dictionary returns a shallow copy of the element array: it increments the refcount on the array. Even if you have exclusive access to the outer type, the refcount on the inner type is 2. Mutation on the element creates a copy. The setter on the container replaces the old element array with the freshly-allocated one. Due to all the copy-on-write, there's accidental quadratic complexity when doing linear-time operations. _modify sets direct mutation on the requested element and yields that element pointer. The continuation restores COW semantics.

The _modify accessor is used for runtime lifetime safety. But with static analysis there's nothing to restore in the continuation. In Swift this mechanism is used rarely--only 20 _modify accessors in the 180kloc stdlib. Why make the coroutine central to the model when it doesn't figure in to solving the compile-time safety problem? The natural simplification is just to use normal functions that return references.

You mention the video--I've watched the video. "Returned references are a very complex abstraction to have in terms of language complexity." And you're paying for that abstraction with borrow checking. I know because I probed the compiler and read the LLVM and IR lowerings.

I only came here to establish some facts:

  1. There is no inversion of control in the subscripts. It's equivalent to normal functions that return references.
  2. Borrow checking enforces safety, not lexical containment as with inout/let function parameters (the original mutable value semantics design).
  3. The constraints depend on the function parameters of the subscript and are upheld until the last use of the returned reference.
  4. Coroutines can be used to return ephemeral values (although you can use pass-by-value get/set properties for those as well), but they're a convenience and not a safety strategy.

Can the safety model be characterized as "borrow checking without lifetime parameters?" At least mechanically, if not philosophically?

dabrahams commented 6 days ago

I think maybe there are too many separate topics here for this medium but I will try to respond. Maybe we should think about separating topics into Discussions.

Yes, it was clear to me from your filed issues that you're taking the work seriously, thanks.

Regarding safe references in structs, there's a feature not discussed in the talk called “remote parts.” In the terms you're thinking, they boil down to stored references, but with the restriction is that the thing storing the reference can't escape. Ability to escape is tied to Movable conformance in Hylo although strictly speaking a value doesn't always need to move in order to escape. Currently Hylo has only implemented remote parts for lambda captures.

As far as shared mutable state is concerned, I'm pretty sure the only way to make it safe is by using dynamic mechanisms to ensure the law of exclusivity is upheld, meaning that you can project an inout or let out of the shared state but then you trap or throw or suspend the thread if you try to create multiple projections out of the state where one of them is an inout. All the Rust library components like RefCell do that, and the same techniques would apply in Hylo. You would implement these safe abstractions using unsafe components. There's no need to compromise the safe subset of the language to support this.

Because you can have multiple extant let bindings to a given value, interior mutability creates hidden shared mutable state, and is not compatible with this model unless you add those kinds of dynamic mechanisms around it. In C++ maybe it's not hidden from the compiler as it would often be in Swift or Hylo, but I think you still end up with a thread safety problem.

_modify was introduced for reasons explained in my talk. Swift started with get and set, but you lose context between get and set so you end up having to do bounds checking (Array) or much worse, hashing (Dictionary) twice. But maybe you aren't actually asking me why _modify was introduced; it's not 100% clear from the way you open that paragraph, but yes, if you use getters and setters to do mutation in an array of arrays, you get terrible performance; that's an even better example of the kind of performance problem _modify solves, but it's more complicated to describe so I didn't use it in the talk. There are many such examples.

Other reasons for _modify include that Swift wanted to support non-copyable types one day, and you can't access those with get and set. Further reasons may be revealed by this pitch to remove the underscore and make them part of the official language.

Swift does not use COW to enforce exclusivity. Exclusivity enforcement for structs (like Array) and enums is done entirely by Swift's equivalent of a borrow checker, and for classes and closures (which have reference semantics) there's a dynamic check that isn't strong enough to prevent data races (and then there's a long ugly story about how Swift does thread safety that I'm not going to get into here). COW is not even built into the core language; it is used in all of the standard library's variable-sized data structures because it's the only way to get value semantics in a language without copy constructors, where everything is implicitly copyable. The phrase "COW semantics" is meaningless to me so I don't know what you're saying in that part. The way I think of COW, it doesn't have semantics; it's just a technique for achieving value semantics and optimizing copies.

I don't think it's really meaningful in a safe-by-default language to say “the _modify accessor is used for runtime lifetime safety.” Safety is a foundational requirement, and all features outside the unsafe subset must uphold it. Swift was safe before the _modify accessor came along. It was introduced to improve performance, and upholding safety was a design constraint.

I'm not sure what you mean by “with static analysis there's nothing to restore in the continuation,” and I'm going to refuse the temptation to guess. Please clarify.

Why make the coroutine central to the model when it doesn't figure in to solving the compile-time safety problem?

If by "the coroutine" you mean the inversion of control and the ability to put code after the yield (coroutine is just an implementation technique)… Again, Swift didn't have a compile-time safety problem. For Swift, _modify solved a performance problem while maintaining all the expressivity of get/set. To do that you need to be able to execute code after the yield. I think I may have found the thread that introduced the idea here. Worth a read.

The natural simplification is just to use normal functions that return references.

Maybe but you couldn't implement Swift's Dictionary API that way. Subscripting a Dictionary with a key yields an optional value, but there are no Optionals stored in the dictionary. Maybe you don't care about that, but try to remember that Swift's goals are probably not the same as yours. Maybe you don't care about the things one gets from this feature. We happen to care.

Regarding your list of facts,

  1. simply isn't a fact. I don't know why you keep asserting that it's equivalent to returning references, but it is not: you can express things that can't be expressed by returning references. You may not care about expressing those things, but that doesn't make them non-existent.
  2. I don't understand what you mean. Maybe you're saying that strict lexical containment isn't required for safety and our subscripts, which are borrow-checked, can be viewed as not creating lexical containment?
  3. , if I'm understanding you correctly, would be right if there were a returned reference. But there isn't one, because 1. is false.
  4. coroutines are just an implementation technique. We can use lambdas just as well. We don't have safety strategies, since we start from a safe-by-default baseline. Inversion of control is an expressivity benefit over returning references and bounding their lifetimes to the lifetimes of function arguments.

You could characterize the mechanics of inout and let bindings as "checked borrows without lifetime parameters." That's not just for safety, and some safety needs to be upheld by dynamic checks (bounds checking on Arrays, to say nothing of your shared mutable state examples) so I wouldn't call it “the safety model.” but key parts of our statically-enforced safety story fall out of it (not even all of the story; just enforcing well-typedness is a key element). Since subscripts can be rewritten in terms of inout and let bindings (via the lambda model), you can say they are a part of the same picture.

dabrahams commented 5 days ago

I realized this morning why you probably care about inversion of control, even if you don't want to project ephemeral values. If you want to do the Mutex wrapper thing, it's valuable to return to the accessor when the access is finished so you can unlock the Mutex.

dabrahams commented 3 days ago

@seanbaxter please LMK if all of that is clear. Thanks.