ponylang / ponyc

Pony is an open-source, actor-model, capabilities-secure, high performance programming language
http://www.ponylang.io
BSD 2-Clause "Simplified" License
5.74k stars 415 forks source link

Soundness bug / compiler crash: Match bindings should not be consumable unless the discriminee is ephemeral #3596

Open jasoncarr0 opened 4 years ago

jasoncarr0 commented 4 years ago

So in another round of soundness holes, bindings in a match are regular, consumable, assignable variables. This would be sound, but not very useful, if we had to consume before matching, but we (rightfully) allow matching on non-ephemerals.

Bonus: This code example makes ponyc segfault.

class B
class BadMatch
  let b: B iso = B
  fun make_alias(): this->B iso^ =>
    match b
    | let b': this->B iso =>
      consume b'
    end

actor Main
  new create(env: Env) =>
    let badMatch = BadMatch
    let b1 = badMatch.make_alias()
    let b2 = badMatch.make_alias()
    env.out.print((b1 is b2).string())

Expected behavior: b' should not be consumable or re-assignable, unless it matches on something which is ephemeral.

This code below is example desirable code, which relies on the ability to match without aliasing:

class UsesNoConsume[T]
  var value: (T | None) = None
  fun get(): this->T? =>
    match value
    | let none: None => error
    | let t: this->T => t
    end
jasoncarr0 commented 4 years ago

But in any case, finally get to check all of these boxes on the language checklist:

jasoncarr0 commented 4 years ago

On second inspection, this is not enough; the variables included in the match must also not be consumable:

let x: A iso = A
match x with
| let x2: A iso => sendsomewhere(consume val x); x2.foo = bar
end
jemc commented 4 years ago

It seems overly restrictive to prevent any consuming of both x and x2 during the match block, because there are some examples of safe code that does that.

Wouldn't it be a better formulation of the restriction if the part of the compiler that does consume tracking simply treats x and x2 as having a common consume lifetime? That is, the compiler should know that a consume of x2 is effectively a consume of x, and vice versa.

That may or may not be a simple rule to implement in the ponyc compiler, but it seems like a better rule in the sense that in it allows a larger subset of safe programs while closing the safety hole you've highlighted.

It may or may not end up being better to "just" use flow typing in the Pony compiler as we do in the Mare compiler, which would maintain x as literally the same variable throughout (thus naturally having the same consume lifetime) - just with different types in the different scopes where it is used, based on the type it is known to have within that scope. Maybe this change is too big though, and it's better to figure out how to unify the consume tracking for different variables.

jasoncarr0 commented 4 years ago

That seems like it would work, although this area does get rather thorny when talking about more complicated expressions than a single variable (we'd likely have to invalidate quite a number of variables when any unique variables are invalidated). Noting that this only applies when the discriminee is not consumed already. If it is, then the match binding can be consumed as well.

Could you link some of the samples where the consumes you've mentioned are used?

For what it's worth, here's about the most extreme example I can think of for invalidation

let arr: Array[Array[(A iso | B iso)] iso] = ...
match arr(0)? with
| let arr1: Array[(A iso | B iso)] iso =>
   match arr1(0)? with
   | let a: A iso =>
       a.some_method(consume arr)
   | let b: B iso =>
       consume b; arr(0)?.(0)?.another_method()
   end
end
jemc commented 4 years ago

Could you link some of the samples where the consumes you've mentioned are used?

I didn't necessarily have specific samples in mind - I was just speaking hypothetically about how to rethink the restriction to allow the most flexibility while still fixing the hole.

However, I think your example in the above comment demonstrates that my suggested approach wouldn't cover all cases - there is no local variable to tie the consume tracking to.

I think your suggested approach fails too though, since it only considers consumes and not field extraction (and similar)... even if you prevent consumes in the match body, nothing is stopping other kinds of ephemeral extraction. For example:

   match array(array.size() - 1)?
   | let a: A iso =>
     let a2: A iso = array.pop()?
     // a and a2 are now separate iso aliases to the same object; FAIL
   end

I think it's a really misleading "misfeature" of Pony matches that we use let for the type-matched binding, but allow examples like the one above where a non-ephemeral iso or trn can be "bound" to such a let.

That is, we wouldn't allow let a: A iso = arr(0)?, so why do we allow match arr1(0)? | let a: A iso => ...? I think the soundness bugs you've raised here were probably caused because we think of match as if it does a let binding (so the compiler code often treats it the same), but it's really not the same (as demonstrated with let a: A iso = arr(0)? being invalid).

This is only a problem because function returns are allowed to be non-ephemeral, which I originally thought was useless, but you've convinced me that there are cases where it can be valuable (basically, non-extracting getters that let you access nested features without downgrading the field to tag or box).

In Mare I'm probably going to move toward something along the lines of a A'iso'aliased = arr[0]!, where you can bind a local variable that has this "aliased" qualifier to mark it as non-ephemeral, allowing you to match on its type or do whatever else you need. Of course, the type inference system would usually do this for you and just let you say a = arr[0]!. Then you can use flow typing to have a scope where the type is refined as needed, for the case of distinguishing between A and B as in your example.

But I'm sort of at a loss for how to solve this in Pony, without larger changes to the language, the syntax, and how we treat ephemerality...

jemc commented 4 years ago

Yeah, I'm pretty convinced at this point that we won't be able to get around this until we stop treating a match clause as a new let binding...

Unless we disallow the non-ephemeral match altogether. Honestly when you first raised this bug I was surprised that we allowed non-ephemeral match at all. But I was also surprised to see valid use cases for non-ephemeral return, so maybe that fits :smile:

jasoncarr0 commented 4 years ago

I think your suggested approach fails too though, since it only considers consumes and not field extraction (and similar)... even if you prevent consumes in the match body, nothing is stopping other kinds of ephemeral extraction. For example:

Had you not posted the example, I'd have thought this was a minor issue, and that field extraction was implied by consume here. But the example you posted is just a ref method. It can get arbitrarily worse of course, even if we don't use the result, since, so calling any ref method has to immediately invalidate all of these bindings, since it could for instance, take a variable, consume it to val, send it, and return nothing.

I think the only end state in which we could have let bindings like the suggested ends up with lifetime tracking (though it would best be set at certain large subsets that never need lifetime names; such that we don't violate Pony's ability to remain explicit).

I'm okay with matches being the only way to have such temporary bindings, but it could be quite a bit of effort to find an appropriate condition here that doesn't compromise usefulness. Likely there's a simple syntactic criterion in the vein of checking all the cases we've mentioned here: only one consume, no methods/field access on things whose fields we've matched on without consuming.

jasoncarr0 commented 4 years ago

Honestly when you first raised this bug I was surprised that we allowed non-ephemeral match at all.

This too. My first thought was that there was no way to do X because we can't do non-ephemeral match, but then I realized we could

jasoncarr0 commented 2 years ago

While we're at it, a related issue that seems like it should be solved together or held until this one is solved:

iso match bindings can be invalidated by mutation in some case even if not visibly consumed, if the paths alias.

I believe this is mentioned above but calling it out in its own small comment is helpful

SeanTAllen commented 2 years ago

It would be nice if we could enhance this ticket with test cases that could be used for validating this is fixed.

jemc commented 2 years ago

Here's an example of iso invalidation that Jason and I were talking about on the sync call:

class UsesNoConsume[T] // (for the case where T is an iso)
  var value: (T | None) = None
  fun ref get(): T? =>
    match value
    | let none: None => error
    | let t: T =>
      let t2 = value = None
      t2 // is a T iso
      t // is also a T iso
    end
jemc commented 2 years ago

Note that instead of assigning the value = None result to a variable t2, you could also pass it as an argument to a function that does some unknown things to it (where we can't statically analyze it in scope), including the possibility of it being consumed there and converted to a val while you still hold the mutable t: T iso locally - which is obviously a bad thing (type system invariants are invalidated).

I mention that just to prove that static analysis of the scope can't save us here.

If you think the preventing the value = None extraction in the first place with static analysis would be possible, note that this could also happen in some foreign scope that we can't see into - if you pass this as a ref to another function, it would be possible to cause the extraction there where we can't see it.

So yeah, static analysis of what's in the match block doesn't cut it unless we were to impose some really restrictive limitations to what can happen inside a match block (e.g. no method calls, etc)

jemc commented 2 years ago

In the sync call today, I said it's my position that we basically have two options here:

To me, :two: is a reasonable, practical way to close the soundness hole quickly, while :one: is a major (but worthwhile) endeavor to take through the slow process of researching, formalizing, and implementing a new design.

SeanTAllen commented 2 years ago

This is a bug. I talked with Sylvan in part because I was confused about why match let's weren't creating an alias and he agreed it was a bug. It might have been in there from early on, but Sylvan was very surprised by the bug. This was introduced at some point.

We discussed in sync and:

We need to fix this by having the let assignments correctly create an alias and then we can fix the usability issues about (T | None) that it might create.