Borrows in generic code can violate uniqueness constraints

yorickpeterse commented 4 months ago

Please describe the bug

When using generic code, it's possible to borrow a generic type parameter that ends up being assigned a uni T value, which then allows the borrow to outlive the unique value and violate the uniqueness constraints.

Please list the exact steps necessary to reproduce the bug

Consider this code:

class Example[T] {
  let @value: Option[T]
  let @borrow: Option[ref T]

  fn mut set(value: T) {
    @borrow = Option.Some(ref value)
    @value = Option.Some(value)
  }
}

class async Main {
  fn async main {
    let ex = Example(value: Option.None, borrow: Option.None)

    ex.set(recover [10, 20])

    let x = ex.value := Option.None
  }
}

Here ex ends up being of type Example[uni Array[Int]], with the borrow field storing a ref Array[Int], violating the constraints. We can then exploit that using ex.value := Option.None, giving us an Option[uni Array[Int]] while the Example value still stores a borrow to the uni Array[Int].

I'm not sure yet what the solution here is. If we introduce something similar to the mut constraint for type parameters to allow unique values, then we basically end up not being able to use them anywhere or we won't be able to use borrows. Basically it ends up being a massive pain.

Operating system

Fedora 40

Inko version

main

Rust version

1.77.2

yorickpeterse commented 4 months ago

In theory something like this would work: when type-checking a method body, we set some sort of "borrowed" flag for type parameters whenever they're borrowed, either directly or by passing it to an argument of another method that borrows it. When we then assign a uni T to such a type parameter, we disallow this at the call site.

The problem with this approach is that it requires at least two passes over all method bodies: one to set the flag for direct borrows (e.g. through ref x expressions), and one to set the flag for indirect borrows. I'm also not sure this analysis would always be complete.

yorickpeterse commented 3 months ago

Here's another way this can be triggered: if you have some uni T and do that_value.some_field.iter, the returned Iter produces values of type ref Something, allowing you to violate the uniqueness constraints by storing those ref values somewhere.

yorickpeterse commented 3 months ago

Another similar case: we allow pattern matching against e.g. Option[uni ref Something], which then lets you introduce a borrow by binding the uni ref Something to a match variable:

match something {
  case Some(v) -> {
   move_unique_value_to_another_process(unique_value)
   do_something_with_the_uni_ref_we_still_have(v)
  }
  ...
}

To prevent this from happening we have to disallow binding uni ref T and uni mut T values to match variables, similar to how you can't assign them to locals.

yorickpeterse commented 1 month ago

To see how many places there are where we explicitly use ref x / mut x with x being a type parameter, I modified the compiler as follows:

diff --git a/compiler/src/type_check/expressions.rs b/compiler/src/type_check/expressions.rs
index 6573bb59..31567c34 100644
--- a/compiler/src/type_check/expressions.rs
+++ b/compiler/src/type_check/expressions.rs
@@ -3069,6 +3069,16 @@ impl<'a> CheckMethodBody<'a> {
     ) -> TypeRef {
         let expr = self.expression(&mut node.value, scope);

+        // TODO: remove
+        if expr.is_type_parameter(self.db()) {
+            self.state.diagnostics.warn(
+                DiagnosticId::InvalidType,
+                "ref with type parameter that might violate uni T",
+                self.file(),
+                node.location.clone(),
+            );
+        }
+
         if !expr.allow_as_ref(self.db()) {
             self.state.diagnostics.error(
                 DiagnosticId::InvalidType,
@@ -3112,6 +3122,16 @@ impl<'a> CheckMethodBody<'a> {

         let expr = self.expression(&mut node.value, scope);

+        // TODO: remove
+        if expr.is_type_parameter(self.db()) {
+            self.state.diagnostics.warn(
+                DiagnosticId::InvalidType,
+                "mut with type parameter that might violate uni T",
+                self.file(),
+                node.location.clone(),
+            );
+        }
+
         if !expr.allow_as_mut(self.db()) {
             self.state.diagnostics.error(
                 DiagnosticId::InvalidType,

Running the test suite then produces the following warnings:

src/std/array.inko:135:16 warning(invalid-type): ref with type parameter that might violate uni T
src/std/array.inko:635:15 warning(invalid-type): ref with type parameter that might violate uni T
src/std/array.inko:726:15 warning(invalid-type): mut with type parameter that might violate uni T
src/std/iter.inko:275:20 warning(invalid-type): ref with type parameter that might violate uni T

I was expected a lot more warnings, so perhaps requiring the equivalent of T: ref to allow borrowing might not be that bad after all.

A challenge here is when we deal with closures and captures. Take this code for example:

import std.stdio (STDOUT)
import std.string (ToString)

class Person {
  let @name: String
}

impl ToString for Person {
  fn pub to_string -> String {
    @name
  }
}

fn example[T: ToString](value: T) {
  fn {
    let out = STDOUT.new

    out.print(value.to_string)
    nil
  }
    .call
}

class async Main {
  fn async main {
    let alice = recover Person(name: 'Alice')

    example(alice)
  }
}

Here the closure captures value by borrowing, violating the uniqueness constraints of alice in the process. Requiring T: ref just to be able to capture it is likely going to make working with closures and generics an absolute pain in the ass, as pretty much every such case will need T: ref and through that forbid the use of uni T values.

yorickpeterse commented 1 month ago

Pony's approach to this is to introduce a variety of reference capabilities and ways to convert between them. However, the resulting complexity of this makes the language difficult to use, which is likely why it never really caught on.

yorickpeterse commented 1 month ago

We could potentially remove the need for explicit T: mut, T: move, and T: ref through inference. The rules would (roughly) be as follows:

Given a generic type T:

If it's passed to a ref, used as the value for a ref expression
If it's passed to a mut, used as the value for a mut expression, is the receiver of an fn mut method, or is captured by an fn closure, it's inferred as T: mut
If it's the receiver of an fn move, it's inferred as T: move

Closures here complicate things: since they may mutate the captured data we can't infer T: ref and instead must infer T: mut, which may be annoying. The actual inference in turn would require multiple passes over all methods, as inferring the rules for method B may affect its caller A.

yorickpeterse commented 1 month ago

If we go down the path of inference, we can avoid using multiple passes using an approach like this:

We maintain a map of Type Parameter => [Type Parameters] where the key is a type parameter used in an argument, and the values the type parameters passed to that argument. When inferring rules for a type parameter that's a key, we propagate those to the type parameters in the value array. So given code like this:

fn foo[A](a: A) {
  bar(a)
}

fn bar[B](b: B) {
  baz(b)
}

fn baz[C](c: C) {
  mut c
}

The mapping would be:

C => [B]
B => [A]

When processing mut c we note we're operating on C and flag it as C: mut. We then look it up in the map and find [B]. We then iterate over those values, flag them accordingly, and repeat this process until we run out of work.

The downside is that this won't work reliable when dealing with type parameters inside generic types, such as this:

fn foo[A](a: A) {
  bar(a)
}

fn bar[B](b: B) {
  baz([b])
}

fn baz[C](c: Array[C]) {
  mut c.pop.get
}

Here the issue is that we don't pass B directly to c. This means we'd have to traverse the type graph and build the mapping as we go (e.g. as part of type checking).

The other approach is to process methods multiple times and record method dependencies. That is, when we encounter a call to a method we've yet to process, we store the caller as a dependency of the callee (e.g. bar => [foo] in the above example). When we finish processing, we check for any methods that depend on it and reschedule them, removing them from the dependency list in the process. This requires some form of caching such that we can skip work in a rescheduled method (e.g. foo) that we've already performed.

yorickpeterse commented 1 month ago

I read through Functional Ownership through Fractional Uniqueness hoping it would perhaps provide some extra insight, but alas it felt like a paper with a lot of words but not a lot of actually useful content, though perhaps I just don't grok it.

yorickpeterse commented 1 month ago

Also read through Flexible recovery of uniqueness and immutability, but it basically just describes the Pony model using different terms and some extra fluff.

inko-lang / inko