segeljakt commented 1 year ago

Review Mojo's priorities

[X] I have read the roadmap and priorities and I believe this request falls within the priorities.

What is your request?

Hi, I'm coming from Rust and am very excited about Mojo. Since Mojo is currently under design, I would like to bring up some pain points in Rust's ownership/borrowing system which could be relevant to Mojo.

Problem 1. Swapping mutable borrows

A limitation of the ownership and borrowing model in Rust is that it's not possible to swap the data of two mutable references. For example, in Rust, this code does not compile:

fn swap<T>(x: &mut T, y: &mut T) {
    let temp = *x;
    *x = *y; // Compiler Error
    *y = *temp;
}

Rust has to step outside the ownership model and use unsafe raw-pointer operations, as std::mem::swap does:

use std::ptr;

fn swap<T>(x: &mut T, y: &mut T) {
    unsafe {
        // Bitwise-copy both values out, then write them back crossed over.
        let a = ptr::read(x);
        let b = ptr::read(y);
        ptr::write(x, b);
        ptr::write(y, a);
    }
}

The same problem shows up elsewhere, for example when you implement a linked data structure and need to rearrange links between nodes.
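For context, here is a minimal sketch of how safe Rust usually works around this when relinking nodes today: the link is wrapped in an Option, and Option::take moves the value out while leaving None behind as a placeholder. The Node type and the helper functions are hypothetical and exist only for this illustration.

```rust
// Hypothetical singly linked list, used only for illustration.
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

/// Builds a list whose elements appear in the same order as `values`.
fn from_slice(values: &[i32]) -> Option<Box<Node>> {
    let mut head = None;
    for &value in values.iter().rev() {
        head = Some(Box::new(Node { value, next: head }));
    }
    head
}

/// Reverses the list by relinking nodes, using only safe code.
fn reverse(mut head: Option<Box<Node>>) -> Option<Box<Node>> {
    let mut prev = None;
    while let Some(mut node) = head {
        // `take` moves the tail out and leaves `None` in its place,
        // so `node.next` is never observed in a moved-out state.
        head = node.next.take();
        node.next = prev;
        prev = Some(node);
    }
    prev
}

/// Collects the values in list order.
fn to_vec(mut head: &Option<Box<Node>>) -> Vec<i32> {
    let mut out = Vec::new();
    while let Some(node) = head {
        out.push(node.value);
        head = &node.next;
    }
    out
}
```

The placeholder is the cost of this approach: every link has to be an Option even if the data structure never has a logically empty link once construction is finished.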

Problem 2. Borrowing uninitialized data

Another problem in Rust is that you cannot mutably borrow uninitialized data, which could be useful if you want to initialize something at a later point in the program. For example:

fn init(x: &mut Vec<i32>) -> &mut Vec<i32> {
    *x = vec![1, 2, 3];
    x
}

fn main() {
    let mut x;
    let mut x = init(&mut x); // Compiler Error
}

To address this, Rust has to step outside its borrowing model and use MaybeUninit with raw pointers:

use std::mem::MaybeUninit;

fn init(x: *mut Vec<i32>) {
    unsafe {
        x.write(vec![1, 2, 3]); // writes without reading or dropping the old contents
    }
}
fn main() {
    let mut x = MaybeUninit::<Vec<i32>>::uninit();
    init(x.as_mut_ptr());
    let x = unsafe { x.assume_init() }; // sound only because init wrote to x
}

Proposed Solution

Some people in the Rust community (particularly @danielrab) have proposed extensions to the existing ownership/borrowing model to deal with these problems.

In this model, references to objects have four types of capabilities:

Move - right to move data out of an object
Read - right to read data from an object
Write - right to write data to an object
Drop - right to drop the reference at the end-of-scope

References have three types of ownership:

Shared references: Can be copied.
Unique references: Cannot be copied (only one copy may exist).
Owned references: A unique reference which calls the destructor for its data when the reference is dropped.

There are five types of references:

& (Read-Drop). Shared read-only reference.
&mut (Move-Read-Write-Drop). Unique reference. Moving out of it downgrades it to &out.
&out (Write). Unique write-only reference. You must write to it before it is dropped, which causes it to be upgraded to &mut.
&own (Move-Read-Write-Drop). Owned reference. Moving out of it downgrades it to &empty.
&empty (Write-Drop). Owned reference. Can be written to, upgrading it to &own.
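To make the transitions in this list easier to follow, here is a small sketch that models the five reference kinds as a state machine in ordinary Rust. It is not an implementation of the proposal; the names RefKind, Op, step, and can_drop are made up for this illustration, and the table only encodes the capabilities listed above.

```rust
// Hypothetical model of the five proposed reference kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RefKind {
    Shared, // &
    Mut,    // &mut
    Out,    // &out
    Own,    // &own
    Empty,  // &empty
}

// Operations a reference may or may not permit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    MoveOut,
    Read,
    Write,
}

/// Returns the kind the reference has after `op`, or `None` if the
/// kind does not carry the capability for `op`.
fn step(kind: RefKind, op: Op) -> Option<RefKind> {
    match (kind, op) {
        (RefKind::Shared, Op::Read) => Some(RefKind::Shared),
        (RefKind::Mut, Op::Read) | (RefKind::Mut, Op::Write) => Some(RefKind::Mut),
        (RefKind::Mut, Op::MoveOut) => Some(RefKind::Out), // downgrade
        (RefKind::Out, Op::Write) => Some(RefKind::Mut),   // upgrade
        (RefKind::Own, Op::Read) | (RefKind::Own, Op::Write) => Some(RefKind::Own),
        (RefKind::Own, Op::MoveOut) => Some(RefKind::Empty), // downgrade
        (RefKind::Empty, Op::Write) => Some(RefKind::Own),   // upgrade
        _ => None,
    }
}

/// Only `&out` lacks the Drop capability: it must be written to first.
fn can_drop(kind: RefKind) -> bool {
    kind != RefKind::Out
}
```

Under this model, a swap moves out of both &mut references (each becomes &out) and then writes to both (each becomes &mut again), so both can be dropped at the end of the function. A function that moves out of a &mut parameter and never writes back would end with an &out that cannot be dropped.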

The two programs before could be written as:

fn swap<T>(x: &mut T, y: &mut T) {
    let temp = *x; // x is converted to &out when moving out of it
    *x = *y;
    *y = *temp;
}

fn init(x: &empty Vec<i32>) -> &out Vec<i32> {
    *x = vec![1, 2, 3]; // *x is converted to 
}

fn main() {
    let mut x;
    let mut x = init(&empty x);
}

I think the current design of Mojo's ownership/borrowing does not address these issues yet (I can't access the playground right now so correct me if I'm wrong 🙁).

I think the solution could potentially be adapted to Mojo by adding the mentioned references/parameters (own, out, empty). It introduces more flexibility to the ownership/borrowing model at the cost of complexity. Mojo already gives more control over how data is moved, so perhaps there are other solutions to this problem, but I think it could be interesting to discuss nonetheless.

What is your motivation for this change?

Add additional flexibility to ownership/borrowing over Rust to allow swapping references and borrowing uninitialized data.

Any other details?

No response

oskgo commented 1 year ago

In Rust both these problems have library solutions (std::mem::swap and std::mem::MaybeUninit::write), which seem to me like they would work fine in Mojo as well.
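As a rough illustration of the two library solutions mentioned here, the sketch below calls std::mem::swap and MaybeUninit::write from user code. The function names swap_demo and init_demo are invented for the example. Note that the caller still needs one unsafe block for assume_init, even though the write itself is a safe call.

```rust
use std::mem::{self, MaybeUninit};

/// Swaps two non-Copy values through mutable references, with no unsafe in user code.
fn swap_demo() -> (String, String) {
    let mut a = String::from("left");
    let mut b = String::from("right");
    mem::swap(&mut a, &mut b);
    (a, b)
}

/// Initializes a value after its storage has been declared.
fn init_demo() -> Vec<i32> {
    let mut slot: MaybeUninit<Vec<i32>> = MaybeUninit::uninit();
    // `write` is a safe method: it overwrites the slot without dropping old contents.
    slot.write(vec![1, 2, 3]);
    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}
```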

Is your objection that the library solutions use unsafe code internally? I'm not sure if avoiding unsafe in a tiny and finite set of functions would be worth the additional complexity. Making new language features for every individual use of unsafe code doesn't scale very well, and in my view isn't much safer either.

A better motivation here would be a pattern that requires unsafe code but cannot be abstracted away safely.

segeljakt commented 1 year ago

I think unsafe is fine from a safety perspective as long as you can prove that the code is indeed safe. The bigger problem in my opinion is that the Rust compiler views unsafe as a black box, which can prevent optimisations from happening. Extending the ownership model could allow the compiler to better reason about pointer swapping and uninitialized memory.

oskgo commented 1 year ago

The Rust compiler does not view unsafe as a blackbox. You should be able to wrap all your existing code in unsafe blocks without changing its behavior. This is even more obvious in Swift, where unsafe is just a naming convention, and it seems like Mojo will be going that route.
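A small sketch of the point about wrapping code in unsafe blocks: the two hypothetical functions below have the same body, and the unsafe block in the second one only unlocks operations that the body never uses. The allow attribute is there because rustc otherwise warns that the block is unnecessary.

```rust
/// Plain safe version.
fn sum_safe(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Same body wrapped in an `unsafe` block; the block contains no unsafe operations.
#[allow(unused_unsafe)]
fn sum_wrapped(values: &[i32]) -> i32 {
    unsafe { values.iter().sum() }
}
```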

Maybe what you mean is that raw pointers tend to prevent optimizations, but I don't think this is true either, and for such a statement we probably want some Mojo specific evidence.

segeljakt commented 1 year ago

When I wrote unsafe, I did not mean the blocks, but the actual unsafe operations within them. Unsafe blocks are only meant to make unsafe operations explicit and show that the programmer is aware of them being used (and has hopefully verified them to be sound). Sorry for the confusion

lattner commented 1 year ago

Hi @segeljakt, thank you for writing this up. I'm sorry for the delay, I've been a bit overwhelmed recently and am just getting caught back up on the lifetime work.

Re: your example, I'm not a Rust super-expert, why doesn't problem 1 work? The Rust compiler should know that x and y are distinct unaliased pointers? In any case, the equivalent works in Mojo and should continue to work with the lifetimes proposal:

struct S:
  fn __copyinit__(inout self, existing: Self):
    pass

fn my_swap(inout a: S, inout b: S):
    let tmp = a
    a = b
    b = tmp

On your second point, this is effectively how the Mojo compiler works internally, and we fudge a couple of things for sake of simplicity of model. For example, the 'self' member of a __del__ destructor is a reference, but it is "magic" in that it is required to be live-in and uninit-out. The self for a memory-only __init__ has the opposite polarity, being uninit on entry and init on exit.

We don't currently expose this level of detail up to the user type system, but we could if there were a compelling reason to. Let's keep this issue open and discuss more as the first rounds of the lifetimes proposal come together. I believe this should be trivial to do for us, but there is a question about how powerful to make the model at the expense of making it more confusing.

danielrab commented 1 year ago

The reason problem 1 doesn't work is that we are attempting to move out of a. Remember that in Rust, assignment moves by default. The equivalent in Mojo would (I think) be

fn my_swap(inout a: S, inout b: S):
    let tmp = a^
    a = b^
    b = tmp^

sorry if I did it wrong, I don't really know the Mojo syntax.

lattner commented 1 year ago

Weird, ok. That does work in Mojo, because it fully tracks lifetime holes for mutable references like that. I assumed that Rust supported generalized lifetime holes, but maybe it is only in certain cases.

segeljakt commented 1 year ago

Hi, sorry for my late response. For the first part, I made a small mistake in the swap function, it should be:

fn swap<T>(x: &mut T, y: &mut T) {
    let temp = *x;
    *x = *y;
    *y = temp;
}

The error you get in Rust is:

error[E0507]: cannot move out of `*x` which is behind a mutable reference
 --> src/main.rs:2:16
  |
2 |     let temp = *x;
  |                ^^ move occurs because `*x` has type `T`, which does not implement the `Copy` trait

error[E0507]: cannot move out of `*y` which is behind a mutable reference
 --> src/main.rs:3:10
  |
3 |     *x = *y;
  |          ^^ move occurs because `*y` has type `T`, which does not implement the `Copy` trait
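For comparison, one way to get past E0507 in safe Rust is to leave a placeholder behind instead of a hole. The sketch below does this with std::mem::take, which requires T: Default; the function name swap_via_take is made up for the example, and std::mem::swap remains the general solution because it needs no trait bound.

```rust
use std::mem;

/// Safe swap that never leaves `*x` or `*y` moved-out:
/// each `take` replaces the value with `T::default()`.
fn swap_via_take<T: Default>(x: &mut T, y: &mut T) {
    let temp = mem::take(x); // `*x` now holds the default value
    *x = mem::take(y);       // `*y` now holds the default value
    *y = temp;
}
```

The extra Default bound and the temporary default values are exactly the overhead that tracking moved-out references in the type system would avoid.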

I tried this Mojo code and it compiles successfully:

struct S:
  fn __moveinit__(inout self, owned existing: Self):
    pass

fn my_swap(inout a: S, inout b: S):
    let tmp = a^
    a = b^
    b = tmp^

It seems good but I'm not sure how this also compiles:

struct S:
    fn __moveinit__(inout self, owned existing: Self):
        pass

fn my_swap(inout a: S, inout b: S):
    let tmp = a^
    a = b^
    b = tmp^
    a = tmp^

Is there something going on under the hood here?

On your second point, this is effectively how the Mojo compiler works internally, and we fudge a couple of things for sake of simplicity of model. For example, the 'self' member of a __del__ destructor is a reference, but it is "magic" in that it is required to be live-in and uninit-out. The self for a memory-only __init__ has the opposite polarity, being uninit on entry and init on exit. We don't currently expose this level of detail up to the user type system, but we could if there were a compelling reason to. Let's keep this issue open and discuss more as the first rounds of the lifetimes proposal come together. I believe this should be trivial to do for us, but there is a question about how powerful to make the model at the expense of making it more confusing.

Hmm I see, limiting special references to certain builtin methods without exposing their complexity sounds nice and makes it slightly more powerful than Rust.

lattner commented 1 year ago

Is there something going on under the hood here?

Actually, that is a bug in the Mojo lifetime checker that I fixed last weekend. We now produce an error:

$ mojo test.🔥
test.🔥:16:12: error: use of uninitialized value 'tmp'
    a = tmp^
           ^
test.🔥:13:5: note: 'tmp' declared here
    let tmp = a^
    ^

This should be fixed in the next build that gets published, sorry for the confusion!

danielrab commented 1 year ago

The difficult part, which I don't know if Mojo deals with, is allowing

struct S:
  fn __moveinit__(inout self, owned existing: Self):
    pass

fn my_swap(inout a: S, inout b: S):
    let tmp = a^
    a = b^
    b = tmp^

while disallowing

struct S:
  fn __moveinit__(inout self, owned existing: Self):
    pass

fn my_move(inout a: S, inout b: S):
    b = a^

The problem with the latter one is that a was moved out of, and so doesn't contain a valid value anymore, which inout shouldn't allow.

lattner commented 1 year ago

FWIW, Mojo already handles that correctly. The second one is diagnosed correctly with this error message:

test.🔥:17:1: error: 'a' is uninitialized at the implicit return from this function
fn my_move(inout a: S, inout b: S):
^

And my_swap is accepted of course.

modularml / mojo

[Feature Request] Ownership/Borrowing extensions #372
