rust-lang / rust-memory-model

Collecting examples and information to help design a memory model for Rust.
Apache License 2.0
126 stars 15 forks source link

Defining Terms #8

Closed strega-nil closed 7 years ago

strega-nil commented 8 years ago

goto https://github.com/nikomatsakis/rust-memory-model/blob/master/specification/terms.md

arielb1 commented 8 years ago

I don't like the terms "lvalue" and "rvalue" in this context, because they represent expressions - i.e. parts of a program's code, instead of values - parts of a program's data.

Maybe "lvalue expression"/"rvalue expression" would be better here?

arielb1 commented 8 years ago

A "datum" is the inhabitant of a type. Datums are purely mathematical objects and neither live in memory not form part of a program statement. However, datums may be partially undef (do we want that?).

Example datums are true: bool, [0, 0, 0, 0]: [u32; 4], 0xcccccccc: *const u32, the address of the local "foo": &u32. Some((4, 2)): Option<(u32, usize)>.

I would have used "value", but we use "lvalue"/"rvalue" for something different.

strega-nil commented 8 years ago

Now that we've talked on irc, I think we should move away from lvalue and rvalue and towards something else, and value. I'm not sure what to replace lvalue with yet, however.

arielb1 commented 8 years ago

I think location expression/lexpr for the kind of expression.

strega-nil commented 8 years ago

@arielb1 hmm.... I'm going to mull it over. I'll put it into the original as a stopgap for if we can't think of anything better.

arielb1 commented 8 years ago

BTW, how do you call "what an lexpr evaluates to"?

e.g. if you have

unsafe fn xyz(foo: *const u8) -> [u8; 4] {
    *(foo as *const u32) = 1;
    *(foo.offset(1) as *const [u8; 4])
}

The location expressions *(foo as *const u32) and *(foo.offset(1) as *const u32) evaluate to something, which is written to and read from. These are overlapping objects, right?

eternaleye commented 8 years ago

@arielb1: They're certainly overlapping accesses, but C's (and I think also @ubsan's) definitions use "object" to refer to distinct allocations (which definitionally cannot overlap), while colloquial usage is closer to "values of a type".

Honestly, though, if we're calling them lexprs, I'd go with "overlapping locations"

strega-nil commented 8 years ago

@arielb1 Either one or both are undefined behavior, so mu :)

Remember, an object must have at least align_of::<T>()

A better example is the following:

let x: [u8; 3] = [0; 3];
let x0: &[u8; 2] = unsafe { &*((&x[0..1]) as *const [u8] as *const [u8; 2]) };
let x1: &[u8; 2] = unsafe { &*((&x[1..2]) as *const [u8] as *const [u8; 2]) };

*x0 and *x1 are location expressions which refer to overlapping objects, although importantly, they're both part of one bigger object. They are both what are known as subobjects.

arielb1 commented 8 years ago

The memory model "primarily" defines the semantics of reads and writes (it also cares about projections etc).

So

read :: Metadata -> Type -> Object -> Memory -> Either Value UB
write :: Metadata -> Value -> Type -> Object -> Memory -> Either Memory UB

Right (basically)?

strega-nil commented 8 years ago

More like (and in my favorite syntax :3)

enum Object<'a> {
    Super(Memory),
    Sub(&'a Object, Offset, Size),
}
enum Type(...);
struct Value {
    ...
}
type LExpr<'a> = &'a Object<'a>;
struct Metadata {
    ...
}

impl<'a> Object<'a> {
    /** 
        `ptr::read`

        `val.typeof()` shall be equal to `ty`
    */
    pub fn read(&self, ty: Type, meta: Metadata) -> Either<Value, UB> {
        ...
    }
    /**
        `ptr::write`

        `val.typeof() == ty` *must be* true
    */
    pub fn write(&self, ty: Type, val: Value, meta: Metadata) -> Either<(), UB> {
        ...
    }
}
RalfJung commented 8 years ago

I find the terminology around types and "generic types" confusing. Following the usual terminology in the type theory literature, every x such that the statement "variable v has type x" makes sense is a type. So, i32 is a type, and Vec<i32> is a type. There is no fundamental difference in nature between i32 and Vec<i32>. Vec (or Vec<_>, to more clearly indicate the missing argument) , on the other hand, is a "thing" that can be given a type to produce a type - Vec is of "kind" Type -> Type. I cannot tell from §1 whether Vec<i32> or Vec is supposed to be a "generic type" - the former would IMHO be introducing a distinction that's rarely (if at all) necessary, the latter would be confusing because Vec is not a type.

eternaleye commented 8 years ago

@RalfJung

From what I'm seeing in the document, "generic type" is approximately "type constructor".

Unless all parameters of the type constructor have been bound to arguments, a type constructor has unbound parameters, and so is a "generic type" (or a partially applied type constructor, and thus also a type constructor). The document phrases this as "if any of these parameters are generic", which I agree is confusing.

RalfJung commented 8 years ago

So, "generic types" would be higher-kinded things like Vec<_> (specifically, they would not be of kind Type)? I see.

arielb1 commented 8 years ago

@ubsan used "generic type" for Vec<T> - types with free type variables.

strega-nil commented 8 years ago

Right, I couldn't think of a better term, and I didn't think for very long. "Type constructor" sounds a lot better.

arielb1 commented 8 years ago

@ubsan

Type constructor is something else - more like a function that takes "unnamed" type arguments and returns a type (e.g. λT. Vec<Vec<T>> is a type constructor under some definitions but not others).

eternaleye commented 8 years ago

@arielb1: Vec<T> is such a thing - it's a function with signature * -> *. You pass in a type (such as i32), and get out a type (Vec<i32>). The fact that T gives that parameter a name is simply an artifact of Rust not being pointlesspoint-free :stuck_out_tongue:

RalfJung commented 8 years ago

So when you write Vec<T>, you mean λT. Vec<T> (which is what I meant with Vec<_>)? The T you are using is not bound anywhere I can see, but maybe it's part of the context...

type T := i32;
fn foo(v: Vec<T>) {}

Here, Vec<T> clearly is a type (of kind * aka Type). I'm sure that's not what you meant, but IMO it's better to be precise.

@arielb1 I agree with "type constructor" sounding confusing... for me it refers only to primitive higher-kinded terms like Vec<_>, but not to composed terms like Vec<Vec<_>> (similar to how Some is a constructor of type for<T> T -> Option<T>, but not all things of that type are constructors). Unfortunately, I don't have a better name than "higher-kinded term (with codomain *)" :/