keean / zenscript

A trait based language that compiles to JavaScript
MIT License

Mutable Objects #32

Open keean opened 7 years ago

keean commented 7 years ago

Looking at the mess Rust has created with its handling of mutability, and also the confusion l-values and r-values cause in C/C++, I think a simpler model is needed for mutability. I also think that mutability needs to be part of the type signature, not a separate annotation.

I think the problem comes from a failure to distinguish between a value and a location in the type system. A value is something like a number 1, 2, 3, or a particular string like "Hello World". It cannot change; values are by definition immutable. References to values do not make sense, as we should simply pass the value (as it is immutable, there is no concept of identity with values: a '1' is a '1', there is no concept of 'this one' or 'that one'). Variables stand for unknown values and are therefore also immutable, like for example in "2x = 6". Once we know 'x' has the value '3' it always stands for '3' in that scope. All values and variables are by nature r-values (and not l-values), so this simplifies things considerably. We write the type of an integer value or variable as "Int" for example.

So what about mutability? Well, values and variables are not mutable, so how do you change something? You need a container (or a location in computer terms). You can only write to something that has a location (is an l-value in C/C++ terms). A container is something like an array, that has slots that contain values, and the values in the slots can be changed. As a parametric type we might write "Array[Int]". Of course an array has length, so we want a simple single-value container for the common case of something like an integer we can increment. We could write this as a type: "Mut[Int]". This represents a singleton container that can have an 'Int' value put in it. Containers are 'l-values'; they have an address. The important point is that the container itself is a value that can be assigned to a variable; it is just the contents that can change.

In this way the semantics are kept clean, and we don't need concepts like l-values and r-values, and nor do we have the problems Rust has with closures, where it does not know whether to reference or copy an object. So if value assignment is 'let' we can assign a value to a variable that cannot change, or another type of object like a single mutable location.

let y = 3 // value assignment, y is immutable, 3 has type Int, 'y' has type Int
let x = Mut(3) // value assignment x is a mutable container that initially has the value 3 in it. x has type Mut[Int]
x = 2 // the contents of the container called 'x' are changed to '2'. This is syntactic sugar for x.assign(2), which would have a type like: (=) : Mut[X] -> X -> Mut[X].
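
For concreteness, the semantics of such a singleton container could be sketched in TypeScript (this is only an illustrative sketch; the names Mut, get and assign are hypothetical, chosen to mirror the example above):

class Mut<T> {
  constructor(private value: T) {}
  get(): T { return this.value; }
  // `x = 2` above would desugar to something like x.assign(2)
  assign(v: T): this { this.value = v; return this; }
}

const x = new Mut(3)  // the binding `x` is immutable; its type is Mut<number>
x.assign(2)           // only the contents of the container change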

This removes the need to have any kind of special annotation for mutability or references, removes the need to understand l-values and r-values.

Finally we need references to containers so that the same mutable value can be shared between different program fragments. References can be Read Only, Write Only, or ReadWrite. We can have:

let x = RORef(Mut(3)) // x is a readonly reference to a container that has an int of 3 in it.
let y = WORef(Mut(3)) // y is a writeonly reference to a container that initially contains 3.
let z = RWRef(Mut(3)) // a reference that can be used for both reading and writing.

However the variables themselves are still immutable. A for loop could look like this:

for (let x = Mut(10); x > 0; x--) { ... }

where 'x' would have the type "Mut[Int]".
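
A rough TypeScript encoding of the three reference capabilities above, reusing the hypothetical Mut sketch; all names are illustrative rather than proposed syntax:

interface ReadRef<T>      { get(): T }
interface WriteRef<T>     { assign(v: T): void }
interface ReadWriteRef<T> extends ReadRef<T>, WriteRef<T> {}

// Capability-narrowing wrappers over the same underlying container.
const roRef = <T>(m: Mut<T>): ReadRef<T>      => ({ get: () => m.get() })
const woRef = <T>(m: Mut<T>): WriteRef<T>     => ({ assign: v => { m.assign(v) } })
const rwRef = <T>(m: Mut<T>): ReadWriteRef<T> => ({ get: () => m.get(), assign: v => { m.assign(v) } })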

shelby3 commented 7 years ago

With GC, I don’t see any need to explicitly type references. We simply remember those primitive types which are always copied when assigned.

Agreed on annotating types with read-only, write-only, and read-write. Per my proposal for concurrency, we’ll also need an annotation for exclusive borrowing and another for stored (transitive) borrows.

I am preferring to keep these annotations concise, employing symbols:

() for exclusive borrow (enclose any read/write symbols)
+ for write-only
- for read-only
nothing for read-write
! for stored (transitive) borrow

Note const (non-reassignment) doesn’t apply to the type but rather to the reference instance construction site. I am contemplating that const will be the default (for both function arguments and other instances). For function arguments these will be written normally but for other instances they will be created with := (optionally ) instead of = for reassignment. Reassignable instances will be prepended with var. Obviously for primitive types which are always copied then an instance with writable type must be var. So your example could be written less verbosely:

for (var x = 10; x > 0; x--) { ... }
keean commented 7 years ago

Rust used symbols for different kinds of references and borrows and they have changed to words like 'Ref' etc because it is more readable.

shelby3 commented 7 years ago

Rust used symbols for different kinds of references and borrows and they have changed to words like 'Ref' etc because it is more readable.

I argued above we do not need Ref because (at least for my plans), contrary to Rust I am not attempting to create an improved C or C++ with programmer control over all those low-level details. Also Rust requires symbols such as & at the call/instantiation site and I proposed no symbols at the call/instantiation site other than := (optionally ). Please enumerate all the symbols Rust used/uses and what they’ve been changed to. Specifics are important. Note I am not yet claiming I disagree, because I need to compare the specifics and think about it.

Afaics, Rust is a blunder thus far, so I am not quite sure if we can cite them as a desirable leader on any design decisions. Remember I had complained to them about the need for the noisy & at the call sites.


Edit: also that last link exemplifies some of the complexity that is involved with having these concepts of references types, value types, and boxed types. I’d rather have the compiler/language deal with those things implicitly. For example, the compiler knows when it needs to use a boxed type because more than one type can be stored there. Unboxed slots in data structures are an optimization. Rust required that complexity because it is tracking borrows in a total order, which IMO is an egregious design blunder.

Ah I see a mention that ~T is the old syntax for what is now Box<T>. I agree that Box<T> is more readable than ~T because ~ has no relationship to Box in my mind. However, I think the symbols I proposed may be more mnemonic:

() cordoned border around access
+ putting more into something
- taking out something
! for not released, although I realized that Rust named these moves

keean commented 7 years ago

I have been arguing that it is precisely because type systems do not differentiate between values and containers, that language semantics get very opaque and complex, for example C/C++ have l-values and r-values, and rust does not know whether to copy or reference variables in a closure.

JavaScript does it differently and treats numbers as values and everything else as references. This causes problems because 'enums' should be values but are treated like objects.

If you want a mutable number in JavaScript you end up using a singleton object like {n:3} or an array like [3]. So even in JavaScript there is some kind of notion of a "reference".
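
For example, in plain JavaScript/TypeScript:

// A bare number cannot be shared mutably…
let n = 3
const bump = (x: number) => { x += 1 }      // mutates only the local copy
bump(n)                                     // n is still 3

// …so JavaScript code reaches for a singleton object as a makeshift container.
const cell = { n: 3 }
const bumpCell = (c: { n: number }) => { c.n += 1 }
bumpCell(cell)                              // cell.n is now 4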

I think whatever we do, it needs to cleanly distinguish between l-values (containers) and r-values (values). It probably also needs to distinguish between boxed and unboxed things.

By distinguishing between objects and references you gain control over ownership, and so you can keep track of whether the caller or callee is responsible for deallocating the memory. As you say, this is probably not necessary with GC, but then I think that a hybrid GC would give the best performance, where you deallocate when things leave scope like C/C++, and only garbage collect those objects that escape their scope.

I think Rust's lifetime annotations are taking things a bit far, but I find the RAII style leads to clean code in C++. There is a clear advantage to having destructors over finalisers.

In any case, it is probably safe to infer things at the value level, as long as all these different kinds of things (that have different semantics) are clearly distinguished at the type level.

shelby3 commented 7 years ago

A prior quote might be apropos regarding “complexity budget”.

I’m attempting to reread most of the discussion on these Issues threads so I can make my final decisions and get on with creating a language or not.

P.S. note I have completed 16 weeks of my 24 week TB medication.

shelby3 commented 7 years ago

@keean wrote:

I think whatever we do, it needs to cleanly distinguish between l-values (containers) and r-values (values).

I repeat for my needs which is to not be mucking around in low-level details in a high-level language, I think JavaScript got this right in that certain primitive types (e.g. Number, also String correct?) are treated as values and the rest as references to a container of the value. So the programmer never needs to declare whether a type is a reference to a value or a value.

rust does not know whether to copy or reference variables in a closure.

Ditto the default behavior I wrote above is what I think I probably want.

JavaScript does it differently and treats numbers as values and everything else as references. This causes problems because 'enums' should be values but are treated like objects.

What is an enum in JavaScript? Afaik, JavaScript has no enum datatype.

If you want a mutable number in JavaScript you end up using a singleton object like {n:3} or an array like [3]. So even in JavaScript there is some kind of notion of a "reference".

What problem do you see with this?

It probably also needs to distinguish between boxed and unboxed things.

The compiler knows which types are (tagged) sum types and thus must be boxed. Why declare it?

In any case, it is probably safe to infer things at the value level, as long as all these different kinds of things (that have different semantics) are clearly distinguished at the type level.

Does that mean you are agreeing with me?

As you say, this is probably not necessary with GC, but then I think that a hybrid GC would give the best performance

We are sometimes (some aspects) designing different languages. Maximum performance is not the highest priority goal of a high-level language. Performance can be tuned with a profiler (i.e. typically less than 20% of the code needs tuning and perhaps only 5% needs extreme low-level tuning) and can drop down to low-level languages for maximum performance via FFI if necessary.

Marrying low-level details with a high-level language creates design priorities contention. I do not think such a perfect marriage exists. Programmers want a lower complexity budget and only a smaller portion of the code needs that high complexity focus. Python, JavaScript, and Java comprise the most popular programming language set on earth. C/C++ are still as popular as they are because sometimes we must have low-level control and because those other three (well at least Java and especially JavaScript) screwed up the integer types, which is one of the things I want to rectify.

Jeff Walker wrote:

As I really learned C++ and began programming in it, I discovered that C++ is a very large and complex language. Why? Well, there are a number of reasons. One is that it follows the zero overhead principle, basically “What you don’t use, you don’t pay for.” That means every language feature has odd limitations and pitfalls to make sure it can be implemented in a very efficient way. Another is that, due to the focus on low level efficiency, there are no safety checks built into the language. So when you make a subtle mistake, which is easy given all the weird edge cases, the program compiles and silently does the wrong thing in a way that maybe succeeds 99% of the time but crashes horribly the remaining 1%. Finally, the language is designed for maximum power and flexibility; so it lets you do anything, even the things you shouldn’t do. This produces a programming minefield where at any moment one might be blown up by some obscure behaviour of the language. Because of that and because other developers and the standard library make use of every language feature, one must learn the whole language. However, C++ is so big and convoluted, learning it is really hard.

Also I agree with Jeff Walker’s analysis of the fundamental reason TypeScript cannot radically paradigm-shift to improve upon the JavaScript minefield (although I disagree with his opinion that adding static typing is regressive):

The real problem with TypeScript is contained in the statement that it is a “superset of JavaScript.”. That means that all legal JavaScript programs are also legal TypeScript programs. TypeScript doesn’t fix anything in JavaScript beyond some things that were fixed in ECMA Script 5.


I find the RAII style leads to clean code in C++. There is a clear advantage to having destructors over finalisers.

You are correct to imply that we must track borrows for safe use of RAII, because if a reference to the local block instance has been stored somewhere, RAII can enable use after destruction.

Agreed it is a loss of brevity, convenience, and safety that GC languages such as JavaScript and Java don’t usually support destructors at block-level (or function level) scopes for instances local to that block.

But by minimally tracking borrowing as a compile-time (not runtime!) reference counting scheme as I have proposed, we could support implicit deterministic destructors for block-level instances (and even bypass the GC for these and use compile-time implicit allocation and deallocation, i.e. no runtime reference counting overhead). Good idea! That would be a significant feature advantage over other GC languages!
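
As a rough analogy only (TypeScript, with Resource and close as hypothetical names), this is the difference between the manual release GC languages force today and what compiler-inserted block-scoped destruction would provide:

// What GC languages make you write today: manual, and easy to forget.
class Resource {
  close() { console.log("released") }
}

function useResource() {
  const r = new Resource()
  try {
    // … use r …
  } finally {
    r.close()  // the manual stand-in for a destructor
  }
}
// With compile-time tracking of block-scoped borrows, the compiler could insert the
// equivalent of r.close() (and even the deallocation) automatically when r leaves scope.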

shelby3 commented 7 years ago

@shelby3 wrote:

I am preferring to keep these annotations concise employing symbols:

() wrapping for exclusive borrow, else enclose any read/write symbols
+ for write-only
- for read-only
nothing for read-write
! for stored (transitive) borrow

Add ? as an abbreviation for | undefined meaning not set (aka Option or Maybe) as opposed to unavailable (i.e. semantically an exception or error) with | null, which can be transpiled in TypeScript to ?: as the type separator only for function argument parameters. This is a suffix on the type and the others are prefixed, except for ! (and in the order listed for consistency). The ! may be combined with the ? as ⁉️.

Edit: may need to add an annotation for unboxed data structures, or may prefer wrapping with Unboxed< … > rather than a single symbol annotation; the verbosity is acceptable because these should be used rarely, only for explicitly needed binary space compression.

keean commented 7 years ago

Personally I don't like the hieroglyphics; it certainly makes it hard for occasional users to understand (like all the symbols in PL/I, APL or Perl). Most people can transfer knowledge about things like references from one language to another if the notation is readable.

shelby3 commented 7 years ago

We probably disagree. I do not like verbose text (compounded on top of already verbose textual types, we need some contrast). That is one reason I do not like Ceylon. I like symbols for very commonly used syntax, e.g. the -> for inline function definition. I am surprised that as a mathematician you do not like symbols. What I do not like are symbols for every commonly used function, e.g. Scalaz’s symbol soup. Symbols need to be used in moderation, and we are talking about type annotations (which are already too long and textual), not operators in the executable code.

I do not see any other popular languages with read-only, write-only, read-write, and exclusive borrows annotations, so there is nothing to transfer from. Even Ceylon (and others) adopted ? for nullable types.

Also these annotations on type annotations are mostly to be ignored by the eye, because they are a (type attribute) detail and normally the programmer wants to first focus on the named type annotation, which varies between type annotations. Since these little (type attribute) details will be repeating (within a small bounded set of choices) on every type that otherwise varies (within an unbounded set of choices), they are repetitive, almost like noise that should be minimized in visual conspicuity so it does not drown out the more prominent, relevant information of the type annotation, e.g. ExclusiveBorrow[ReadOnly[Stored[Nullable[Atype]]]] (even if abbreviated Excl[RO[Stor[Null[Atype]]]]) versus ⊖Atype⁉️ or without Unicode (-)Atype!?.

keean commented 7 years ago

There's a bit of a straw man there, as the type doesn't make sense (you wouldn't have a read-only exclusive borrow, as read only is a reference, and a borrow indicates transfer of ownership, and anything that is referenced must be stored).

The closest type to this would be something like:

Mut[Maybe[TYPE]]

or

type MyType[T] = Mut[Maybe[T]]

RORef[MyType[TYPE]]

The point is the types are expressing the semantic invariants.

However I see no reason not to allow user-defined type operators. So '?' can be an alias for Maybe etc.

So we should allow symbols in the type definitions, and it can be left up to the programmer whether to use them or not.

shelby3 commented 7 years ago

@keean wrote:

There's a bit of a straw man there, as the type doesn't make sense (you wouldn't have a read-only exclusive borrow, as read only is a reference, and a borrow indicates transfer of ownership, and anything that is referenced must be stored).

Incorrect in my intended context. You’re presuming Rust’s borrowing/ownership model. Please read again my proposal and understand how it differs from Rust.

However I see no reason not to allow user defined type operators. So ? can be an alias for Maybe etc.

So we should allow symbols in the type definitions, and it can be left up to the programmer whether to use them or not.

Disagree. Remember you had also mentioned in the past that for readability consistency, one of the basic tenets of good programming language design is not to unnecessarily have more than one way to write the same thing.

keean commented 7 years ago

This: ⊖Atype⁉️ makes it hard to understand the nesting, for example is it a reference to a nullable "type", or a nullable reference to a type? You would not only need to memorise what the symbols mean, but their precedence, to know which ones apply first.

shelby3 commented 6 years ago

This: ⊖Atype:interrobang: makes it hard to understand the nesting, for example is it a reference to a nullable "type", or a nullable reference to a type?

Lol, the use of the kenkoy :interrobang: emoji.

I am thinking there are no unadulterated (i.e. native, low-level manipulable) null references in the language I think I probably want:

@shelby3 wrote:

With GC, I don’t see any need to explicitly type references. We simply remember those primitive types which are always copied when assigned.

@shelby3 wrote:

I argued above we do not need Ref because (at least for my plans), contrary to Rust I am not attempting to create an improved C or C++ with programmer control over all those low-level details.

@shelby reiterated:

I repeat for my needs which is to not be mucking around in low-level details in a high-level language, I think JavaScript got this right in that certain primitive types (e.g. Number, also String correct?) are treated as values and the rest as references to a container of the value. So the programmer never needs to declare whether a type is a reference to a value or a value.

As you presumably know, in JavaScript a “null reference” is distinguished as the value undefined, as opposed to a null value, which has the value null.

I suppose Undefinable is another type for which we may want a type annotation.

Excluding memory allocation which is inapplicable in GC, afaics the only utility of null pointers is to: a) differentiate the state of unset (undefined) from unavailable (null); and/or, b) differentiate an instance shared (between data structures) from no instance. Sometimes we would for efficiency prefer to use for example the negative values of a signed integer (i.e. the MSB aka most-significant-bit flag) to differentiate the null unavailable state from the available positive integers state without requiring boxing (and perhaps a similar bit flag hack for unboxed nullables for other data types), thus perhaps we want to be able to differentiate the #a and #b cases. In no case though would a Nullable[Undefinable[…]] make sense in GC, so we do not need two transposed orderings of those annotations.
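
A sketch of the kind of unboxed encoding meant here (TypeScript; the sentinel convention is hypothetical and assumes the payload is a non-negative integer):

// Encode "unavailable" in the sign bit so no boxing or extra field is needed.
const UNAVAILABLE = -1

function encode(v: number | null): number {
  return v === null ? UNAVAILABLE : v
}
function decode(raw: number): number | null {
  return raw < 0 ? null : raw
}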

Since we probably need #b even for types that are not nullable, then the double question mark is an incongruent symbol choice.

I am contemplating how we might type the bit flag hacked unboxed nullables? And generally for any data type? It consumes some of the complexity budget though.

keean commented 6 years ago

Don't forget with GC you often need Weak and Strong references, where a strong reference prevents the referenced object from being garbage collected, whereas with a weak reference the referenced object can be garbage collected, so you must check if it is still there when dereferencing. You can also have levels of weakness determining the memory pressure required to evict the object, to reflect circumstances like "I want to cache this in memory for speed, providing we have spare memory left, but evict as soon as we are running low on memory" vs "I want to hold this in memory for as long as possible, and only evict if we have no other free memory left".

shelby3 commented 6 years ago

The use of weak references for caches does not work well. Weak references are not needed for breaking cycles if GC is present. The only potentially legitimate use (though it is dubious) for weak references is avoiding having to manually undo “put” operations, e.g. removing objects added to a map, list, event/listener list, etc. I am leaning towards agreeing with David Bruant’s stance that my aforementioned last use case would be encouraging bugs. Also JavaScript has no weak references, so we could not support them in a language that compiles to JavaScript unless we implemented our own GC and heap employing ArrayBuffer.

shelby3 commented 6 years ago

Except for the exclusivity and stored types, the others mentioned must also be allowed on the type parameters of a type. I realized that when contemplating a MutableIterator example.

Afaics, we never place these annotations on type definitions, and only on the types of instances (including function arguments and result types).

shelby3 commented 6 years ago

@keean wrote:

In this way the semantics are kept clean, and we don't need concepts like l-values and r-values, and nor do we have the problems Rust has with closures, where it does not know whether to reference or copy an object.

With GC and no stack allocation ever, then closures always reference the objects of the closure (because no need to copy them from stack before the activation record is destroyed when the function returns). The mutability is then an orthogonal concern at the typing level, i.e. with GC there are no r-values. R-values are a compiler optimization and implementation concern and I don’t see why they should be conflated with mutability nor the type system.

I do agree that we need to distinguish between modifying the container and the reference to the container, and each should have a separate mutability attribute. Languages which use stack allocation typically prevent modification of the reference to the container, because this could cause memory leaks, which is why the r-value and l-value concept is introduced. My idea for an efficient cordoned nursery can hopefully eliminate the need for that low-level implementation complication leaking into the PL. However, it’s more efficient to store the containers which have immutable references (i.e. only the container may be mutable) directly in the activation record for the function than to store a reference to the container in the activation record. So the separate mutability attribute is important for optimization.

Note closures over non-contiguous heap allocated cactus stacks would have a separate reference to each activation record that contains an object that is also in the closure. So these multiple references (and the cost to access objects via multiple activation record pointers) are a cost that is paid for cactus stacks.

keean commented 6 years ago

You also need to distinguish variable binding from mutation. A value like '3' is immutable, but we can rebind variables like: x = 3; x = x + 1. The value bound to the variable is immutable, but we can rebind. This explains why changing a variable inside a procedure does not change the argument passed.

shelby3 commented 6 years ago

@keean rebinding is just modifying the reference to the container:

x.ref = new Int(3); x.ref = new Int(x.ref.val + 1)

In a language which uses * for dereferencing:

x = new Int(3); x = new Int(*x + 1)

Instead of the above, JavaScript makes references immutable for primitive objects so they always modify the container, but since there’s no way to access the reference, rebinding is semantically the same as modifying the container (because the reference to 3 is always exclusive):

x = 3; x = x + 1
shelby3 commented 5 years ago

@keean wrote:

Looking at the mess Rust has created with its handling of mutability, and also the confusion l-values and r-values cause in C/C++, I think a simpler model is needed for mutability. I also think that mutability needs to be part of the type signature, not a separate annotation.

I agree that mutability should be on the type, and not, as apparently in Rust, some orthogonal concept related to borrowing lifetimes that only annotates the identifier:

let mut i = 1;

Could you please possibly elaborate on how you think Rust messed up mutability so I may know if I’m missing the pertinent details of your point?

I think the problem comes from a failure to distinguish between a value and a location in the type system […] You need a container (or a location in computer terms). You can only write to something that has a location (is an l-value in C/C++ terms). A container is something like an array, that has slots that contain values, and the values in the slots can be changed […]

let y = 3 // value assignment, y is immutable, 3 has type Int, 'y' has type Int
let x = Mut(3) // value assignment x is a mutable container that initially has the value 3 in it. x has type Mut[Int]
x = 2 // the contents of the container called 'x' are changed to '2'. This is syntactic sugar for x.assign(2), which would have a type like: (=) : Mut[X] -> X -> Mut[X].

I’m instead proposing for Zer0 we retain the concept of l-values and r-values and thus only l-values implicitly have a container. With the hindsight of my post herein, do you (and if so then why do you) think conversion of r-values to l-values need to be explicit as you showed with Mut(3) in your example above?

I wrote:

JavaScript does it differently and treats numbers as values and everything else as references. This causes problems because 'enums' should be values but are treated like objects.

If you want a mutable number in JavaScript you end up using a singleton object like {n:3} or an array like [3]. So even in JavaScript there is some kind of notion of a "reference".

I think whatever we do, it needs to cleanly distinguish between l-values (containers) and r-values (values). It probably also needs to distinguish between boxed and unboxed things.

I repeat for my needs which is to not be mucking around in low-level details in a high-level language, I think JavaScript got this right in that certain primitive types (e.g. Number, also String correct?) are treated as values and the rest as references to a container of the value. So the programmer never needs to declare whether a type is a reference to a value or a value.

JavaScript, Java, and Python employ call-by-sharing which is distinguished from call-by-reference because only certain objects are passed-by-reference.

Since the grammar I am currently proposing for Zer0 will have explicit pointer types and dereferencing (*) and explicit pointer construction (&), then Zer0 will be call-by-value because pass-by-reference1 can be achieved with a pointer when needed. Except that Zer0 may automatically simulate call-by-value more efficiently2 by actually employing pass-by-reference behind the scenes when passing a large object which is either immutable or being passed to a type which is read-only (i.e. copying would be expensive and stress the L1 cache). In the immutable case, the code will not know pass-by-reference has been employed, because for an immutable object there’s no difference between pass-by-value and pass-by-reference (except for issues about memory safety, stack frame lifetimes, and garbage collection which I explain below). In the read-only case, the difference is irrelevant (unless it can be proven that no other writers have access for the life of the read-only reference) because it makes no sense to pass to a read-only type by copying the value, because the raison d’etre of a read-only type is that other references can mutate the value.
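
The call-by-sharing behaviour referred to here is easy to observe in plain JavaScript/TypeScript:

function f(obj: { a: number }, n: number) {
  obj.a = 42       // visible to the caller: the object is passed by sharing
  n = 42           // not visible: the number was copied
  obj = { a: 0 }   // rebinding the parameter is also not visible to the caller
}

const o = { a: 1 }
let k = 1
f(o, k)
console.log(o.a, k)  // 42 1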

I’ve proposed a near zero-cost memory safety abstraction which I developed from @keean’s suggestion of employing Actors (in a zero-memory-resource-cost manner) for parallelism. So only objects which escape the stack frame lifetime (via compiler analysis which doesn’t require any lifetime annotations nor any of Rust’s aliasing errors and tsuris) will be allocated on the non-shared (i.e. thread local) bump pointer heap and the rest on the stack (each bump pointer heap is deallocated with one instruction when the Actor returns, so it’s very efficient, on par with Rust’s performance, with 100% memory safety). So the compiler will decide whether to allocate containers on the stack or the bump pointer heap. Only specially annotated reference counted pointers get allocated on the traditional shared heap. (I explained all of this in the above linked post, including how the Actor model will eliminate L3 and L4 cache and do software driven cache coherency and cache-to-cache transfers.)

So therefore the compiler will decide implicitly where thread-local containers are allocated (i.e. stack or non-share bump pointer heap). So containers are not to be an explicit concept. And it will be possible to take the address of (&) of any container (i.e. l-value). This applies even to containers which contain pointers (*). So the following is valid code:

obj := Data()
myptr := &obj
myptrptr :: &myptr    // `::` means ~~not re-assignable aka not rebindable~~[immutable]

@keean wrote:

You also need to distinguish variable binding from mutation. A value like '3' is immutable, but we can rebind variables like: x = 3; x = x + 1. The value bound to the variable is immutable, but we can rebind. This explains why changing a variable inside a procedure does not change the argument passed.

In the grammar I proposed for Zer0, re-assignment (aka re-binding) is controlled with :: or := when constructing and initializing via assignment to a container.

Thus a container that contains a pointer or any non-record type is a special case because the mutability of the contained type is dictated by the re-assignability annotation. So in that case either the mutability annotation on the type must be consistent with the re-assignability annotation, or we can decide to make the mutability annotation implicit on the type as it will implicitly be the explicit re-assignability annotation.

EDIT: Zer0 won’t need rebinding if it’s using call-by-value and not call-by-sharing. JavaScript needs to distinguish between preventing rebinding with const versus mutating the fields of the object, because JavaScript employs call-by-sharing which employs pass-by-reference for some objects. Thus, :: in Zer0 would mean not-writable (i.e. read-only or immutable) for the l-value— i.e. that the implicit container can’t be replaced with a new value. The read-only or immutable attribute would also have to written explicitly on the type annotation if the type is not instead inferred. Without call-by-sharing, the only reason to have this :: is to make the not-writable attribute very clear, which is especially helpful even when the type is inferred and not explicitly annotated. It’s also a way of declaring not-writable when the type is inferred.

I have been arguing that it is precisely because type systems do not differentiate between values and containers, that language semantics get very opaque and complex, for example C/C++ have l-values and r-values, and rust does not know whether to copy or reference variables in a closure.

I found the post you made about that on the Rust forum.

The issue being explained there is that by default in Rust, closures refer to the implicit containers for all the items in the environment of the closure. The containers are always implicit, thus it’s impossible in Rust (and C, C++ etc) to have an identifier represent a r-value.

But your proposal isn’t necessary because by definition an r-value is not mutable, so immutability of containers would accomplish the same effect, which I already have designed into Zer0.

But we also need some way to tell closures to copy from a mutable container instead of referencing it. Rust has some convoluted way of Copy and/or the move semantics which I don’t entirely grok (and I don’t think anyone should ever need to grok something so complexly clusterfucked (c.f. also)).

The default really should be to reference the stack frame as that is the most efficient (only requires one pointer). Copying is less efficient, so I think it should be done manually. The programmer should make copies of the containers he wants before forming the closure.

Rust’s closures are explicit, which IMO defeats the elegance of their primary local use case. I want only implicit local closures. We already agreed that closures at-a-distance (i.e. across modules) is an anti-pattern.

1 Pass-by-reference is what call-by-reference does for every argument of the function or procedure call. So we use the term pass-by-* when referring to assignment in general or only some of the arguments of a function or procedure call.

2 It’s more efficient to pass the reference than to copy the large object; and because having copies of the object in more than one memory location can cause cache spill. OTOH if there will be many accesses to the object, then it may be more efficient to copy so it can be accessed directly indexed off the stack pointer (SP) which may be more efficient than the double indirection of accessing the pointer on the stack and then the object referenced by the pointer. However if we can keep the pointer in a register, then copying to the stack may provide no speed advantage on accesses. Also (in the context of the near zero-cost resource safety model I proposed because of the Actor innovation) if the object escapes escape analysis and Zer0 must put it on the bump pointer heap anyway, then it can’t be copied to the stack.


I wrote:

Remember I had complained to them about the need for the noisy & at the call sites.


Edit: also that last link exemplifies some of the complexity that is involved with having these concepts of references types, value types, and boxed types. I’d rather have the compiler/language deal with those things implicitly. For example, the compiler knows when it needs to use a boxed type because more than one type can be stored there. Unboxed slots in data structures are an optimization. Rust required that complexity because it is tracking borrows in a total order, which IMO is an egregious design blunder.

Note if we adopt my proposal to forsake open existential quantification, then all dynamic polymorphism will be limited to static union bounds, so it will always be possible to unbox (although maybe wasteful if some of the types in union require much more space than the others). Readers should note this is orthogonal to the issue of needing pointers to avoid recursive types that would otherwise require unbounded space (although Rust seems to conflate these two concepts).

The Zer0 programmer will be able to manually force boxing by employing a pointer. Otherwise I think we should make it unspecified as whether the compiler is employing boxing or unboxing. Ditto (as Go already does) unspecified for order and alignment of record fields (c.f. also and also), we want to leave the flexibility for the compiler to do whatever optimizations it wants:

Optimizers at this point must fight the C memory layout guarantees. C guarantees that structures with the same prefix can be used interchangeably, and it exposes the offset of structure fields into the language. This means that a compiler is not free to reorder fields or insert padding to improve vectorization (for example, transforming a structure of arrays into an array of structures or vice versa). That's not necessarily a problem for a low-level language, where fine-grained control over data structure layout is a feature, but it does make it harder to make C fast.

C also requires padding at the end of a structure because it guarantees no padding in arrays. Padding is a particularly complex part of the C specification and interacts poorly with other parts of the language. For example, you must be able to compare two structs using a type-oblivious comparison (e.g., memcmp), so a copy of a struct must retain its padding. In some experimentation, a noticeable amount of total runtime on some workloads was found to be spent in copying padding (which is often awkwardly sized and aligned).

Consider two of the core optimizations that a C compiler performs: SROA (scalar replacement of aggregates) and loop unswitching. SROA attempts to replace structs (and arrays with fixed lengths) with individual variables. This then allows the compiler to treat accesses as independent and elide operations entirely if it can prove that the results are never visible. This has the side effect of deleting padding in some cases but not others.

keean commented 5 years ago

@shelby3 I think it's a mistake to conflate boxing (storing the type tag with the value) and pointers.

I also think that it's a mistake to conflate l-values and r-values. Containers are the type of the 'hole' and values go in the hole. These types should be distinguished in the type system. So consider a record (Haskell Syntax):

data Record = R {
   age : Int
   count : Mut Int
}

age is an Int, it's a property of the Record and cannot be changed. Count is a 'hole' that an integer can be stored in, and that value can be read or written. Note: we should hide the 'hole' when serialising, so reads from 'Mut Int' would be better if they are transparent, so I think I have changed my mind on that from before, but it should be a distinct type. So in C terms age is an r-value and count can either be an l-value or an r-value. All objects should behave uniformly, so a Record is an r-value (you cannot assign to it) but you can mutate its count. A Mut Record can be assigned to. This would all be abstracted by the Readable and Writable type-classes for generics. 'Mut' is not a very good name for this but I can't think of a better one at the moment? Maybe 'Slot'?
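
A TypeScript approximation of the same distinction, reusing the hypothetical Mut<T> sketch from earlier (Rec is used instead of Record to avoid TypeScript's built-in Record type):

interface Rec {
  readonly age: number         // a plain property: fixed once the record is constructed
  readonly count: Mut<number>  // a 'hole': the slot is not reassignable, but its contents are
}

const r: Rec = { age: 30, count: new Mut(0) }
// r.age = 31        // compile error: age behaves like a value
r.count.assign(1)    // fine: mutating the contents of the slot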

shelby3 commented 5 years ago

@keean wrote:

Could you please possibly elaborate on how you think Rust messed up mutability so I may know if I’m missing the pertinent details of your point?

I think it's a mistake to conflate boxing (storing the type tag with the value) and pointers.

Agreed. I wrote:

Readers should note this [boxing issue] is orthogonal to the issue of needing pointers to avoid recursive types that would otherwise require unbounded space (although Rust seems to conflate these two concepts).


I also think that it's a mistake to conflate l-values and r-values. Containers are the type of the 'hole' and values go in the hole. These types should be distinguished in the type system. So consider a record (Haskell Syntax):

data Record = R {
   age : Int
   count : Mut Int
}

age is an Int, it's a property of the Record and cannot be changed. Count is a 'hole' that an integer can be stored in, and that value can be read or written.

Your reasoning here appears to be jumbled and not well thought out. We can simply make age immutable and count mutable. That provides the correct semantics. Your proposal to make age an r-value would mean we can’t take the address of it (&age), and that doesn’t seem to make any sense, because either the entire Record is an l-value or r-value depending on context. The l-value and r-value attributes are heavily dependent on context, so it doesn’t make any sense to complicate the type system with them. They shouldn’t be part of unification.

sighoya commented 5 years ago

@keean

I'm with @shelby3 on this. I don't like to explicitly box mutables behind types; this is for low-level languages. It is of course okay when you do that behind the scenes. I like languages where references are handled like values.

To the philosophical point of a variable: The question is whether mutability exists at all; you can see mutable variables as mathematical variables which get rebound over each time unit. So each access of a variable is different because of rebinding over time.

keean commented 5 years ago

@shelby3 @sighoya

Let's think about values like "3" or "(3, 4)" or "{a: 3, b: 4}"

It should be obvious that all these are values, and hence are immutable. If we have reassignable variables they can be rebound to different values.

Hopefully we agree so far.

keean commented 5 years ago

@shelby3 @sighoya

So values are straightforward and we know for example that "3 == 3" or "{a:3, b:4} == {a:3, b:4}" are identities for values.

So what about mutability? To have something mutable there has to be the concept of identity. That is something has a unique identity apart from its value. This identity is always the "address" of the data.

The confusion is that immutable values may or may not have an address, we just don't care, but mutable objects must have an address.

So when it comes to types we need a different type for mutable Vs immutable as this is a deep property (and something most languages get wrong). Something that is mutable has an address and its address can be taken (to get a pointer).

Something that is a value "may" have an address, but that address is not valid as an identifier (two integers of the same value are not different because their address is different, or one has an address and the other does not). So we should not ever be able to take the address of a value.

keean commented 5 years ago

@shelby3 @sighoya

So one way we could deal with this syntactically is to use tuple notation for value records like this:

Values:
   3
   (3, 4)
   (a:3, b:4)

So a record is just a tuple with named components.

We can then use '{}' to make things into objects (i.e. make them addressable):

Objects:
   {3}
   {3, 4}
   {a:3, b:4}

We can then "objectify" values by wrapping them in {}:

Objects from values:
   {3}
   {(3, 4)}
   {(a:3, b:4)}
   let x = (a:3, b:4) in {x}

The final thing should be obvious: values can only contain values, but objects can contain both objects and values. We can assign new values to the values in objects, but we cannot mutate them:

let x := (a:3, b:4)
x.a := 2 // error
let y := {a:3, b:4}
y.a := 2 // okay
let z := {a:(a:3, b:4), b:4}
z.a.a := 2 // error
z.a := (a:2, b:4) // okay

Values are obviously pass by value, but because they are immutable the compiler can optimise with references where it is better for performance.

We can allow an object to be passed where a value is expected, and it becomes immutable, so from then on (inside that function) it looks and behaves just like a value. This is the dual of allowing the compiler to pass values by reference for performance. When passing an object as a value the compiler is free to make a copy for performance reasons.

We might use "abs" notation to make a value from an object explicitly:

let x = {a:3, b:4}
let y = |x|
x.a = 2 // okay
y.a = 2 // error
print(x.a) // prints "2"
print(y.a) // prints "3"

That deals with mutability and immutability :-)

keean commented 5 years ago

@shelby3 @sighoya

This leaves "read-only", "write-only" and "read-write" references, which are a distinct thing from mutable and immutable (value Vs object).

Obviously, as values are referentially transparent, there can be no visible references to values, so that simplifies things, as we only have to deal with references to objects (and objects might be actors - to be discussed elsewhere).

So object references are explicit, except we can overload '.' to do one dereference for objects, so the object type means it is a visible reference, hence although we call it pass-by-reference, we are really passing the reference by value.

If we assume that references default to mutable, then we just need a way to 'cast' a mutable reference to either a read-only or write-only reference.

@shelby3 suggested '::=' for read-only (or was that for converting objects into values, I am not clear on that, which is why I think these things need to be explicit).

How about:

let x := {a:3, b:4}
let y := readOnly(x)
let z := writeOnly(x)

Where readOnly and writeOnly are simply functions with the types forall A . {A} -> ReadOnly {A} and forall A . {A} -> WriteOnly {A} respectively.

In the type system {...} means the thing is both a reference and an object as the two things are synonymous. ReadOnly {...} and WriteOnly {...} are the read-only and write-only versions.
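
As an approximation of the idea (not the proposed {A} semantics): TypeScript's built-in Readonly<T> already plays the role of the read-only cast for object types, while WriteOnly below is a hypothetical sketch exposing only a setter:

type WriteOnly<T> = { set<K extends keyof T>(key: K, value: T[K]): void }

const readOnly = <A extends object>(x: A): Readonly<A> => x   // same reference, narrowed capability
const writeOnly = <A extends object>(x: A): WriteOnly<A> => ({
  set<K extends keyof A>(key: K, value: A[K]): void { x[key] = value },
})

const obj = { a: 3, b: 4 }
const ro = readOnly(obj)
// ro.a = 2                   // compile error: read-only view
writeOnly(obj).set("a", 2)    // allowed: writing through the write-only view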

keean commented 5 years ago

@shelby3 @sighoya

Now something a bit different for consideration. Currently with the above syntax '.' gets overloaded to access both values and objects (references). For values it always creates an r-value, but for references it creates l-values or r-values depending on the role in the expression. We might consider this ambiguity undesirable and instead define:

x<-a // r-value
x->a // l-value

x->a := x<-a + y.a // x is an object, y is a value

Now we can get rid of subsumption from l-values to r-value, and we can be explicit about what we want.

shelby3 commented 5 years ago

@sighoya wrote:

I don't like to explicitly box mutables behind types

I think you know this (you were just taking a shortcut in your comments) and I just want to make this clear for future readers. Please note that the implementation of “boxing” may employ a pointer, but boxing is not the same concept as a pointer or a container, as @keean correctly noted, “I think it's a mistake to conflate boxing (storing the type tag with the value) and pointers.” Boxing is a means for dynamic polymorphism because the tag for each possible type is stored along with the value so that runtime reflection on the type is possible. Boxing means only to attach the type tag to the runtime value. It doesn’t mean pointer or container, although those may be necessary for the implementation of a boxed value.
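
To make the distinction concrete, boxing in this sense is just pairing a runtime tag with the payload, as in a TypeScript discriminated union (the field names tag and value are arbitrary):

// A "boxed" value: the type tag travels with the payload so it can be inspected at runtime.
type Boxed =
  | { tag: "int", value: number }
  | { tag: "str", value: string }

function show(b: Boxed): string {
  // runtime reflection on the tag is the whole point of boxing
  return b.tag === "int" ? `Int ${b.value}` : `Str "${b.value}"`
}

console.log(show({ tag: "int", value: 3 }))  // Int 3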

To the philosophical point of a variable: The question is if mutability exists at all, you can see mutable variables as mathematical variables which get rebounded over each time unit. So each access of a variable is different because of rebounding over time.

I remember also making a related point recently in the Subtyping issues thread #8 about math not modeling mutability. A mutable variable can be modeled in isolation as rebinding, but AFAICT an equation can’t cleanly represent a state machine with unbounded non-determinism. This is why I have heard that some mathematicians are less enthused about computational proofs of math theorems such as the proof of the Four Color Theorem.


@keean I upvoted one of your posts and downvoted the other. Please do not construe the downvote1 as derogatory. My only intent is to make sure future readers readily know that I disagree with the post without needing to wade through my walls of text to know it. Now I will explain my disagreement. If I discover I am incorrect, then I will remove my downvotes.

So when it comes to types we need a different type for mutable Vs immutable as this is a deep property (and something most languages get wrong).

Agreed we must have mutability and immutability as types because they impact soundness of typing. IOW, our compiler mustn’t allow writing to an immutable typed object.

So what about mutability? To have something mutable there has to be the concept of identity. That is something has a unique identity apart from its value. This identity is always the "address" of the data.

I understand we mutate the binding to the identifier or the memory container, not the value itself. That is an important detail, but I argue it is an implementation detail that doesn’t belong in the type system.

The confusion is that immutable values may or may not have an address, we just don't care, but mutable objects must have an address […] Something that is a value "may" have an address, but that address is not valid as an identifier (two integers of the same value are not different because their address is different, or one has an address and the other does not).

The concept of value versus variable, which is completely covered by immutability, is orthogonal (even if we accomplish mutability by only rebinding— i.e. referential transparency— instead of mutating the memory container) to the concept of address location in memory.

We don’t only compare the addresses when comparing if two values are equal. We do compare the two separate pointers to values if we are comparing whether they point to the same value— i.e. the address is a value also. So comparing addresses is a shortcut for quickly eliminating whether a comparison of a value is against itself.

So we should not ever be able to take the address of a value.

Agreed that values are always r-values. Values can also be stored in memory, so the address of their memory container is an l-value. So you’re correct that we can’t take (&) the address of an r-value. But since values can be stored, then unless we complicate our model with explicit containers, when taking the address of the expression (e.g. identifier) which refers to a value, we are implicitly taking the address of the memory container where the value is stored as a language design implementation detail. So syntactically we are taking the address of the l-value. Instead of explicit containers in the type system as you proposed, I prefer implicit containers and the concept of l-values and r-values because of the following.

The pointer address (aka container) where a value is stored is a language design implementation detail that has nothing to do with the typing. Because we don’t want to bifurcate our type system in a “What Color Is Your Function?”-like non-interoperability based on for example whether a value is stored in a register, on the stack, inside another record which is a container, etc. There are far too many combinations of scenarios to model. So modeling it in the type system would create complexity. And for what gain? The type system is for ensuring soundness (aka type safety) so that we don’t get segfaults. The soundness issues w.r.t. storage location (e.g. use-after-free w.r.t. for example stack allocation or GC) are handled orthogonally as I explained in my post before my prior one.

We can assign new values to the values in objects, but we cannot mutate them:

let x := (a:3, b:4)
x.a := 2 // error
let y := {a:3, b:4}
y.a := 2 // okay
let z := {a:(a:3, b:4), b:4}
z.a.a := 2 // error
z.a := (a:2, b:4) // okay

You’re employing literals to indicate mutability of the type. I would prefer to just write the mutability on the type and not proliferate more subtle variants of expression syntax2 than necessary (because that confuses new users and creates onboarding costs that increase attrition of new adopters of the language). Records should have only one syntax and then their type can be modulated by the type annotation.

Is x an r-value? If yes, then what purpose does it serve that treating x as an l-value wouldn’t? Is your reason that you want to prevent its address from being taken? But the soundness of that is an implementation detail that is impacted for example by use-after-free.

This leaves "read-only", "write-only" and "read-write" references, which are a distinct thing from mutable and immutable (value Vs object).

I also have all those distinct attributes on types in my proposed draft of the grammar (I have not yet uploaded it but I will soon). But because of my aforementioned preference for implicit containers with l-values and r-values, I’m conceptualizing it differently from your taxonomy. Your taxonomy is that mutability is on the container and the other attributes are on the value. Also I find one fault in your taxonomy: writing to a value makes no sense because values by definition can never be modified, so you mean that a value can’t be read, but you should name that “no-read” since writing has nothing to do with it, because values can never be written to. This exemplifies that explicit containers will create a bifurcation of needing no-read on the value and mutability on the explicit container. That is too many degrees-of-freedom, because I see no need for it.

In my mental model, immutability means none of any references to the same l-value (which in my model an l-value is conflated with its implicit container) can mutate the value inside the container. R-values of course are always immutable because by definition they can’t be referenced. Read-only type applies only to the specific reference with that read-only type— that reference may not mutate the value inside the container but other references to that same container might have permission to change the value inside the container. Write-only means the value in the implicit container can be written but the value can’t be read.
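
The distinction drawn here can be illustrated in TypeScript: a read-only reference restricts one alias while another alias may still mutate the contents, whereas immutability (e.g. a frozen object) forbids mutation through every alias:

const cell = { n: 3 }
const ro: Readonly<{ n: number }> = cell   // read-only *reference*: this alias cannot write…
cell.n = 4                                 // …but another alias still can
console.log(ro.n)                          // 4: the read-only view observes the mutation

const frozen = Object.freeze({ n: 3 })     // *immutable*: no alias may write
// frozen.n = 4                            // compile error (and a silent no-op or a throw at runtime)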

@shelby3 suggested ::= for read-only (or was that for converting objects into values, I am not clear on that, which is why I think these things need to be explicit).

No, it instead means non-writable, which can be either immutable or read-only as defined above. I had explained it up-thread:

EDIT: Zer0 won’t need rebinding if it’s using call-by-value and not call-by-sharing. JavaScript needs to distinguish between preventing rebinding with const versus mutating the fields of the object, because JavaScript employs call-by-sharing which employs pass-by-reference for some objects. Thus, ::= in Zer0 would mean not-writable (i.e. read-only or immutable) for the l-value— i.e. that the implicit container can’t be replaced with a new value. The read-only or immutable attribute would also have to written explicitly on the type annotation if the type is not instead inferred. Without call-by-sharing, the only reason to have this ::= is to make the not-writable attribute very clear, which is especially helpful even when the type is inferred and not explicitly annotated. It’s also a way of declaring not-writable when the type is inferred.

1 I wish Github offered other icons for expressing opinions on posts. I would like to be able to indicate constructive “disagree” without it implying frowning (i.e. thumbs down) on the post. Also like to express other reactions such as the following, although so many choices might be noisy 🤔, 🤨, 😲, 🤯, 😵, 😠, 🤬 (:rage:), 🤥, 🤫, 🧐, 💩, 🦸‍♂️, 🧞, 💤, 💭, 🕳, 💎, 🍀, 🏆, 🎯, 🎲, 🔔, 💡, 📌, 🗿, , , , :interrobang:, 🆒 (:sunglasses:), 🆘, :star:, :hurtrealbad:, :muscle:, :bomb:, :information_source:, :white_check_mark:.

2 Expression syntax as distinguished from type annotation syntax.

keean commented 5 years ago

@shelby3

We don’t only compare the addresses when comparing if two values are equal. We do compare the two separate pointers to two values if we are comparing whether they point to the same value— i.e. the address is value also. So comparing addresses is a shortcut for quickly eliminating whether a comparison of a value is against itself.

This is not right. For values we do not care about the address at all, two values are equal if they are the same. So 3 == 3 no matter what address they are stored at. It is important that values have referential transparency.

For objects it is different, objects are only equal if their addresses are equal, even if they have the same 'value'. This is because they are mutable.
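
This matches observable JavaScript behaviour:

console.log(3 === 3)            // true:  values compare by value
console.log("abc" === "abc")    // true
const p = { a: 3, b: 4 }
const q = { a: 3, b: 4 }
console.log(p === q)            // false: objects compare by identity (address)
console.log(p === p)            // true:  same identity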

This is very close to JavaScript typing, so it's not an alien concept. In JavaScript things like Ints and Strings are values and objects are not. So let's look at a comparison with javascript:

3 // value in JS
(3, 4) // JS does not have tuples.
(a:3, b:4) // JS does not have a record value notation.

{3} // JS does not allow an object with a single anonymous property
{3, 4} // JS does not allow position indexed objects
{a:3, b:4} // valid JS

So you can see the important categories like integer values and objects with named properties behave like in JavaScript. All we are doing is filling in some missing types:

           Value Object
Singleton   JS 
    Tuple
   Record         JS

So all I am doing is providing types/syntax for the bits missing in JS. This also provides a simple consistent model that separates the datatype (singleton, indexed, named) from whether the data is a value or an object.

keean commented 5 years ago

@shelby3

I would prefer to just write the mutability on the type and not proliferate more subtle variants of expression syntax than necessary (because that confuses new users and creates inbound costs that increase attrition of new adopters of the language). Records should have only one syntax, and then their type can be modulated via the type annotation.

We need to be clear that value semantics Vs object semantics is something distinct from a read-only, write-only or read-write reference.

We might be able to do this using just type annotation, but note that popular programming languages like JavaScript make a value syntax difference between objects and values.

sighoya commented 5 years ago

@shelby3 wrote:

Agreed we must have mutability and immutability as types because they impact soundness of typing. IOW, our compiler mustn’t allow writing to an immutable typed object.

Why not keyword/attribute tags like in Rust? What is different about storing mutable or immutable integers behind some variable in memory? Types should state something about how abstractions are mapped to internal representations; two different types indicate two different internal representations even if the abstraction is the same (maybe for performance reasons).

I don't see why type inference is compromised because of (im)mutable attributes.

A mutable variable can be modeled in isolation as rebinding, but AFAICT an equation can’t cleanly represent a state machine with unbounded non-determinism.

You mean a deterministic equation; what about equations with random* constituents? The main problem with mutability is not the mutability itself but the missing history of rebindings; as you said, there is no global observer in the universe to record them.

*Of course, it is questionable how well computers simulate randomness, and indeed whether randomness exists at all. Does non-determinism exist, or does everything follow deterministically from some certain irrational number? We can't decide that even if we know all prior observations.

keean commented 5 years ago

@shelby3 @sighoya

Are we agreed that we need both value semantics and object semantics in the language?

Value semantics provide referential transparency and greatly simplify things where they are valid.

Object semantics provide for mutation and identity, which allows for mutation whilst preserving identity.

shelby3 commented 5 years ago

@keean wrote:

We don’t only compare the addresses when comparing if two values are equal. We do compare the two separate pointers to values if we are comparing whether they point to the same value— i.e. the address is a value also. So comparing addresses is a shortcut for quickly eliminating whether a comparison of a value is against itself.

This is not right. For values we do not care about the address at all, two values are equal if they are the same. So 3 == 3 no matter what address they are stored at. It is important that values have referential transparency.

You’re literally not comprehending what I wrote. I admit the way I wrote it was convoluted and abstruse (because my mind was operating in logic mode and not in communication mode). Let me clarify.

I didn’t claim that we need the addresses to compare values. I stated that an implementation optimization could optionally compare the addresses of l-values to quickly eliminate if the two addresses point to the same value (obviously if the value is stored in the same place it must have the same value). This optional optimization for quickly excluding the exceptional case is from the perspective of expressions dynamically referring to l-values. The optimization isn’t possible when comparing r-values. And the optimization may be slower where the exceptional case is rare.
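A sketch of that optional fast path in TypeScript, where deepEqual is a hypothetical structural comparison (not anything from this thread's proposals):

function deepEqual(a: unknown, b: unknown): boolean {
  // Fast path: if both expressions refer to the same value/object, they are
  // trivially equal and the structural walk below can be skipped entirely.
  if (a === b) return true
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false  // unequal primitives (and mixed cases) end up here
  }
  const ka = Object.keys(a)
  const kb = Object.keys(b)
  if (ka.length !== kb.length) return false
  return ka.every((k) =>
    deepEqual((a as Record<string, unknown>)[k], (b as Record<string, unknown>)[k])
  )
}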

For objects it is different, objects are only equal if their addresses are equal, even if they have the same 'value'. This is because they are mutable.

This is a higher-level semantic that should be above the layer of the language design. The programmer should decide whether he wants to compare addresses of containers when determining if two containers are considered to be equal. We will have pointers in the language so the programmer can control this semantic independently of our design of the language.

{3} // JS does not allow an object with a single anonymous property
{3, 4} // JS does not allow position indexed objects

This is not correct. JS has Array(1) and Array(2) and these are Objects.

Your objection would be that Array(2) is not a type, but JS does not have static typing. TypeScript has tuples.

All we are doing is filling in some missing types:

           Value Object
Singleton   JS    JS
    Tuple         JS
   Record         JS

So all I am doing is providing types/syntax for the bits missing in JS. This also provides a simple consistent model that separates the datatype (singleton, indexed, named) from whether the data is a value or an object.

Let me draw your attention again to what I wrote up-thread yesterday (do note I corrected it since you read it):

JavaScript, Java, and Python employ call-by-sharing which is distinguished from call-by-reference because only certain objects are passed-by-reference.

Since the grammar I am currently proposing for Zer0 will have explicit pointer types and dereferencing (*) and explicit pointer construction (&), then Zer0 will be call-by-value because pass-by-reference1 can be achieved with a pointer when needed. Except that Zer0 may automatically simulate call-by-value more efficiently2 by actually employing pass-by-reference behind the scenes when passing a large object which is either immutable or being passed to a type which is read-only (i.e. copying would be expensive and stress the L1 cache). In the immutable case, the code will not know pass-by-reference has been employed, because for an immutable object there’s no difference between pass-by-value and pass-by-reference (except for issues about memory safety, stack frame lifetimes, and garbage collection which I explain below). In the read-only case, the difference is irrelevant because it makes no sense to pass to a read-only type by copying the value, because the raison d’etre of a read-only type is that other references can mutate the value.

Thus I see no valid reason nor need for your complex separation-of-concerns because my proposal restores the consistency that JS doesn’t have:

            Object/Value
Singleton      Zer0
    Tuple      Zer0
   Record      Zer0

JavaScript and Java have the problem that they don’t have pointers. Thus those PLs without pointers have to choose a design which is either call-by-reference or call-by-sharing, because there’s no way for the programmer to express a pass-by-reference exception to call-by-value without pointers.
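A small TypeScript example of the call-by-sharing behaviour described above (tweak is just an illustrative function): mutating through a parameter is visible to the caller, while rebinding the parameter or assigning to a primitive parameter is not.

function tweak(obj: { v: number }, n: number): void {
  obj.v = 42      // visible to the caller: the object is shared
  obj = { v: 0 }  // not visible: only the local parameter is rebound
  n = 99          // not visible: the number was passed by value
}

const o = { v: 3 }
const k = 3
tweak(o, k)
console.log(o.v, k)  // prints 42 3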

We might be able to do this using just type annotation, but note that popular programming languages like JavaScript make a value syntax difference between objects and values.

I am proposing that for Zer0, we remove that “value syntax difference” of JavaScript, because we have pointers. Because Zer0 can employ call-by-value instead of call-by-sharing due to the presence of pointers.

Are we agreed that we need both value semantics and object semantics in the language?

Value semantics provide referential transparency and greatly simplify things where they are valid.

Object semantics provide for mutation and identity, which allows for mutation whilst preserving identity.

Could you clarify what you mean in the context of my latest replies, so I can be clear on how your ideas relate to mine once you have factored in my clarifications?

keean commented 5 years ago

@shelby3

How do I represent an immutable record in your system? Note this should be a value and have value semantics.

keean commented 5 years ago

@shelby3 it seems you don't appreciate the difference between immutable and read only.

An immutable value is always immutable no matter how you access it. The immutability has nothing to do with pointers or how you access it. In fact, because we want referential transparency, you cannot have a pointer to a value (that would destroy referential transparency). This is why 'C' is not referentially transparent: &3 != 3, which is a bad thing. Values should literally have no knowable address, like in functional languages.

Objects are mutable and non-referentially transparent. You can have a pointer to an object and because you can have a pointer there are different kinds of access you can give, read-only, write-only and read-write.

It seems to me you only have objects in the system you are thinking of, and that throws away all the benefits of referential transparency in functional languages. My approach is that we want referential transparency and immutability wherever possible. Again note that a read-only pointer to a mutable object is a different thing to an immutable value.
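The distinction being drawn here can be sketched in TypeScript (only an approximation: readonly models the read-only reference, and Object.freeze stands in for the immutable value):

interface Box { v: number }

const shared: Box = { v: 3 }
const roView: Readonly<Box> = shared  // read-only alias to a *mutable* object
// roView.v = 4                       // compile error: this alias may not write
shared.v = 4                          // another alias mutates the object
console.log(roView.v)                 // prints 4: the read-only view observed a change

const frozen: Readonly<Box> = Object.freeze({ v: 3 })  // effectively immutable
console.log(frozen.v)                 // always prints 3: no alias can change it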

shelby3 commented 5 years ago

@keean wrote:

Again note that a read-only pointer to a mutable object is a different thing to an immutable value.

That’s a pleonasm. How can a value ever be mutable? The definition of values you provided up-thread is an immutable thing that can’t be addressed (i.e. has no identity).

You can have a pointer to an object and because you can have a pointer there are different kinds of access you can give, read-only, write-only and read-write.

Okay, so you implicitly admit I was correct where I pointed out that you were incorrect to associate those properties with values. You should explicitly admit these things so readers are not left in the dark. That is the only way to have a conscientious and constructive discussion.

This leaves "read-only", "write-only" and "read-write" references, which are a distinct thing from mutable and immutable (value Vs object).

I have all of those distinct attributes on types in my proposed draft of the grammar as well (not yet uploaded, but I will soon). But because of my aforementioned preference for implicit containers with l-values and r-values, I’m conceptualizing it differently than your taxonomy. Your taxonomy puts mutability on the container and the other attributes on the value. I also find one fault in your taxonomy: writing to a value makes no sense, because values by definition can never be modified. So what you mean is that a value can’t be read, and you should name that “no-read”, since writing has nothing to do with it; values can never be written to. This exemplifies that explicit containers will create a bifurcation: no-read on the value and mutability on the explicit container. That is too many degrees of freedom, and I see no need for it.

In my mental model, immutability means no reference can mutate the implicit container (which is conflated with the value for an l-value expression). A read-only type means only that the reference the type applies to is prevented from writing to the l-value. Write-only means the implicit container can be written but the value can’t be read.

shelby3 commented 5 years ago

@keean wrote:

Are we agreed that we need both value semantics and object semantics in the language?

Agreed.

It seems to me you only have objects in the system you are thinking of, and that throws away all the benefits of referential transparency in functional languages.

Why do you think that? I wrote that Zer0 has an immutability annotation.

we want referential transparency and immutability wherever possible.

Agreed.

An immutable value is always immutable no matter how you access it. The immutability has nothing to do with pointers or how you access it. Infact because we want referential transparency you cannot have a pointer to a value (that would destroy referential transparency). This is why 'C' is not referentially transparent because &3 != 3 which is a bad thing. Values should literally have no knowable address like in functional languages.

I think you may be conflating orthogonal concerns, but I’m not sure. AFAICT, pointers have nothing to do with immutability and referential transparency. You should not be comparing a pointer type to a non-pointer type anyway. That should be an illegal comparison prevented by the type mismatch.

I am going to guess that your mistake is conflating pointers themselves with the fact that pointers to mutable things break referential transparency. Or that pointers are often employed in imperative programming. But AFAICT it’s not pointers that actually create any referential opacity; rather it’s mutability and side-effects that create referential opacity.

Also just because we have pointers doesn’t mean we have to use them. AFAICT, there is no way for taking the address of an immutable object to somehow make other references to that object referentially opaque. Actually mutating pointers may make code referentially opaque, but AFAICT that is an orthogonal concern that is at the programmer’s discretion. IOW our compiler can analyse code and decide whether it is referentially transparent.

Please try to refute and enlighten me.

shelby3 commented 5 years ago

Readers please note that @keean is ostensibly getting the concept of value semantics from Elements of Programming by Alexander Stepanov:

First, you get value semantics by default. When declaring function arguments or return values, if you specify only the type name (like int) you get value semantics (you pass and return by value). If you want to use reference semantics, you must make an extra effort to add a reference or pointer type symbol.

Second, we use value semantics in function declarations, because it closely follows the notation and reasoning from mathematics. In mathematics you operate on values. For instance, you define a function as follows:

f: int × int → int
    f(x, y) = x·x + y·y

This is very similar to:

int f( int x, int y ){
  return x * x + y * y;
}

Notice AFAICT the absence of pointers has nothing to do with achieving value semantics, because value semantics can be achieved with pointers:

int f( const int* x, const int* y ){
  return (*x) * (*x) + (*y) * (*y);
}

Third, we do not run into any memory management issues. No dangling references to nonexistent objects, no expensive and unnecessary free store allocation, no memory leaks, no smart or dumb pointers. The support for value semantics in C++ — passing variables by value — eliminates all those problems.

I already explained up-thread that the near zero-cost resource allocation strategy I proposed for Zer0 with the Actor paradigm will eliminate this issue even when not using value semantics.

Fourth, we avoid any reference aliasing problems. Andrew Koenig has neatly illustrated the problem of reference aliasing in this article. In multi-threaded environment passing by value and ensuring that each thread has its own copy of the value helps avoid any unnecessary data races. Then you do not need to synchronize on such values, and the program runs faster, and is safer because it avoids any deadlocks.

Again the Actor paradigm I have been referring to resolves all the thread synchronization issues. Note that my proposal can’t prove the absence of pointer aliasing (as Rust can) for the purposes of compiler optimization. For that optimization we either use an unchecked annotation analogous to C’s restrict, and/or of course we use referential transparency, which will be an option in Zer0.
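As a minimal TypeScript sketch of the aliasing hazard being referenced (not Koenig's exact example; appendAll is just an illustrative helper): an append routine that misbehaves when both arguments alias the same array.

// Append every element of `src` onto `dst`. Innocent-looking, but if the
// caller passes the same array for both, `src.length` grows on every push
// and the loop never terminates.
function appendAll(dst: number[], src: number[]): void {
  for (let i = 0; i < src.length; i++) {
    dst.push(src[i])
  }
}

const xs = [1, 2, 3]
// appendAll(xs, xs)  // aliasing: dst === src, so this call would never return
appendAll(xs, [4, 5]) // fine: no aliasing, xs becomes [1, 2, 3, 4, 5]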

Fifth, for referential transparency. This means that you get no surprises where your data is modified behind the scenes without you being able to see that.

Again, this is all about immutability, which will be an optional type in Zer0.

@keean I really don’t see the problem. Please enlighten me.

shelby3 commented 5 years ago

@sighoya wrote:

Agreed we must have mutability and immutability as types because they impact soundness of typing. IOW, our compiler mustn’t allow writing to an immutable typed object.

Why not keyword/attribute tags like in Rust?

What do you mean? The grammar I am proposing for Zer0 has an optional immutability attribute.

What is different about storing mutable or immutable integers behind some variable in memory?

What does that sentence mean?

Types should state something about how abstractions are mapped to internal representations; two different types indicate two different internal representations even if the abstraction is the same (maybe for performance reasons).

What is your point? I must not be grokking how the above relates to anything in this discussion.

I don't see why type inference is compromised because of (im)mutable attributes.

It’s not. Who said it is?

sighoya commented 5 years ago

@shelby3 , you wrote:

Agreed we must have mutability and immutability as types because they impact soundness of typing

What do you mean by "mutability" and "immutability" as "types"? Do they represent a container like Mut[Int] and Immutable[Int], or do you provide this via attributes: mutable int i; immutable int i;?

Why is soundness compromised if mutability and immutability are not types?

keean commented 5 years ago

@shelby3 it needs to be a type because the information needs to be propagated.

Why would you want the information to be invisible? As an attribute it is information that you want to propagate with the type, but you cannot see it. This is bad because it is invisible. All the information you want to propagate should be part of the visible type.

An Int like "3" cannot be mutable as it's a value, so I presume you mean something like an object containing the value "3"?

For example in Javascript:

let x = 3
let y = x
x = 2
console.log(y) // prints 3

let x = {v:3} // in a separate scope (otherwise re-declaring `x` and `y` with `let` would be an error)
let y = x
x.v = 2
console.log(y.v) // prints 2

Personally I think the way JS handles objects and values is more high level than the way C handles them.

I still think you should not be able to take a pointer to a value. Can you give an example of why you would do this?

shelby3 commented 5 years ago

Note I edited this to hopefully make it more lucid:

We might be able to do this using just type annotation, but note that popular programming languages like JavaScript make a value syntax difference between objects and values.

I am proposing that for Zer0, we remove that “value syntax difference” of JavaScript, because we have pointers. Because Zer0 can employ call-by-value instead of call-by-sharing due to the presence of pointers.

keean commented 5 years ago

@shelby3 I think pointers are more suited to a low level language.

For example this does not make sense:

let y = &3 // error

let x = 3
let y = &x // okay

Somehow, mysteriously our value has turned into an object. This kind of invisible implicit behaviour is what really confuses people and leads to serious bugs that threaten the security of software.

shelby3 commented 5 years ago

@keean wrote:

An Int like "3" cannot be mutable as it's a value, so I presume you mean something like an object containing the value "3"?

For example in Javascript:

let x = 3
let y = x
x = 2
console.log(y)   // prints 3

let x = {v:3} // in a separate scope (otherwise re-declaring `x` and `y` with `let` would be an error)
let y = x
x.v = 2
console.log(y.v) // prints 2

That’s a constructive direction to take the discussion into an example. So let’s compare that to what I’m proposing for Zer0:

x := 3         : Int       // these type annotations are optional and can be inferred
y :: x         : !Int      // the `!` means immutable. The `<-` prefix for read-only isn’t the type because `x` is copied to `y` not passed-by-reference
x = 2
console.log(y)   // prints 3

data V<A>(v)   : A => V(A) // Equivalent to `data V<A> = V(v) : A => V(A)`. This type annotation isn’t optional. I didn’t choose syntax `V<A>(v : A)` because I want it to be consistent with the fact that annotations are always off to the right side (also because otherwise it would clutter the default arguments case).
x := V(3)      : V(Int)
y :: x         : !V(Int)
x.v = 2
console.log(y.v) // prints 3

The difference is because I’m proposing Zer0 is always call-by-value (in the above case imagine an implicit assignment function that takes the RHS as a copy and assigns it to the LHS).
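An approximation of that copy-on-assignment semantics in TypeScript, using structuredClone (available in modern runtimes) to stand in for the implicit copy; the Zer0 operators themselves are not modelled here:

const x = { v: 3 }
const y = structuredClone(x)  // assignment copies the value, as under call-by-value
x.v = 2
console.log(y.v)              // prints 3: y holds its own copy, unaffected by x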

Let’s compare a different example which explains why there’s no concept of rebinding proposed for Zer0:

const x = 3
const y = x
x = 2           // illegal because of `const`
console.log(y)  // not printed because program is illegal

const x = {v:3}
const y = x
x.v = 2
console.log(y.v) // prints 2

And in Zer0:

x :: 3         : !Int
y :: x
x = 2            // illegal because of immutability
console.log(y)   // not printed because program is illegal

data V<A>(v)   : A => V(A)
x :: V(3)      : !V(Int)
y :: x
x.v = 2          // illegal because of immutability
console.log(y.v) // not printed because program is illegal

Personally I think the way JS handles objects and values is more high level than the way C handles them.

C and C++ are call-by-value. The only way to get pass-by-reference is for the programmer to explicitly employ a pointer. Thus they only need the concept of l-values and r-values. Ditto my proposal for Zer0. JavaScript makes everything confusing because of the inconsistency of call-by-sharing (which Java likewise employs, as noted above), which is required because there are no pointers for the programmer to use to simulate pass-by-reference when the programmer wants to pass the value by-reference.

IOW Java, JavaScript, and I think also Python conflate objects and values because they provide the programmer no means to choose (i.e. control) whether to pass the value by-reference or by-value.

I still think you should not be able to take a pointer to a value. Can you give an example of why you would do this?

I mentioned up-thread that we need to take the address of record fields in some cases.

Why would you want the information to be invisible? As an attribute it is information that you want to propagate with the type, but you cannot see it. This is bad because it is invisible. All the information you want to propagate should be part of the visible type.

I explained this more than once up-thread. The soundness of the type system doesn’t need an explicit container for values. Making containers explicit would bifurcate the scenarios, creating a complex maze of typing interactions. I can’t think of any reason to want that. Can you show any example that will not work in my proposal for Zer0?

I think pointers are more suited to a low level language.

One of our goals has been to support both low-level and high-level programming in the same programming language. Also we’re contemplating transpiling to Go, which has pointers.

We already agreed that we should not have pointer arithmetic because it’s unsound and violates the mathematical model for memory and arrays. Go also doesn’t allow pointer arithmetic.

For example this does not make sense:

let y = &3 // error

let x = 3
let y = &x // okay

Somehow, mysteriously our value has turned into an object.

I don’t see any problem with the above code example. The 3 is an r-value, thus we can’t take the address of it. The x is an l-value.

This kind of invisible implicit behaviour is what really confuses people and leads to serious bugs that threaten the security of software.

Please show me an example of how it creates a bug.

shelby3 commented 5 years ago

@sighoya wrote:

What do you mean by "mutability" and "immutability" as "types"? Do they represent a container like Mut[Int] and Immutable[Int], or do you provide this via attributes: mutable int i; immutable int i;?

Answered in my post above which was a reply to @keean.

Why is soundness compromised if mutability and immutability are not types?

Because if the compiler doesn’t know that a type is supposed to be immutable, then it would allow writes where writes shouldn’t be allowed. Your question seems to be basically asking, “why do we have typing?”
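In TypeScript terms, this is the kind of rejection being asked for: when immutability is part of the type, the compiler refuses the write.

interface Config { readonly port: number }

const cfg: Config = { port: 8080 }
// cfg.port = 9090  // compile error: cannot assign to 'port' because it is a read-only property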

sighoya commented 5 years ago

@shelby3 wrote:

Because if the compiler doesn’t know that a type is supposed to be immutable, then it would allow writes where writes shouldn’t be allowed.

It seems you don't like modifiers on types. It would be okay if you stated that !Int is a restriction on a type via the modifier "!", but why create a separate type for it?

And what fun for the user, having to differentiate between three assignment/bind operators: :=, ::=, and =.

Think about the fact that assignments are used very often; the longer the assignment operator, the more onerous its usage.

Your question seems to be basically asking, “why do we have typing?”

So? How does Java compile with modifiers?

keean commented 5 years ago

@sighoya I don't think Java is a good example of a type system. I don't think it has proper immutables, read-only is a reference property and can be cast away.

keean commented 5 years ago

@shelby3

I don’t see any problem with the above code example. The 3 is an r-value thus can’t take the address of it. The x is an l-value.

How do we know it's an l-value or an r-value if it's not visible in the type system?

I find your proposed system very messy. Why do we need '!Int' for an immutable Int, when Ints cannot be mutable in the first place?

It should be clear that literals have no address. It should be clear that values are immutable. We should use value semantics wherever possible.

Anything mutable must be an object because it has an address. That's what is the difference between an object and a value.

shelby3 commented 5 years ago

@keean wrote:

Your question seems to be basically asking, “why do we have typing?”

So? How does Java compile with modifiers?

I don't think Java is a good example of a type system. I don't think it has proper immutables, read-only is a reference property and can be cast away.

Readers may want to read the reasons why we hate Java.