[WIP] [RFC] Nim should have `immutable` type qualifier (`let` is more like C++'s const); it would enable a new string class design, etc

timotheecour commented 6 years ago

Nim doesn't have an immutable concept that denotes data that cannot be modified through any reference (see example 1 to show let doesn't satisfy this stronger guarantee)
let a=bar is shallow, not deep, see see example 2.

So let a=bar is more like const in C++ (or const in D, except that D's const is deep) See also https://dlang.org/spec/const3.html for distinction between D's const and immutable, or Example of const vs. immutable

This lack of type safe causes bugs, eg: Writing to cstring, SIGBUS error #8463 (this would be a compile error with immutable strings, as is case in D)

Nim docs are misleading or at least imprecise regarding meaning of `let`:

doc/tut1.rst

Parameters are immutable in the procedure body. By default, their value cannot be changed because this allows the compiler to implement parameter passing in the most efficient way

and:

The let statement works like the var statement but the declared symbols are single assignment variables: After the initialization their value cannot change

and, in error messages involving a let variable:

ERROR: 'a' is immutable

however, this immutability guarantee isn't true as example below shows:

example 1: let is more like C++ const

# D20180719T124716
proc test_let_not_immutable()=
var a=newSeq[int](1)
proc sneaky() = a[0]=10
proc foo(a:var seq[int]) = discard

proc fun(a: seq[int])=
echo a #prints @[0]
let a2=a
echo a2 #prints @[0]
sneaky()
echo a #prints @[10], the "immutable" param a changed
echo a2 #prints @[10], the "immutable" a2 changed

# NOTE: the compiler indeed thinks `a2` is immutable, as line below would show if uncommented:
# foo(a2) would give ERROR: 'a2' is immutable
fun(a)
test_let_not_immutable()

example 2: let is shallow, not deep


proc test_let_is_shallow()=
type Foo=object
b:ref int
proc initFoo():Foo = result.b.new()

let a=initFoo()
echo a.b[] # 0
a.b[] = 2 # shows `let` is shallow, not deep
echo a.b[] # 2

test_let_is_shallow()


## why do we need type qualifiers?
* type qualifiers enables distinguishing between `mutable array of immutable int` data vs `immutable array of int`
```nim
let a0 = 1
assert type(a0) is rconst(int) # see below
var a1 = 1
assert type(a1) is int

var a1: Slice[rconst(char)] # a mutable slice of rconst char
a2 = ... # ok
var a2: Foo[rconst(Slice[char]), int] # `rconst` can be put in any node of the type

NOTE: for Slice, a possible implementation is Range from https://github.com/status-im/nim-ranges

This concept is very useful in things like containers, eg:

type Matrix[T] = object
  width, height: int
  data: Slice[T]

var a:Matrix[rconst(int)](data=@[1,2,3, 4], width = 2, height = 2)
a.reshape([4, 1]) # reshapes width height without changing data

why do we need `immutable` concept ?

immutable enables compiler optimizations (avoids copies, allocations)
immutable enables thread safe sharing of immutable data
(probably most important point) there's been a number of discussions (see below) involving design of a new string type that would avoid the inefficiencies of current design: it allocates via GC on every assignment, even in for i in 0..10: let a="foo" which allocates 10 times, see https://forum.nim-lang.org/t/4060#25287; it also prevents efficient slicing of strings which makes a lot of code inefficient and has deep consequences in terms of library design as library authors have to resort to workarounds [4]

The design I (and probably others) have in mind (can do a writeup later) would be closely modeled after D's string design which defines a string as alias string = immutable(char)[];, mutable strings as char[], all of which are just special cases of dynamic arrays T[]. This would enable safe and efficient string handling.

NOTE: we'd exclude some bad decisions in D's standard library, eg auto-decoding of UTF8 strings (and all it's associated bad consequences: ForEachType vs ElementType, and all string special casing in std.algorithm)

proposal: type qualifiers `rconst` and `immut`

Currently we have:

const a=1 # a is compile time constant
var a=1 # a is "int" (aka mutable)
let a=1 # a is "rconst(int)" NOTE: can't call this "immutable(int)" as this isn't the case; can't call it const(int) as const already means "compile time const"; so rconst can be used, meaning run-time const

we introduce type qualifiers immut as well as immut a = ... syntax:

immut a = 1 # a is immut(int)

Example:

# analog to D's `alias string = immutable(char)[];`
type dstring = Slice[immut(char)] # for Slice, see https://github.com/nim-lang/Nim/issues/8256
var a1: dstring
a1 = "foo" # ok: a itself is mutable; NOTE: no deep copy, no allocation
a1[0] = 'b' # error: a1[0] is immut(char) and can't be changed
let a1b = a1 # ok; no deep copy, no allocation

var a2:Slice[char]
a2 = "foo" # error: Error: cannot implicitly convert expression "abc" of type dstring to Slice[char]
a2 = "foo".dup # ok: deep copies
a2[0] = 'b' # ok

var a3: Slice[rconst(char)]
a3 = "foo".dup # ok: deep copies
a3[0] = 'b' # error: a3[0] is rconst(char) and can't be changed

proc fun_immut(a: dstring) = discard
proc fun_rconst(a: Slice[rconst(char)]) = discard

fun_immut(a1) #ok
fun_immut(a2) # error: cannot pass  Slice[char] to Slice[immut(char)]
fun_immut(a3) # error: cannot pass  Slice[rconst(char)] to Slice[immut(char)]

fun_rconst(a1) # ok
fun_rconst(a2) # ok
fun_rconst(a3) # ok

NOTE: as for shallow vs deep, I need to think more on what would be the best, there are pros and cons.

footnotes

[1] https://forum.nim-lang.org/t/1793 Immutable String Type Vs. Shallow/Freeze Procedure
[2] https://forum.nim-lang.org/t/4060 string literals should not allocate; should allocate in ROM an extra \0
[3] Walter Bright's initial post on invariant strings: http://www.drdobbs.com/architecture-and-design/invariant-strings/228700475
[4] example of workarounds for lack of slicing: all the start = 0 parameters in lib/pure/parseutils.nim complicate the code and would not be needed if using slices

Araq commented 6 years ago

Deep immutability can eventually be done with my https://nim-lang.org/araq/writetracking.html "Write tracking" idea which works much better than encoding this in the type system. It still has its problems though.

andreaferretti commented 6 years ago

(deep immutability can be done by declaring all fields private, and exporting only read accessors, recursively, although this is inconvenient)

GULPF commented 6 years ago

(deep immutability can be done by declaring all fields private, and exporting only read accessors, recursively, although this is inconvenient)

I really wish something like #6890 would be implemented, which would make this easier.

mratsim commented 6 years ago

Linked to https://github.com/nim-lang/Nim/issues/6793 - Return by let/const value (Apply parameter constraints to return values)

Quote:

Indeed, it wouldn't work once it's wrapped in a var container.

For me it is similar to the "let" for ref object: it is shallow immutable.

Right now you can choose to:

never being able to shoot yourself in the foot (value semantics)

shoot yourself in the foot if you don't pay attention (ref semantics)

This would be a third option: a protective gear, it doesn't stop everything but is good enough for common use cases.

In any case I see that as an optimization not as a protection. (And I expect containers of expensive-to-copy objects will keep a ref to those instead of a copy).

The fourth option (?) would be write tracking ?

The fifth option would be a borrow-checker.

andreaferretti commented 6 years ago

I am pretty sure somewhere in the issues there is a long discussion between me and @Araq where I am confused by the fact that the presence of a pointer invalidates the guarantees of immutability, so I am with you on this.

That said, I am against yet another language feature. One compromise would be a macro in sugar.nim which allows to declare immutable objects by not exporting the fields but only their read-only accessors.

andreaferretti commented 6 years ago

(The discussion is here https://github.com/nim-lang/Nim/issues/5753 )

Bulat-Ziganshin commented 6 years ago

I started to use C++ before const-madness and I don't like this C++ feature. Once a language is serious about const-ness propagation, it's a plague that affects all your code. You can't have func(char*) and call func("str") because "str" now is const char *. Now, for every parameter/variable you need to figure out whether it's const or not in each level of indirection. Generic functions generates 2^n more versions since each parameter type, on each level of indirection, may be const or not.

And all that mess just to detect some type of errors. It's the Rust way - complicate every line of program just to prevent one type of bugs. It's why I prefer idea of effects system - if it can catch some errors without complicating the code, it will be OK for me.

Without language changes, you can implement ImmutableString and ImmutableArray types and ImmutableSpan to operate on their parts. What you think about this approach? In particular, it will allow to accumulate real experience with immutable data types before going with const in the language proposal again.

mratsim commented 6 years ago

@Bulat-Ziganshin there is no const-madness in Nim.

Let is already there and does not complicate the code, if anything it eases debugging. The issue is that let for pure value types does not have the same semantics as let for ref types.

For pure value types, let immutability is deep, but when a pointer/reference is involved, immutability stops at the pointer level. This RFC promotes a way to guarantee deep immutability which is desirable in several situations, especially concurrency and optimizations (optimizations rely on invariants).

I agree with the goal, having asked this several times to allow move/copy elision in my tensor library (copying 8GB of GPU memory is a big no-no). However I think we should either keep the current semantics and have a very nice page explaining immutability in Nim (i.e. for pointers/ref , it's the memory address that is immutable not what is contained at that address) similar to Rust page on interior vs exterior immutability and add write-tracking or have let just be deep immutable.

Also, we shouldn't use char pointer as a comparison point, Nim strings are very different from C++ strings, Nim already tracks constness of strings and seq properly.

There is this short study on mutability semantics, seems like even Ocaml (that uses let) is shallow immutable, and quick Googling shows that Scala too so even functional languages (except pure functional lang like Haskell) are not clear cut on immutability.

andreaferretti commented 6 years ago

@mratsim Scala has different approach. It uses val instead of let to signify immutability, but the catch is that you can use val both on a variable and on a class field. This means that you can have a mutable reference to an immutable data type, an immutable reference to a mutable data type and all other combinations.

class Mutable(var x: Int)
class Immutable(val x: Int)

val x1 = Mutable(0) // immutable reference to a mutable object
var x2 = Mutable(0) // mutable reference to a mutable object
val x3 = Immutable(0) // immutable reference to an immutable object
var x4 = Immutable(0) // mutable reference to an  immutable object

// these all compile
x1.x = 1
x2.x = 1
x2 = Mutable(2)
x4 = Immutable(1)
// these fail to compile
x1 = Mutable(1)
x3 = Immutable(1)
x3.x = 1
x4.x = 1

Bulat-Ziganshin commented 6 years ago

@Bulat-Ziganshin there is no const-madness in Nim.

yes, and I think that this proposal will add it

Why let variables aren't deep-immutable? When you pass such variable to a function, the function should be informed that it shouldn't allow any changes to contents pointed by this parameter. This means that functions should get another specifier type, say "immutable", in addition to "var"/nothing.

Now, if you want to enforce rule let-vars should be passed only to 'immutable' param, you will need to update all Nim libraries, adding "immutable" specifier to all functions that doesn't modify the corresponding parameter. And if you return an immutable parameter back, result type should be marked as immutable too.

Moreover, some of these functions will have different implementation depending on parameter immutability (substr is the best example), so you will need to provide overloading for both cases.

But what about mutable collections of immutable things? If they are prohibited, it will seriously limit let-vars usability. Just imagine - you can't assign them to fields of mutable objects. So, we quickly going to idea that const-ness is just specifier belonging to each level of ref/ptr.

And now you can't just declare function parameter as 'immutable' - you need to go through its entire structure and mark each ptr/ref as immutable or not. Otherwise the type system will bite you, either allowing to modify thing supposed to be immutable, or opposite.

And now, instead of simple seq[string] you need to overload your function on 4 types - imm seq[imm string], seq[imm string], imm seq[string] and seq[string]. Well, you may think that it's enough to implement it for the most specific type, f.e. imm seq[imm string], and compiler will convert calls with other types to it, but it doesn't work again: recall that the whole idea of let-constness was to avoid copying. If function parameter type is imm seq[imm string], then compiler should believe that it's deep-frozen let constant, so it can freely reuse any its part. F.e. example, it can return part of string in the 'substr' call (instead of making a copy), so sorry - you really need all those 4 definitions to deal with all types of string sequences.

And we don't yet looked into generic definitions and lambdas. Overall, look into C++ textbooks. It looks like a great idea until you open this can of worms.

Araq commented 6 years ago

there is no const-madness in Nim.

There is no madness in Nim precisely because mutability in Nim is an aspect of the symbol, not of the type. It works out beautifully and with the coming move semantics ref can be used less and less often, making things deeply immutable. Alternatively or in addition to that, the "write tracking" pass (which exists!) could be enabled via {.experimental: "writetracking".}. Both solutions are much better than a type system extension.

cooldome commented 6 years ago

https://nim-lang.org/araq/writetracking.html explains it all.

timotheecour commented 6 years ago

I read through https://nim-lang.org/araq/writetracking.html ; I need to think more about it to have an informed opinion;

one thing though:

both C++'s and D's approaches to "const" leave a lot to be desired: For instance in C++ you cannot pass a vector to a vector. In my opinion this is a major wart in the type system

this is C++ specific, and doesn't apply to D, where the following works fine:

import std.stdio;

alias V1 = const(string)[];
alias V2 = string[];

void fun1(V1 a){
  //a[0] = "HELLO"; // would error (as expected)
  writeln("a:", a);
}

void fun2(V2 a){
  a[0] = "HELLO";
  writeln("a:", a);
}

void main(){
  V2 a2 = ["hello", "world"];
  V1 a1=a2; // ok
  fun1(a2); // ok
  fun2(a2); // ok
}

could you provide a good motivating example where D's approach to const, immutable is problematic ?

This means immutability leads to more generic functions and thus instantiations and the linker needs to merge all these definitions later

acknowledged, this does apply to D.

I started to use C++ before const-madness and I don't like this C++ feature

despite all of D's problems, I don't think const-madness is one of them; at least not a serious one that people complain about (see D survey).

Example, most idiomatic D and phobos uses ranges, and take range by value (example, https://dlang.org/phobos-prerelease/std_algorithm_searching.html).
String handling is both safe (no aliasing issues thanks to immutability) and efficient (thanks to slices); see an example of a module using a lot of string handling: https://dlang.org/phobos/std_path.html ; note that simpler code could be written too, eg: string foo(string bar){...} with some loss of generality (can't pass char[])
as another example (involving safe slices of data), tensor libraries that support multidimensional slices (eg mir: https://github.com/libmir/mir-algorithm) are a lot more sane in D compared to C++ (eg opencv's cv::Mat), while being as efficient as can be (see [Numeric age for D: Mir GLAS is faster than OpenBLAS and Eigen] (http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html) )

Araq commented 6 years ago

Slices to immutable data require either RC-like operations to ensure temporal memory safety or a tracing GC mechanism (or a Rust-like borrow checker, but that's more limited). Not something we should embrace too quickly.

Araq commented 5 years ago

Well, not gonna happen anytime soon, but I encourage you to spend some time with my writetracking algorithm.

nim-lang / Nim