Closed ntrel closed 1 year ago
I think it's best to not allow code like this. For example:

```v
fn foo(mut int) {
```

I'm considering not allowing to modify primitives via function args. There's no reason to do so instead of returning them.

> I'm considering not allowing to modify primitives

The same problem can occur with arrays of struct elements.

`fn foo(mut ...)` was introduced to improve performance and remove unnecessary allocations when modifying arrays. So I'm considering only allowing to modify arrays/maps, and for everything else to require `unsafe`.
I had a benchmark, where copying a 100 byte struct was actually faster than dereferencing it. I'll publish it today.
> I'm considering only allowing to modify arrays/maps, and for everything else to require unsafe.

So it wouldn't support passing `arr[i]` as a mut parameter. This problem could still occur:

```v
fn bad(arr mut [][]int, elem mut []int)
```
Your restriction would also mean generic containers can't be memory safe.
But this won't compile if https://github.com/vlang/v/issues/1131 is implemented.
There's absolutely no need to use `foo(mut a, mut a[0])` instead of just `foo(mut a)`. This is bad code, and it's good to have a compiler that doesn't allow it.
> this won't compile if #1131 is implemented

In that issue you said it would:

> > Will passing arr[i] as a mut argument still work?
>
> yes
Functions that mutate arguments must work generally, accepting an element in an array. Otherwise you make those functions only work with arrays, and make them take an index parameter. If we don't support mut return then mut parameters of any type is essential, otherwise you make writing well encapsulated code impossible. On top of that generic code becomes much less capable and expressive.
> In that issue you said it would:

I was wrong about that, sorry. An immutable reference would be fine, not a mutable one.
> Functions that mutate arguments must work generally, accepting an element in an array. Otherwise you make those functions only work with arrays, and make them take an index parameter.

If forcing to pass an index to the function is the price to pay for compile-time memory safety without GC, that sounds good to me.
A stupid question: is it possible for V to not allow mutable variables at all, like Erlang and other functional programming languages? Fewer memory safety issues?
No, it's not possible :)
> If forcing to pass an index to the function is the price to pay for compile time memory safety without GC, that sounds good to me.

In practice the cost will be higher. People will write functions that don't take an array; they will then have to duplicate those functions to support arrays too, and duplicate them again to support maps. Languages with generics are supposed to avoid writing the same function several times with a different argument type.

Edit: And duplicate the code again for every other type of container.

Programmers are lazy: they either won't write libraries that support all argument types, or they'll support them but forget to update the code in some of the function copies.
I think the best way to solve this issue is to allow `mut arr[i]` arguments, but to increment the ref count of `arr.data` before passing such an argument to a mut parameter and decrement it after the function call. `&arr` and `&arr[i]` should not be allowed outside an `unsafe` block.
I wrote the following in the forum. I copy it here because it could be more visible to interested parties.
An option could be to explore the (scant) comp-sci literature on compile-time garbage collection (CTGC). It's usually directed towards declarative programming languages, but a recent (2017) PhD thesis is about CTGC in the context of a system programming language with imperative features (like mutability). V is an imperative language with features (immutability, pure functions, …) that encourage a functional programming style, so it could be (I haven't read this thesis yet) a good target for CTGC.

Reference: *ASAP: As Static As Possible memory management*, by Raphaël L. Proust.
@MaD70 -- thanks, this is a nice reference. It brought to mind a discussion on 'owned' refs going on in the Nim community, a related paper, "Ownership You Can Count On: A Hybrid Approach to Safe Explicit Memory Management" by Adam Dingle and David F. Bacon, and an evolving related design for destructors.
I just made substr() behave like in Java and C#, so freeing strings became much easier.
Local non escaping arrays are already being freed.
@medvednikov we definitely need to be able to pass references to structs/arrays to functions and allow them to be mutable. without it there would be so much code duplication
Mutable arguments are not going anywhere.
I'm considering not allowing mutable arguments with data pointing to the same memory block.
@medvednikov ahh good I was getting worried for a moment hehe.
> I'm considering not allowing mutable arguments with data pointing to the same memory block.

Sorry, I wasn't sure what this means. Pointing to the same memory block as what? Do you mean like a reference to part of an array?
@joe-conigliaro Like in the example above:

> There's absolutely no need to use `foo(mut a, mut a[0])` instead of just `foo(mut a)`.
@dcurrie you're welcome. I'll check those references, thanks.
@joe-conigliaro the problem here is aliasing in the presence of side effects (mutability). I'm not familiar with the literature on alias analysis, but I imagine the solution is not a "quick hack". Perhaps instead of a plethora of edge cases it would be better to think about a good general solution. I think memory management needs to be well thought out now, while V is still in its alpha stage.
Thanks @medvednikov, thanks @MaD70, yeah it's a tricky one, I'm sure if there was an easy solution a lot of the existing languages would work a bit differently.
I began to read the aforementioned thesis yesterday. In the section where the author discusses other memory management (MM) regimes, linear types specifically, I found this passage that made me think of a possible temporary solution (p. 15, emphasis added):

> We illustrate the necessary program changes that (non-relaxed) linear types induce in Figure 1.2. These changes constitute the administrative overhead programmers must go through to satisfy the type-like analysis. They include explicitly copying `data` because it is used multiple times. They also include explicitly ignoring `d4` because it would otherwise go unused in the `else` branch. This administrative overhead highlights a practical penalty as well as a conceptual one: programmers are once again involved in memory management. Taking this conceptual penalty to the extreme, one can interpret the `ignore` function like C's `free` and see linear types as compiler-checked, manual memory management.

That is, until a good MM regime is established, which will require quite some time, V could continue to expose a classic C MM API (`malloc` & friends). When the new MM regime is ready, the compiler would ignore the various calls to `free` in old sources (or treat them just as hints, if that is useful).
Apropos of that, I recently stumbled upon mimalloc, a cross-platform drop-in replacement for `malloc` from Microsoft that I think is worth investigating: they claim it's faster than its contenders and relatively small, at about 3,500 LOC.
@MaD70 yeah, I've seen mimalloc; it has different performance characteristics than malloc and its siblings.
I thought a bit about @medvednikov's proposal in #1131 and I think it could work with a bit of syntactic sugar. For example, having:

```v
fn foo(x int) int {
	…
}
```
What about this:

```v
fn main() {
	mut n := [1,2,3]
	n[i].foo
}
```

being syntactic sugar for what follows?

```v
fn main() {
	mut n := [1,2,3]
	n[i] := foo(n[i])
}
```

That is, the function will be treated like a sort of method when called on an array element.

Update: this syntactic sugar could be adopted for all types, making the function/method distinction obsolete; see Uniform Function Call Syntax (UFCS).
It doesn't need to be a single parameter function; the requirements I can think of for eligible functions are:

1. the first parameter must be of the type `T` of the array elements, and the call is translated as follows:
   a. immutable: translates `n[i].foo(…)`, in an expression, as `foo(n[i] …)`;
   b. mutable: translates `n[i].foo!(…)`, in a statement, as `n[i] := foo(n[i] …)` (note the `!` appended to the function name to make obvious at first sight that it's mutating the first parameter);
2. the return type must be the type `T` of the array elements.

For example, given:

```v
fn foo(x T y A z B …) T {
	…
}
```
This:

```v
fn main() {
	mut []T n := … // here an expression for an array of type T
	n[i].foo(a b …) // a b … are of the right types
}
```

could be (conceptually) rewritten into this by the compiler:

```v
fn main() {
	mut []T n := …
	n[i] := foo(n[i] a b …)
}
```
One can pass an element of the same array as another parameter to the function, but the rule in #1131 (no mutable references to array elements) avoids mutation via an alias (no mutable alias exists), and the rule could be extended to other collection types ("containers").

Note that the assumption here is that each array element is not very large by itself, as advocated by data-oriented design (in short: the memory layout of your data makes a difference in performance on contemporary computer system architectures, sometimes a dramatic difference); see also this presentation by Mike Acton, Data-Oriented Design and C++: video, slides and a summary in this lecture, CSC 476 | Lecture 9: Data-Oriented Design.
What problems can you people spot in this proposal?
> There's absolutely no need to use foo(mut a, mut a[0]) instead of just foo(mut a).

Disallowing that call in theory is good, so long as it can't happen through aliasing that goes undetected. But V should not restrict `foo(mut a, mut b[0])`, because of #1309.
> if there was an easy solution a lot of the existing languages would work a bit differently

It's indeed a hard problem to solve. Swift has mutable arguments and is memory safe: https://medium.com/@lucianoalmeida1/exploring-memory-safety-and-exclusive-access-in-swift-a8cd686b3288
@medvednikov For code which accepts data that may come from the stack, global data, or the heap, how do you go about differentiating them in order to free memory? Example:

```v
import strings

fn test(a string) string {
	println(a)
	return a
}

fn main() {
	a := strings.repeat(69, 5)
	test("example") // literal: static data, must not be freed
	test(a) // `a` is heap-allocated and now leaks memory
}
```
> I wrote the following in the forum. I copy it here because it could be more visible to interested parties.
>
> An option could be to explore the (scant) comp-sci literature on compile-time garbage collection (CTGC). It's usually directed towards declarative programming languages but a recent (2017) PhD thesis is about CTGC in the context of a system programming language with imperative features (like mutability). V is an imperative language with features (immutability, pure functions, …) that encourage a functional programming style, so it could be (I haven't read this thesis yet) a good target for CTGC. Reference: ASAP: As Static As Possible memory management, by Raphaël L. Proust.
If you're still interested in ASAP, I've been studying it heavily for the past year.
I've managed to instantiate a fragment of it in a minimal subset of Rust: https://github.com/doctorn/micro-mitten
Thanks for your notice @doctorn. I took a (very brief) look at your thesis and it looks very interesting. I'm currently occupied with some mundane and some unpleasant things. Anyway, your work could motivate me to learn a bit of Rust. Thanks again.
Looks interesting @doctorn
I'm working on memory right now (just fixed array and string leaks), will research the topics you provided.
@doctorn I am (slowly) skimming your thesis.
First I read "Conclusion", then "Evaluation", to see if in the performance evaluation you took the measurement bias into consideration (see Producing Wrong Data Without Doing Anything Obviously Wrong!)
Immediately the disappointing execution time caught my attention (p. 37, fig. 4.2, ASAP (Proust) vs control strategy and Boehm-Demers-Weiser GC). Then, regarding memory utilization, at p. 38 of the thesis you wrote:
> **4.1.2 Heap Usage** […] In the cases of `list_len.mmtn` and `depth_first.mmtn`, ASAP is able to outperform Boehm-Demers-Weiser at all problem sizes, while outperforming the control strategy at larger problem sizes. As my ASAP implementation makes use of the Rust runtime and Boehm-Demers-Weiser manages its own internal heap, both pre-emptively allocate pools of memory using `mmap()`. In these examples, this pre-allocated memory was never exceeded and hence there was no observed growth in the heap.
Do you know the details of the Rust runtime memory manager? Can it be at least part of the execution time bottleneck?
It is well known in video game programming circles (I'm not a game programmer, I'm just curious) that the standard general-purpose memory managers in language runtimes are slow; quoting from pp. 426–427 of Game Engine Architecture, 3rd ed.:

> **6.2.1 Optimizing Dynamic Memory Allocation**
>
> Dynamic memory allocation via `malloc()` and `free()` or C++'s global `new` and `delete` operators—also known as heap allocation—is typically very slow. The high cost can be attributed to two main factors. First, a heap allocator is a general-purpose facility, so it must be written to handle any allocation size, from one byte to one gigabyte. This requires a lot of management overhead, making the `malloc()` and `free()` functions inherently slow. Second, on most operating systems a call to `malloc()` or `free()` must first context-switch from user mode into kernel mode, process the request and then context-switch back to the program. These context switches can be extraordinarily expensive.
The second factor (apparently) isn't relevant in your tests, but could the first factor be? I'm asking because I don't know anything about Rust.
@MaD70 Thanks for the reference!
Rust's runtime is minimal and I only use Rust's allocator in the language runtime to buffer deallocation requests. Allocations in the language itself are handled using `malloc`/`free`.
@doctorn just to be sure: with "Allocations in the language itself are handled using `malloc`/`free`" you mean that the compiler of µ-Mitten inserts a lot of `malloc`/`free` calls in the generated code, right?

If this is the case, it is highly likely that all those `malloc`/`free` calls are responsible for at least part of the performance bottleneck. Do you think it is much work to test my hypothesis, that is, substituting `malloc()` and `free()` from the standard run-time with a custom memory manager?
@MaD70 the compiler doesn't emit any calls to `free`; it instead emits calls to a runtime function called `mitten_free` which aggregates frees. After frees have been aggregated, the compiler emits a call to `mitten_reset` which drains the set of pointers and frees each one. I don't think it would be any work at all to swap out the allocator though.
@doctorn thanks for the clarification. I see: in the end there are a lot of calls to libc's `free`, even if concentrated in one go.

I'm curious to see what will happen with a custom allocator. Once the performance is decent (at least on par with a good GC, but without random long interruptions, hopefully better), this memory management technique will become very interesting even for V.
I want to point out another aspect of general-purpose memory allocators that could be of interest: they usually allocate much more memory than is required for a primitive data type (for memory alignment purposes, also useful to catch off-by-one errors). So they are wasteful when allocating a lot of small memory blocks.

Citing, for example, from the GNU libc documentation:

> The address of a block returned by `malloc` or `realloc` in GNU systems is always a multiple of eight (or sixteen on 64-bit systems).

For a more extensive overview see the comments in GNU libc's source code. As one can read there, and contrary to common expectations, even `malloc(0)` is wasteful:

> Even a request for zero bytes (i.e., `malloc(0)`) returns a pointer to something of the minimum allocatable size.
In the vein of CTGC and ASAP: https://aardappel.github.io/lobster/memory_management.html
I noted someone on Reddit saying that V would use Lobster's technique. Don't know that's true, but I went and looked it up, posting it here for reference.
Thanks @tooolbox. I have periodically checked Wouter van Oortmerssen's site for years, but I missed this page on Lobster's memory management (perhaps it is relatively recent).
van Oortmerssen has a notable record of programming languages design and implementation besides being a game programmer. He puts a good blend of theory, experimentation and practical experience in his work, so it's always interesting to read his take on a subject.
For reference: this phenomenon of interfering side effects was studied (and a solution was proposed in the form of syntactic restrictions) at least since 1978 by John C. Reynolds; see his Syntactic control of interference (free download from ACM at least until June 30). Perhaps looking for ideas in the related literature could be fruitful.
Regarding Lobster memory management: both points are a direct consequence of Lobster's choice of memory management strategy, i.e. solving them would mean moving away from Lobster's strategy, as far as I understand.

Hmm, perhaps it's also worthwhile to add that Lobster doesn't handle reference loops well; they cause a memory leak in it. Lobster does provide means to identify those loops with debug output after the program is run, so it's not bad overall. However, V aims for the much stricter approach of "if it compiles, it's memory-leak free".
Btw. it seems highly optimized reference counting (optimized in the sense that the vast majority of refcounting operations are eliminated by static analysis at compile time) with runtime loop elimination (and compile-time warnings about potential loops) is becoming a thing in Nim.

They have recently begun finishing their several-years-long endeavor, the ORC "garbage collector" (which is nothing other than what V already partially does, namely inserting `free()` calls at the proper places, but with a seamless transition to runtime garbage collection for cyclic cases), to make it the default in Nim 2.0.

@UweKrueger talked about the need for "reference counting" (i.e. garbage collection) e.g. in https://github.com/vlang/v/pull/10219 and an ORC-inspired algorithm (especially the optimizations) could cut it.
> However V aims for much more strict approach of "if it compiles it's memory-leak free"

Uh, stark claim :-D. Any idea how to get there?
I think full memory safety without GC, yet with `mut` function parameters, is a hard problem to solve. Suppose we have reference counted array data:

Edit: this can be solved by incrementing `a.data`'s ref count for the duration of the call.

Even if the compiler somehow forbids this case by detecting that `a` and `a[0]` are both passed mutably and refer to the same memory, there are other ways this can happen:

Or this:

Edit: This can be prevented by disallowing `&a` and `&s` outside an `unsafe` block.

(I know V is against GC, but there's only one real downside to e.g. Go's GC: it uses a lot more memory than is necessary. GC can often be faster than manual memory management, particularly a generational GC. Also note that maintaining a reference count is itself somewhat slow, plus RC can cause big pauses when an RC'd object's ref count goes to zero and that object owns a chain of other RC'd objects. This can happen with a big array of arrays where each array element has a ref count of 1.)