keean / zenscript

A trait based language that compiles to JavaScript
MIT License

new wd-40 test #51

Open NodixBlockchain opened 3 years ago

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352257134
Original Date: Dec 17, 2017, 7:50 AM CST


  • fast allocation (just decrement a pointer)

I explained in my prior post that my proposal is the same. My reading is that this is what "bump pointer" allocation means in the literature.
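
For concreteness, here is a minimal sketch in C of what bump-pointer allocation amounts to (illustrative only; the names and sizes are arbitrary, nothing here is specific to the proposal, and whether the pointer is incremented or decremented is immaterial):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NURSERY_SIZE (1 << 20)             /* 1 MiB nursery, arbitrary */

static _Alignas(8) uint8_t nursery[NURSERY_SIZE];
static size_t bump = 0;                    /* offset of the next free byte */

/* Allocation is just an aligned pointer bump: no free list, no search. */
static void *nursery_alloc(size_t n) {
    size_t aligned = (n + 7) & ~(size_t)7;
    if (bump + aligned > NURSERY_SIZE)
        return NULL;                       /* nursery full: time to trace and compact */
    void *p = &nursery[bump];
    bump += aligned;
    return p;
}

int main(void) {
    int *x = nursery_alloc(sizeof *x);
    int *y = nursery_alloc(sizeof *y);
    *x = 1; *y = 2;
    printf("%d\n", *x + *y);
    return 0;
}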

  • fast deallocation (just increment the pointer)

I explained in my prior post that the performance is essentially the same in the case where the objects die young, because they are rarely traced (the cost is amortized) or copy compacted. The key difference is that with stack deallocation, the timing of the deallocation is more immediate and deterministic. However, if such immediacy is desired in my proposal (e.g. if there's a destructor), then use the RC reference (aka pointer) type.

  • no indirection (we can hold pointers directly into the stack)

I already explained that the same can be done in my proposal. But there are many cases even with stack allocation where you're going to be passing by reference to a function/procedure, so you'll still have indirection on the stack-allocated data.

  • no pausing (no mark/sweep phase)

There's no pause for tracing the nursery because most objects die young. My proposal replaces mark-sweep (MS) for the older objects with reference counting (RC). The programmer dictates which objects live long and thus marks them with an RC type, so that no runtime write-barrier is needed on assigning references (aka pointers), because the types involved in reference assignments are known at compile-time.
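
To make the write-barrier point concrete, here is a rough C sketch (my own illustration, not part of the proposal) of the check a generational collector normally performs on every reference assignment, i.e. the check that compile-time knowledge of which references are RC and which are nursery would allow the compiler to omit:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct Obj { struct Obj *field; bool in_nursery; } Obj;

#define REMEMBERED_MAX 1024
static Obj **remembered_set[REMEMBERED_MAX];   /* slots holding old -> young edges */
static size_t remembered_len;

/* Every pointer store pays this check when the generations are not
 * distinguished by the type system: if an old-generation object is made to
 * point at a nursery object, the mutated slot is recorded so a nursery
 * collection can find it without scanning the whole old heap. */
static void write_ref(Obj *holder, Obj **slot, Obj *value) {
    *slot = value;
    if (!holder->in_nursery && value && value->in_nursery &&
        remembered_len < REMEMBERED_MAX)
        remembered_set[remembered_len++] = slot;
}

int main(void) {
    Obj old_obj = { NULL, false };
    Obj young   = { NULL, true  };
    write_ref(&old_obj, &old_obj.field, &young);
    printf("remembered old->young edges: %zu\n", remembered_len);
    return 0;
}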

  • and zero fragmentation

The tracing, copy compacting nursery doesn't generate any fragmentation either.

Thus I repeat:

I'm failing to find any significant justification for supporting the tsuris of static (i.e. compile-time) stack allocation and lifetimes. I think this proposal of mine will defeat both Rust and Go as the next mainstream, general purpose programming language.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352258187
Original Date: Dec 17, 2017, 8:08 AM CST


The tracing, copy compacting nursery doesn't generate any fragmentation either.

It generates fragmentation, then fixes it by copying. The cost is much higher. Copying is slow compared to just incrementing/decrementing a pointer. You have to double indirect for the heap because the data can move, and you have to pause when moving data between the nursery and other spaces, because you cannot be allowed to access the data mid copy.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352258898
Original Date: Dec 17, 2017, 8:19 AM CST


It generates fragmentation, then fixes it by copying. The cost is much higher. Copying is slow compared to just incrementing/decrementing a pointer.

Incorrect. I already stated that rarely are any of the objects still around when the tracing+copy compacting occurs. The cost is amortized away to negligible. I think this is the 3rd time I have stated this in some of the recent prior posts.

You have to double indirect for the heap because the data can move

Incorrect. My understanding from reading that book about GC is that the pointers are updated when they're traced. And I repeat, this rarely happens for objects in the nursery, so the cost is negligible.

you have to pause when moving data between the nursery and other spaces

Incorrect. Seems you're totally ignoring my proposal even though I have reexplained it more than once. In my proposal, no objects ever get moved from the nursery1 (without the programmer indicating at compile-time for which instances he wants it to happen), and instead the programmer marks which objects are long-lived with a RC type.

because you cannot be allowed to access the data mid copy.

Any copying will induce a pause, but only for that thread, because the nursery will be per thread (whereas the RC objects will have multithreaded access). Since the amount of copying in the nursery will be small, you're looking at microsecond pauses, which is irrelevant (except maybe in some very obscure embedded programming use cases where hand-tuned memory management is required). In my proposal, the programmer will be entirely in control of whether he is overloading the nursery with longer-lived objects which he should have given an RC type instead. The major pauses in MS usually come from reprocessing the same long-lived objects and/or overloading the rate of generation of (too long-lived or excessively large) objects which escape the nursery. Afaics, my proposal mitigates those pathological cases.

Seems you have not yet wrapped your mind around my proposal and what is different about it.

1 Except when the programmer assigns a nursery object to a reference in a RC object, but the compiler can display which these are because all RC object instances are indicated at compile-time, so the programmer is in control of the cost.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352262688
Original Date: Dec 17, 2017, 9:14 AM CST


Even short-lived objects in the nursery will have different lifetimes, so it will fragment. Before dealing with the other points, can you explain why you think there is no fragmentation?

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352263665
Original Date: Dec 17, 2017, 9:29 AM CST


There is one clear situation where having no stack is better, and that is in an asynchronous language where you want a 'cactus' stack.

In these circumstances, allocating a variable frame for a function would be a heap allocation, with a pointer to the previous frame. There will be fragmentation still.
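
A tiny C sketch of that idea (my own illustration; names are arbitrary): each call allocates its activation frame on the heap with a link to the caller's frame, so sibling frames can share a parent (the 'cactus' shape) and a suspended continuation can keep its parent alive, but the frames are freed individually, hence the remaining fragmentation.

#include <stdio.h>
#include <stdlib.h>

typedef struct Frame {
    struct Frame *parent;   /* link to the caller's frame, not a contiguous stack */
    int           local;    /* this frame's locals live on the heap */
} Frame;

static Frame *push_frame(Frame *parent, int local) {
    Frame *f = malloc(sizeof *f);
    f->parent = parent;
    f->local  = local;
    return f;
}

int main(void) {
    Frame *outer = push_frame(NULL, 1);
    Frame *a = push_frame(outer, 2);   /* two sibling frames share the same parent, */
    Frame *b = push_frame(outer, 3);   /* which a single contiguous stack cannot express */
    printf("%d %d\n", a->local + a->parent->local, b->local + b->parent->local);
    free(a); free(b); free(outer);     /* freed individually: fragmentation remains */
    return 0;
}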

Nurseries work to prevent fragmentation because they get defragmented when copying the long lived objects out. If you mark objects as long-lived, it does not mean that you do not need to defragment the nursery.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352297619
Original Date: Dec 17, 2017, 6:12 PM CST


Nurseries work to prevent fragmentation because they get defragmented when copying the long lived objects out. If you mark objects as long-lived, it does not mean that you do not need to defragment the nursery.

There's negligible defragmentation of objects that die young because they rarely get traced and copy compacted, as I already mentioned. And there are no long-lived objects in the nursery in my proposal.

In my proposal, the entire point is RC typed object instances never go into the nursery (and that's why no runtime write-barrier is required, which I've mentioned before). They're initially allocated on the long-lived heap, which employs the methodology I mentioned upthread to mitigate fragmentation of the long-lived heap. So the long-lived heap will have some fragmentation of the virtual address space (but not more than negligible fragmentation of physical memory if designed correctly per my upthread points), which is why I mentioned I'm prioritizing 64-bit virtual address space (but I clarified this doesn't mean I'm advocating allowing memory leaks). And afaics, stack allocation would also not optimize these long-lived objects...

Comparing apples-to-apples, stack deallocation should not typically pertain to long-lived objects (at least afaics that's not the use case we're trying to optimize with stack allocation) because for example if I tried to put the entire UTXO (unspent transaction outputs) of a blockchain on the stack, the stack would overflow (as well I think the lifetime analysis would be unwieldy if not also implausible). Thus if we're correctly comparing stack allocation/deallocation to nursery allocation/deallocation, then the latter should also have no long-lived objects in my proposal (unless the programmer is derelict w.r.t. declaring RC types) and thus no fragmentation either. My proposal is not as automatic as BG-RC, because the programmer has to reason correctly about the lifespan of the instances of the objects he creates. But afaics, the performance benefits of this additional control outweigh the additional cognitive load on the programmer, which I contemplate is significantly less tsuris/annotations/complexity than Rust's lifetimes + exclusive mutability; and otherwise, as you pointed out, the programmer ends up fighting a fully automated BG-MS in pathological cases, with no way to have the GC exclude those objects from the workload of the MS collector.


There is one clear situation where having no stack is better, and that is in an asynchronous language where you want a 'cactus' stack.

In these circumstances, allocating a variable frame for a function would be a heap allocation, with a pointer to the previous frame. There will be fragmentation still.

You're referring to, for example, how JavaScript Promises unwind the stack back to the event loop, and the state of the closure for the callback of .then() is stored on the heap for execution when the event loop triggers it. So having the stack frame on the heap by default is more efficient than copying from the stack to the heap for said closures. I guess 'cactus' stack means the call stack is stored separately from the stack frames, with the latter on the heap.

But note I had proposed that for some types of non-UI code, multiple instances of the code could be running each on a different thread (from a thread pool) and their stack gets entirely stalled (a green thread concept) when the code is blocked. Thus there's no stack unwinding on blocking, and this also provides exclusive mutability for all nursery objects (which are inaccessible from other threads). This proposal provides asynchronicity via running other instances of the said code while any instances are blocked. For example, this would apply to the code that processes a transaction for a blockchain, wherein multiple transactions could be processed simultaneously. Note that afaics, one of the key problems with Node.js+JavaScript (and apparently Go) is there's no way for these separate instances to access a common heap which would store the UTXO. The alternative of passing messages to a thread which manages the UTXO heap would be much slower because it would funnel what could have been parallelized access into a single-threaded queue. In my proposal, the RC instances are on a shared heap, so immutability has to be employed (or some other means of assuring mutability safety). I contemplated that I want to make the UTXO records read-only but the tree/graph mutable, which provides the necessary mutability safety.

Note I also contemplated that such a non-unwinding (of stack) proposal could also allow for blocking on multiple simultaneous asynchronous tasks which are spawned from the same stack frame, as long as each of those tasks has exclusive mutability access for any objects that can be mutated during the runtime of the task.

For UI code, although not required by my proposal, copying from stack to closures on the heap is a negligible cost. Also I think where plausible, UI should be more optimally coded to use a stateless, no-callback design wherein asynchronous tasks (e.g. send a cryptocurrency transaction) are run in a separate thread (on a thread pool) and update the main UI thread with messaging when they're done (i.e. Go's CSP channels model). In cases where a series of sequential, blocking tasks depends on shared state, the code could employ the aforementioned stack blocking rather than Promise closures on the heap and unwinding the stack. IOW, separating the concerns so that the blocking code isn't in the UI thread. This is also about structurally handling mutability safety without needing Rust's exclusive mutability borrowing tsuris. In short, I think Eric Raymond is correct that Go is closer to the next mainstream programming language than Rust is. Yet I am proposing what I think can be better than Go. And also I know @keean is interested to get more complete HLL typing (e.g. typeclasses), which Go doesn't have.

We have the chance to make Lucid better than both Rust and Go, by studying what each of them did wrong and right. Of course, it would be most beneficial to have @keean's collaboration given his extensive experience with programming language design research and actual usage. Most specifically his expertise with typing systems, per for example his co-authoring of the OOHaskell research with the venerable Oleg.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352343802
Original Date: Dec 18, 2017, 1:14 AM CST


There is no fragmentation with a stack because there is a total order of object lifetimes. Any object appended to a stack has a lifetime less than or equal to the lifetime of the current top-of-stack object. This is why they don't fragment.

The same is not true of a heap, even one with a nursery. As soon as you place any object on the heap that has a shorter lifetime than the next object you will get fragmentation. Consider:

f => 
   let x = 1
   let y = 2
   let z = 3
   let w = x + y
   // GC runs now
   return w - z

As the GC is asynchronous, it can run at any time. If we consider GC running where indicated, we allocate x, y, z and w on the heap, but we no longer have references to x or y; we still need z and w, so after GC, x and y will be freed. If we imagine the heap as a simple array we have:

[...]
[x, y, z, w, ...]
[(), (), z, w, ...]

You can see the fragmentation starting as we have two empty slots separated from the 'empty space'. If we consider objects may be different sizes, this would make it worse.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352473855
Original Date: Dec 18, 2017, 10:12 AM CST


@keean I already explained why there will be no more than a negligible performance cost from fragmentation for short-lived objects in the nursery of my proposed design (and I already explained why, so just reread the prior posts), which is essentially equivalent to the lack of fragmentation for stack allocation (which I stated is most applicable to short-lived objects).

Your response suggests a lack of understanding of what I have written. Obviously I know about fragmentation in the long-lived heap if it's not copy compacted (because upthread I even presented the ways to minimize such fragmentation), but that's irrelevant because we're comparing apples-to-apples, meaning stack allocation for short-lived objects versus the nursery for short-lived objects. Long-lived objects won't be put on the stack anyway; thus you'll get fragmentation of long-lived objects even if you have stack allocation for the short-lived objects.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352497392
Original Date: Dec 18, 2017, 11:31 AM CST


@shelby3 I think you are misunderstanding: if your heap has zero fragmentation, then it must be a stack. A stack is a name for a heap where there is no fragmentation. There is simply no other way to manage a block of memory and have no fragmentation. It's either a stack, or it fragments. As fragmentation accumulates over time, a heap needs de-fragmentation whereas a stack does not. These are facts, and not opinion.

Edit: Generational garbage collectors can defragment the nursery when copying the long-lived objects into the next-generation region.

Some advantages of a stackless design: no arbitrary limit on stack size (no stack overflow, just out-of-memory), and you can run many more threads as each one does not need to reserve a stack. However, not having a stack means having an assembly-language runtime on every platform. It would be much easier to get portability if the language's compile target was 'C'.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353015132
Original Date: Dec 20, 2017, 3:44 AM CST


@keean, generational GC doesn't defragment what is no longer allocated when the tracing and copy collection runs, which will be the usual case for most that is allocated. Thus, you're incorrect in the sense that defragmentation implies a cost. Yet I'm repeating myself.

Some advantages of a stackless design: no arbitrary limit on stack size (no stack overflow, just out-of-memory), and you can run many more threads as each one does not need to reserve a stack. However, not having a stack means having an assembly-language runtime on every platform. It would be much easier to get portability if the language's compile target was 'C'.

I specifically wrote upthread that a stack would still be employed for the call stack.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353018068
Original Date: Dec 20, 2017, 3:55 AM CST


@shelby3 wrote:

generational GC doesn't defragment what is no longer allocated when the tracing and copy collection runs, which will be the usual case for most that is allocated. Thus, you're incorrect. Yet I'm repeating myself.

This discusses nursery fragmentation in a generational garbage collector:

http://www.mono-project.com/docs/advanced/garbage-collector/sgen/

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353019681
Original Date: Dec 20, 2017, 4:01 AM CST


You could eliminate fragmentation in the nursery by not allowing pinned objects, and completely cleaning out the nursery on a GC run. This has the disadvantage of copying some short-lived objects into the second generation that should not be there, which will increase the fragmentation of the second generation since these objects would be discarded soon after; it also wastes time copying objects that are likely to be discarded soon.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353060199
Original Date: Dec 20, 2017, 7:16 AM CST


@keean if you refuse to understand my proposal, what's the point of repeating myself 5 more times. Please review my prior posts.

There's no 2nd generation in my proposal. Apparently you keep forgetting (or refusing to digest?) salient facts of my proposal that afaics I've already explained numerous times. I think we should take this discussion private into LinkedIn for a while. The discussion is getting repetitive and noisy, apparently due to a lack of effort applied to comprehension. And it will digress into my frustration with your lack of comprehension of what I already wrote. I think I've explained patiently enough already. You're repeating yourself numerous times and still apparently (as best I can surmise) not grokking what I had already written.

You could eliminate fragmentation in the nursery by not allowing pinned objects

How many times have I already stated that RC typed objects will not be placed in the nursery, ever?

and completely cleaning out the nursery on a GC run.

How many times do I have to repeat that the nursery will be periodically copy compacted and nothing will ever be promoted out of the nursery? The defragmentation cost is negligible, because the copy collector runs infrequently enough that most allocated objects are already deallocated when the copy collector runs; thus the percentage of objects actually defragmented is negligible. IOW, as I had explained before, the cost of running the copy collector can be conceptualized as amortized over many, many stack frames, with most stack frames having come and gone during each said period. I am sure you understand what the word negligible means. I've repeated this several times in my prior explanations.
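
To illustrate (a simplified C sketch of my own, with flat objects reachable only from an explicit root array, which is not how a real collector finds roots): the compaction work is proportional to the survivors that are copied, not to everything that was bump-allocated during the period.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HALF (1 << 16)

typedef struct { size_t size; } Header;        /* payload follows the header */

static _Alignas(8) uint8_t space_a[HALF], space_b[HALF];
static uint8_t *from_space = space_a, *to_space = space_b;
static size_t   from_bump = 0;

/* Bump allocation in from-space; returns a pointer to the payload. */
static void *alloc(size_t n) {
    n = (n + sizeof(Header) + 7) & ~(size_t)7;
    if (from_bump + n > HALF) return NULL;
    Header *h = (Header *)&from_space[from_bump];
    h->size = n;
    from_bump += n;
    return h + 1;
}

/* Copy compaction: only objects still referenced from roots are copied into
 * to-space and the roots updated; objects that already died cost nothing. */
static void collect(void **roots, size_t nroots) {
    size_t to_bump = 0;
    for (size_t i = 0; i < nroots; i++) {
        if (!roots[i]) continue;
        Header *h = (Header *)roots[i] - 1;
        memcpy(&to_space[to_bump], h, h->size);
        roots[i] = (Header *)&to_space[to_bump] + 1;
        to_bump += h->size;
    }
    uint8_t *tmp = from_space;                 /* swap semi-spaces */
    from_space = to_space;
    to_space = tmp;
    from_bump = to_bump;
}

int main(void) {
    void *roots[1];
    for (int i = 0; i < 1000; i++) alloc(32);  /* short-lived garbage: never copied */
    roots[0] = alloc(32);                      /* the lone survivor */
    collect(roots, 1);
    printf("bytes copied and still live: %zu\n", from_bump);
    return 0;
}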

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353071030
Original Date: Dec 20, 2017, 8:03 AM CST


How many times do I have to repeat that the nursery will be periodically copy compacted and nothing will ever be promoted out of the nursery? The defragmentation cost is negligible, because the copy collector runs infrequently enough that most allocated objects are already deallocated when the copy collector runs; thus the percentage of objects actually defragmented is negligible. IOW, as I had explained before, the cost of running the copy collector can be conceptualized as amortized over many, many stack frames, with most stack frames having come and gone during each said period. I am sure you understand what the word negligible means. I've repeated this several times in my prior explanations.

Which means you are agreeing with me that the nursery will suffer from fragmentation, and that you are periodically de-fragmenting it (which is what a copy compactor does). It appears we have a differing opinion about the cost of this. It's hard to be specific about the cost without implementing and benchmarking.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353153778
Original Date: Dec 20, 2017, 1:05 PM CST


@keean wrote:

Which means you are agreeing with me that the nursery will suffer from fragmentation, and that you are periodically de-fragmenting it (which is what a copy compactor does).

This is not a political contest. This is engineering. I already explained that you're incorrect. I will quote again (this is very frustrating):

@shelby3 wrote:

@keean, generational GC doesn't defragment what is no longer allocated when the tracing and copy collection runs, which will be the usual case for most that is allocated. Thus, you're incorrect in the sense that defragmentation implies a cost. Yet I'm repeating myself.

For a copy collector to be performing non-negligible defragmentation, there must be a non-negligible cost of copy collecting; otherwise the claim of fragmentation is vacuous:

The defragmentation cost is negligible, because the copy collector runs infrequently enough that most allocated objects are already deallocated when the copy collector runs; thus the percentage of objects actually defragmented is negligible. IOW, as I had explained before, the cost of running the copy collector can be conceptualized as amortized over many, many stack frames, with most stack frames having come and gone during each said period. I am sure you understand what the word negligible means.


@keean wrote:

It appears we have a differing opinion about the cost of this. It's hard to be specific about the cost without implementing and benchmarking.

If you wanted to challenge my claim that the cost of the copy collector is negligible, you could have done that very far upthread when I first mentioned it and employed the term 'amortized'. That would have relieved us of a lot of noise, such as you implying I do not know what fragmentation is (geez). The apparent fact is you weren't paying attention; otherwise you could have gone straight to the point of challenging the specific claim.

It's inarguable that the copy collector's added cost will be negligible, because the longer-lived objects of the nursery will get stuck at the compacted end and won't need to be repeatedly compacted. Thus the only ones that can get defragmented are the short-lived ones, and given the design criterion that the copy collector's period will be much longer than the average life of the short-lived objects, the cost must be negligible. I had already stated upthread that my proposal hinges on the programmer correctly employing RC type references for longer-lived objects so as to not load the copy collector with redundant tracing.

The period of the copy collector is only limited by the amount of virtual address space (and physical memory if the programmer is overloading with long-lived objects) that can be dedicated to the nursery (of which some percentage will be logically discarded but still allocated at any given moment in time, thus being essentially unavailable virtual address space, but this shouldn't matter in 64-bit).1 It is also limited by the throughput of the copy collector, which might be significantly less due to the additional repeated tracing overhead if too many long-lived objects are allowed into the nursery by the programmer.

I think the potentially plausible retort (which I thought of since I introduced my proposal), is to argue that the programmer will not be able to correctly tag the longer-lived objects with the RC type. I have contemplated use cases wherein I think the programmer can, but that may or may not be generally the case.

1 This statement depends on paging out to disk being a cost-free operation. In terms of performance, if there's DMA on the bus then paging out to disk runs in parallel with the CPU and thus doesn't incur any performance cost. However, there's a power-efficiency cost. And that presumes that logically discarded objects don't occupy the same memory page as objects which are still being accessed. So I guess it's safer to presume physical memory is the limiting factor on the length of the period between traced copy compactions of the nursery.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353156939
Original Date: Dec 20, 2017, 1:17 PM CST


most allocated objects are already deallocated

This seems wrong: if the rate of object allocation is the same as the rate of object deallocation, there will always be plenty of objects in the nursery.

The idea that a bunch of objects get allocated then deallocated cleanly without overlapping other allocations is wrong. There will be allocations every time a function is entered, and the scope of the calling function will still be current.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353168180
Original Date: Dec 20, 2017, 2:03 PM CST


It's probably worth pointing out that we are discussing details; I think we are in agreement about the big picture. I am not sure I want to have syntax for indicating long-lived objects; I think implementing an efficient state-of-the-art generational GC could be good enough.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353292466
Original Date: Dec 21, 2017, 2:45 AM CST


@shelby3 This is interesting, and matches my experience:

http://wiki.c2.com/?MemoryLeakUsingGarbageCollection

A couple of key quotes:

As time goes, I consider more and more the GarbageCollector as something that turns freed pointer crash into memory leaks. And yes, I'm hunting Java memory leaks more frequently than C/C++ memory leaks, but I'm hunting C/C++ crashes due to freed references much more than NullPointerExceptions in Java.

But the problem you describe certainly has the same effect, so I guess I'll have to widen my definition slightly to "failure to free memory that will no longer be referenced by the running program". It's certainly true that, in the presence of globals or long-lived locals, the explicit act of freeing memory has to be replaced by the explicit act of nulling a reference.

So is GC actually a solution? With manual memory management we have to free references to heap storage (but not all references, because of RAII). With GC we have to null references that are no longer part of the running-set of the program (but not all references, because of simple local cases). Both of these lead to bugs; it's just that with GC you might get away with it because the program is too short-lived to run out of memory, but the delay makes it much harder to track down and debug.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353293981
Original Date: Dec 21, 2017, 2:52 AM CST


Reference counting with immutable objects is an interesting point in the design space, as you cannot create circular references with immutable objects, so reference counting becomes a complete solution.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353297593
Original Date: Dec 21, 2017, 3:08 AM CST


The Python GC is interesting:

http://arctrix.com/nas/python/gc/

They basically sacrificed performance for portability. Of the widely available GCs, JavaScript's is not that bad; it is more that the APIs provided by JS do not allow memory re-use, which leads to fragmentation and worse cache performance.

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353344339
Original Date: Dec 21, 2017, 6:57 AM CST


NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353346066
Original Date: Dec 21, 2017, 7:06 AM CST


In all methods we have to do something with references: with GC we must still null references to avoid leaking memory; without GC we have to free them. RC has the same reference problem as MS, but it also has the cycle-memory leak problem.

An example may make this clearer. Imagine I have a ring-buffer, which contains references to other data. I can 'clear' the ring-buffer efficiently by setting "start = end". With manual memory management, I leak all the data referred to in the buffer because I did not free the memory pointed to. With GC (RC or MS) I leak the memory because the references still exist in the ring buffer memory which is still in scope, even though I have cleared it by setting (start = end). The reference count in the references is not decremented by moving the ring-buffer start or end pointer, and the Sweeper can still find the references in the cleared ring buffer to follow them and mark the memory as still alive.

It doesn't matter whether there is GC or not, we have to know that when implementing the 'clear' method of the ring-buffer we must deal with making safe every reference in the buffer when moving the start or end pointer. So in both cases we need to understand the underlying memory and write code to safely handle references or we will leak memory.
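
A small C sketch of that ring-buffer point under manual management (my own code, illustrative names): the fast clear just moves the indices and orphans whatever the slots still point to, whereas a correct clear must free (or, under GC, null) each slot as it is retired.

#include <stddef.h>
#include <stdlib.h>

#define RING 8

typedef struct {
    void  *slot[RING];
    size_t start, end;          /* read and write indices (monotonic, used modulo RING) */
} Ring;

static void push(Ring *r, void *p) { r->slot[r->end++ % RING] = p; }

/* Fast but leaky: the pointed-to objects are never released, and under a
 * tracing GC they would still be reachable through slot[]. */
static void clear_fast(Ring *r) { r->start = r->end; }

/* Correct clear: release (or null) each live slot while retiring it. */
static void clear_safe(Ring *r) {
    while (r->start != r->end) {
        size_t i = r->start++ % RING;
        free(r->slot[i]);
        r->slot[i] = NULL;
    }
}

int main(void) {
    Ring r = {0};
    for (int i = 0; i < 4; i++) push(&r, malloc(64));
    clear_safe(&r);             /* clear_fast(&r) here would leak all four blocks */
    return 0;
}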

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353348959
Original Date: Dec 21, 2017, 7:20 AM CST


NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353357587
Original Date: Dec 21, 2017, 8:00 AM CST


It still has the same problem. Consider again the example of a ring-buffer. I need a ring-buffer for the algorithm I want to implement. If the language only provides reference lists, then I base my implementation on reference lists. I have a start and end index into the list to represent the read and write pointers of the circular-buffer. I try and implement a fast-clear by setting the read-index equal to the write-index, and again I have just leaked all the objects that are referenced by the circular-buffer.

Remember container-objects themselves (like a circular-buffer) can be long-lived. For file IO, or network IO the buffers may last the lifetime of the program. When you get down to it managing objects in a collection becomes managing memory. Just because an array, list, map, or set does not obviously have any pointers or references, you can still leak memory by forgetting to delete objects from the container. In the end memory is just an array and a pointer is just an array-index.

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353363372
Original Date: Dec 21, 2017, 8:24 AM CST


NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353369275
Original Date: Dec 21, 2017, 8:49 AM CST


In general we can call this 'region-based memory management', or the Arena pattern as Rust calls it. We can have special things called 'regions', and a region is the only thing that can store a pointer/reference. We allocate regions on the stack using RAII, so we are guaranteed that all memory associated with the region is deallocated when that stack frame returns. In this way we cannot leak any memory outside the function declaring the region. However, this leads to two things for programs that have no definite lifetime (interactive programs): either the function never returns (so we leak the memory anyway), or we have to handle returning references to a region by promoting the reference to a region higher up the call-stack (which lets that reference escape, hence potentially leaking the memory).

There are three approaches to this. The Ada approach strictly does not let a reference to any memory leave the scope in which the memory referred to is declared. The Rust approach attaches lifetimes to references and extends the lifetime, when returning a reference, to the outer scope. And finally, the technique I like as a compromise between the two, which is more flexible than Ada but simpler than Rust with no need for lifetimes: when we return a reference, we promote the memory to the outer region and pass the reference into the function instead of returning it (this transformation can be done automatically by the compiler).
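
A minimal C sketch of the region/arena idea and of the promotion transformation just described (my own illustration, with explicit create/destroy standing in for RAII): the callee writes its result into a region owned by the caller, so no reference escapes the callee's own short-lived region, and destroying a region releases everything allocated in it at once.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t *base;
    size_t   used, cap;
} Region;

static Region region_create(size_t cap) {
    Region r = { malloc(cap), 0, cap };
    return r;
}

/* Bump allocation inside the region; freed only when the region is destroyed. */
static void *region_alloc(Region *r, size_t n) {
    n = (n + 7) & ~(size_t)7;
    if (r->used + n > r->cap) return NULL;
    void *p = r->base + r->used;
    r->used += n;
    return p;
}

static void region_destroy(Region *r) { free(r->base); r->base = NULL; }

/* Instead of returning a pointer into its own region, the callee allocates
 * its result in the caller's region ("out"): the promotion transformation. */
static char *greet(Region *out, const char *name) {
    char *s = region_alloc(out, strlen(name) + 8);
    sprintf(s, "hello %s", name);
    return s;
}

int main(void) {
    Region frame = region_create(1024);        /* region tied to this "stack frame" */
    printf("%s\n", greet(&frame, "world"));
    region_destroy(&frame);                    /* everything in it released at once */
    return 0;
}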

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353370131
Original Date: Dec 21, 2017, 8:52 AM CST


I think the real solution to this sort of thing boils down to having two levels of reference, like the differential reference counting algorithm I posted before, with an outer counter and an inner counter, or otherwise keeping track of two levels of references.

The ambiguity, I think, comes from the fact that the memory manager or compiler can't know in an obvious manner whether a pointer to an element inside the list means that the whole list needs to be kept active, or only the element pointed to by the access pointer, or how many elements in the list need to be considered active based only on a 'current pointer' into the list.

With the two-level reference approach it becomes more obvious: if a reference to the list or buffer itself is kept, it means all objects in the list are kept active, and the second level allows keeping an object in memory if there is a 'weaker pointer' to an element of the list. Differentiating explicitly between a pointer to the list and a pointer to elements inside it makes it more obvious whether there is still an active reference to an object or not.

If there is no direct reference to the list itself, then when the pointer to an element of the list is incremented, all objects behind it no longer have any active references and can be freed. The system could also recognize that the list itself cannot be freed while there is an active reference to at least one element inside it. So a reference to an element inside a list would also imply a reference to the list itself, but it would still need to allow objects inside the list to be freed.


NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353373116
Original Date: Dec 21, 2017, 9:04 AM CST


I don't think that's the problem. I think the problem is that when you have any collection of objects, you need to manage that collection yourself. For example, if you have an array of employee records, you have to delete the records when you have finished with them. Because the employee records are in an array, the garbage collector does not help you manage them. The GC only knows if the whole array is referenced or not. Real applications rarely have simple flat data structures; consider a CRM database.

So the problem comes when I have long-lived data structures, where the contents of the structure(s) have a shorter lifetime than the structure itself.

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353378071
Original Date: Dec 21, 2017, 9:24 AM CST


I would like it if there were a 'context free' solution, because solutions based on scope can be tricky in a multi-threaded environment, as there can be different active scopes at the same time, and it's not easy to track all the active scopes without a 'stop the world' solution that scans all threads.

An approach based on scope can probably solve a lot of problems, though. But I would prefer a solution that makes it obvious when references are active without being based solely on scope analysis. Especially if there can be closures or the like, it can probably become tricky to deal with.

But maybe the complexity it involves is too high compared to the benefit it gives, and it's better to have special handling for multi-thread sharing, treating it as an escape. Or making it obvious when references can be shared between different threads' scopes.

I guess with the simplest form of flat record like that, a 'pop' operation removing the item from the array could do it, but for a more complex hierarchy, maybe not so much. The inconvenience of pop is also that if elements are only processed when they fit a certain condition, they need to be pushed back afterward.

Maybe a solution could be to have some sort of functional-style algorithm able to specify which objects in the collection are going to be used, or to specify more complex patterns of use to the GC. If the elements are accessed in a loop, it could be based on the loop state or such. Or maybe patterns of use could be declared, like FIFO or ring buffer or others, to help the GC manage the lifetime of objects based on program progression. Not sure it's really feasible though.

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353489902
Original Date: Dec 21, 2017, 6:13 PM CST


Deep down I think the whole issue of automatic memory management is an issue of coupling between the code, as an access pattern, and the memory management, as knowing when memory can be reclaimed.

What makes manual memory management (when done properly) efficient is that it keeps the memory use highly coupled with the access pattern.

All common forms of automatic GC tend to decouple memory management entirely from the code and the definition of the access pattern.

The rationale behind the object nursery reminds me of using Monte Carlo to compute radiosity from a sample of light paths, statistically determining the amount of light that will reach a certain area without having to compute all possible light paths. In the sense that it's based on empirical statistics about object lifetimes in the most common code patterns, but it still keeps low coupling with the code logic and access pattern.

Scope analysis keeps the coupling with code higher, as its logic is coupled with code progression, but it doesn't solve everything. Especially if memory can be reclaimed inside a scope, as in list processing, unless the code is structured with recursion or something similar so there is high coupling between the scope and the operations that can lead to memory reclamation. Or it would need to organize code in such a manner that scope can be used to capture only the remaining live elements of a list or graph. Like code that would look like recursion, except it would scrap the stack frame at each iteration and make the next call with only the remaining elements left in scope.

But still, to me the best solution comes with higher coupling between the code as an access pattern and the memory management, to help the GC identify live objects not only as reachable, but also as possibly used or not, which in the case of deterministic access should not be extremely difficult.

Even for graphs, I'm sure the access pattern could be made obvious to the GC. One system that comes to my mind when thinking about declaring graph access patterns is something like XSLT, even if it's maybe not usable for that purpose as such.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353704019
Original Date: Dec 22, 2017, 9:20 PM CST


This seems wrong: if the rate of object allocation is the same as the rate of object deallocation, there will always be plenty of objects in the nursery.

Math. If there are N allocation/deallocation batches, each of total size m, within one collection period, then the copying cost is m instead of N * m. That's what I mean by amortized. The (N - 1) * m that has already died is never defragmented.
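
Restating that arithmetic as a ratio (my reading of the claim), with N short-lived batches of size m per collection period and only the last batch still live when the collector runs:

\[
\frac{\text{copy cost per period}}{\text{bytes allocated per period}} \approx \frac{m}{N m} = \frac{1}{N},
\]

so the compaction cost per allocated byte shrinks as the collection period (and hence N) grows.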

Granted it requires more memory than a tightly/meticulously coded stack-allocation paradigm. So Rust or something like it will still be useful for mission-critical embedded applications and such. But IMO meticulously coded (inflexible, significant tsuris) stack allocation is not an optimum tradeoff for a general-purpose programming language. And Go is lacking in other ways (some issues linked to upthread). I'm contemplating that I can improve significantly on both of them (and Java) with Lucid. Hope you get on board the train.

I am not sure I want to have syntax for indicating long-lived objects; I think implementing an efficient state-of-the-art generational GC could be good enough.

Then you'll have the reduced performance of the write-barrier (thus invalidating my math argument above) and the worse asymptotic complexity of mark-sweep.

Seems even after all this time, you're refusing to wrap your mind around my proposal. It's quite perplexing to me why you couldn't conceptualize what I wrote about it from the start upthread. I mentioned the write-barrier issue.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354127388
Original Date: Dec 27, 2017, 9:12 AM CST


@keean wrote:

A couple of key quotes:

As time goes, I consider more and more the GarbageCollector as something that turns freed pointer crash into memory leaks. And yes, I'm hunting Java memory leaks more frequently than C/C++ memory leaks, but I'm hunting C/C++ crashes due to freed references much more than NullPointerExceptions in Java.

But the problem you describe certainly has the same effect, so I guess I'll have to widen my definition slightly to "failure to free memory that will no longer be referenced by the running program". It's certainly true that, in the presence of globals or long-lived locals, the explicit act of freeing memory has to be replaced by the explicit act of nulling a reference.

So is GC actually a solution? With manual memory management we have to free references to heap storage (but not all references, because of RAII). With GC we have to null references that are no longer part of the running-set of the program (but not all references, because of simple local cases). Both of these lead to bugs; it's just that with GC you might get away with it because the program is too short-lived to run out of memory, but the delay makes it much harder to track down and debug.

Most references don't need to be nulled because they (local stack variables) go out of scope or they get replaced by the assignment of a reference to another object (the so called update operation in the GC book I cited upthread). Thus automated GC automates the memory management in those prevalent cases. That's the advantage of automated GC and the case for which it is a "solution" (with performance and memory utilization tradeoffs). My proposal eliminates most of the said tradeoffs, giving what I postulate is the optimum balance.

In the cases (which are less prevalent) that do need to be explicitly assigned null, these would need to be explicitly freed in any other memory management scheme, so we can say that automated GC is not worse in these less prevalent cases, has the advantage of not crashing for "use after free", and has significant automation advantage in the more prevalent cases. The "going out-of-scope" and update cases are what Rust would achieve with the tsuris+inflexibility (documented upthread) of annotating and tracking lifetimes (and exclusive mutable borrows) and thus does not encompass these cases that require explicitly freed memory.

Additionally it seems you're not accounting for the fact that nulling a reference is not equivalent to freeing an object. The automation of the runtime GC frees the programmer from needing to know at compile-time when the object is freed, regardless of whether references need to be nulled in some semantic cases.

JavaScript's closures can facilitate obfuscated/convoluted memory leaks. @keean had also noted this in these issues threads. This issue with closures can possibly be improved in another programming language design.

An example may make this clearer. Imagine I have a ring-buffer, which contains references to other data. I can 'clear' the ring-buffer efficiently by setting "start = end". With manual memory management, I leak all the data referred to in the buffer because I did not free the memory pointed to. With GC (RC or MS) I leak the memory because the references still exist in the ring buffer memory which is still in scope, even though I have cleared it by setting (start = end). The reference count in the references is not decremented by moving the ring-buffer start or end pointer, and the Sweeper can still find the references in the cleared ring buffer to follow them and mark the memory as still alive.

It doesn't matter whether there is GC or not, we have to know that when implementing the 'clear' method of the ring-buffer we must deal with making safe every reference in the buffer when moving the start or end pointer. So in both cases we need to understand the underlying memory and write code to safely handle references or we will leak memory.

Afaics, this is an irrelevant example for the standard definition of a circular buffer. The circular buffer has an empty operation which means there's nothing in the fixed size omnipresent buffer (which is supposed to store all instances that occupy the buffer), but the said buffer is not supposed to be freed until the last reference to the buffer instance is out-of-scope (or nulled) which will happen automatically and no explicit clear operation is required with automated GC.

However, what you're referring to though is a fixed-size buffer of references where the objects are not stored in-place in the buffer, where you want the buffer of references to remain allocated, but the object(s) referenced to be freed. Thus the references have to be nulled when they're no longer between the start and end indices.

I try and implement a fast-clear by setting the read-index equal to the write-index, and again I have just leaked all the objects that are referenced by the circular-buffer.

Yet this is a semantic error in the design of your circular buffer API (given that your buffer is a buffer of references, not a buffer of objects in place) and thus afaics has nothing to do with the salient issues of choosing between automated GC or compile-time stack allocation.

you can still leak memory by forgetting to delete objects from the container

This has nothing to do with the salient issues of choosing automated GC or stack allocation. Yes, there are semantic memory leaks. But that's irrelevant to the issues we were discussing.

it is more that the APIs provided by JS do not allow memory re-use, which leads to fragmentation and worse cache performance.

Indeed. These are higher-level semantic design issues which are (mostly if not entirely) orthogonal to the salient issues of choosing automated GC or stack allocation.

I think the problem is that when you have any collection of objects, you need to manage that collection yourself. For example, if you have an array of employee records, you have to delete the records when you have finished with them. Because the employee records are in an array, the garbage collector does not help you manage them. The GC only knows if the whole array is referenced or not. Real applications rarely have simple flat data structures; consider a CRM database.

So the problem comes when I have long-lived data structures, where the contents of the structure(s) have a shorter lifetime than the structure itself.

Well yes you have to delete the employee records from the array, because your array is still referencing them. You're discussing semantic memory leaks, which is an unrelated issue. Every form of memory allocation strategy enables semantic memory leaks. JavaScript's WeakMap can automatically free objects which are only referenced by the WeakMap, because the keys are not enumerable thus the only way to access an object is to have a reference to its key object. There's no plausible analog for arrays, because the array elements are accessed by indices (not referenced keys).


Reference counting with immutable objects is an interesting point in the design space, as you cannot create circular references with immutable objects, so reference counting becomes a complete solution.

RC has the same reference problem as MS, but it also has the cycle-memory leak problem.

Not entirely correct...

The BG-RC automated GC design employed newer methods for probing and eliminating circular referencing for the referenced counted objects, so as to not leak these. You can re-read the research paper I cited upthread if you've forgotten. Essentially they only probe when reference counts are decremented. It's apparently not necessary to force the inflexibility of a certain programming paradigm (e.g. immutability everywhere) on the programmer. I think the probing algorithm is a source of some less ideal asymptotic complexity, which the research paper noted could be future work for improvement, but nevertheless even mark-sweep has poor asymptotic complexity (so comparing apples-to-apples of automated GC, then RC with probing is apparently no worse and has some better tradeoffs in other facets). There might be some additional innovations that can be made such as marking which objects are not candidates for circular references and thus don't need to be probed. Essentially my proposal requires the programmer to apply a minimal amount of manual insight into the memory allocation instead of the less performant fully automated GC-MS (generational + mark-sweep), yet avoiding the extreme tsuris/inflexibility of Rust's compile-time stack allocation.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354279398
Original Date: Dec 28, 2017, 6:14 AM CST


Additionally it seems you're not accounting for the fact that nulling a reference is not equivalent to freeing an object.

It's exactly the same: if you don't free an object you leak the memory; if you do not null a reference (that is, any root reference, or any reference reachable from a root) you leak the memory. It's exactly the same: you leak the memory if you fail to do it.

Yet this is a semantic error in the design of your circular buffer API (given that your buffer is a buffer of references, not a buffer of objects in place) and thus afaics has nothing to do with the salient issues of choosing between automated GC or compile-time stack allocation.

This has everything to do with it. The whole point of GC is to prevent memory leaking due to semantic errors. If the programmer never makes semantic errors, then they would always program manual allocation correctly. In effect, what this does is undermine the ease-of-programming/semantic argument for GC. GC in fact makes several classes of bug harder to find and more difficult to debug. As debugging is harder than writing code, we want to optimise for ease of debugging, not ease of writing. This is strike one for GC.

Then we have the memory usage issue. If we are not going to require a new operating system then we need to abide by the requirements of current OSs:

However, use of the virtual address space is strongly discouraged in the Linux kernel. This is particularly true on 32-bit architectures where the virtual address space is limited to 100M by default. Using the virtual address space on 64-bit Linux kernels is also discouraged but the address space is so much larger than physical memory it is less of an issue.

So the OS wants us to minimise virtual address usage, even on 64bit systems. With GC we cannot have both high performance and minimal memory usage. We need about 50% additional memory to get performance, so this is strike two for GC.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354279685
Original Date: Dec 28, 2017, 6:16 AM CST


The BG-RC automated GC design employed newer methods for probing and eliminating circular referencing for the referenced counted objects

Well of course you can detect the cycles, but this costs performance. My point was that immutable objects (value semantics) do not produce cycles, so you can use simple RC without worrying about it.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354390714
Original Date: Dec 28, 2017, 8:54 PM CST


Additionally it seems you're not accounting for the fact that nulling a reference is not equivalent to freeing an object.

It's exactly the same: if you don't free an object you leak the memory; if you do not null a reference (that is, any root reference, or any reference reachable from a root) you leak the memory. It's exactly the same: you leak the memory if you fail to do it.

Incorrect from a programmer effort cost perspective. You're ignoring the other part of what I wrote quoted as follows:

@shelby3 wrote:

The automation of the runtime GC frees the programmer from needing to know at compile-time when the object is freed, regardless of whether references need to be nulled in some semantic cases.


@keean wrote:

The whole point of GC is to prevent memory leaking due to semantic errors.

Incorrect/disagree. The point of automated GC is to eliminate meticulous compile-time tracking of lifetimes. And my proposal attempts to lift the performance up to nearly par by eliminating the write-barrier.

If the programmer never makes semantic errors, then they would always program manual allocation correctly.

Incorrect conceptualization. Semantic errors will be semantic errors in any memory allocation management scheme. I'm repeating myself.

So the OS wants us to minimise virtual address usage, even on 64bit systems. With GC we cannot have both high performance and minimal memory usage. We need about 50% additional memory to get performance, so this is strike two for GC.

I posit that my proposal will be better in these facets than BG-MS and BG-RC. The fragmentation will be less because all the non-RC references (which should be most objects, because of the strong law that most objects die young) will die in the nursery and never create fragmentation that has to be offset by the large virtual memory space. Also bump-pointer allocation can be used within the heap allocator to mitigate fragmentation significantly for the RC referenced objects (this was mentioned/cited in more detail upthread). The performance of my proposal should be closer to Rust's compile-time allocation lifetimes (and thus also with more algorithmic flexibility), because of the elimination of the write-barrier on update of references. The use of RC for long-lived objects will have better asymptotic complexity w.r.t. the percentage of the physical heap allocated and the total heap size allocated, as compared to MS (mark-sweep). The remaining asymptotic weakness is the probing for circular-reference leaks, but I also mentioned a possible algorithmic improvement (and it's no worse asymptotically, and better in some facets, than mark-sweep as is, although it might possibly leak some circular references in corner cases, I'm not sure).

But you'll have fragmentation issues with Rust also. It's disingenuous to imply (by your allegation against alternatives) that programming in Rust is some panacea without fragmentation and other issues.

Indeed you're correct that compile-time (especially all on the stack) allocation can be more efficient (but I posit this difference will be significantly less with my proposal), but this is simply not a good tradeoff in terms of programmer productivity for a general purpose mainstream programming language. The algorithmic inflexibility can be quite onerous also. For most use cases, it is over-engineering and too slow to code and too complex to maintain. Rust will probably still have a market for those use cases where it matters, such as embedded and mission critical applications (such as perhaps the infrastructure for other apps, e.g. OSes and web browsers). Profiling can be used to determine which small percentage of the code needs to be meticulously hand-tuned and optimized. I had already addressed this, quoted as follows:

@shelby3 wrote:

Granted it requires more memory than a tightly/meticulously coded stack allocation paradigm. So Rust or something like it will still be useful for mission critical embedded applications and such. But IMO meticulously coded (inflexible, significant tsuris) stack allocation is not an optimum tradeoff for a general purpose programming language.

Linux on 64-bit will accommodate virtual address space usage. If the mainstream programming language demands it, then Linux will be further improved along those lines. OSes are not static things which do not adjust to the overriding trends. And I've also pointed out that fragmentation will be less with my proposal. There's no panacea. We aim for the sweet spot for a general purpose mainstream programming language.

@keean wrote:

The BG-RC automated GC design employed newer methods for probing and eliminating circular referencing for the referenced counted objects

Well of course you can detect the cycles, but this costs performance. My point was that immutable objects (value semantics) do not produce cycles, so you can use simple RC without worrying about it.

You're repeating what I already admitted/wrote. But I also wrote quoted as follows:

@shelby3 wrote:

It's apparently not necessary to force the inflexibility of a certain programming paradigm (e.g. immutability everywhere) on the programmer. I think the probing algorithm is a source of some less ideal asymptotic complexity, which the research paper noted could be future work for improvement, but nevertheless even mark-sweep has poor asymptotic complexity (so comparing apples-to-apples of automated GC, then RC with probing is apparently no worse and has some better tradeoffs in other facets). There might be some additional innovations that can be made such as marking which objects are not candidates for circular references and thus don't need to be probed.

The point is that forcing immutability programming paradigm everywhere is very inflexible. Programmers want to have freedom to use different programming paradigms. Also immutable data structures have a log n performance cost. They're not a panacea.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354398794
Original Date: Dec 28, 2017, 11:09 PM CST


The automation of the runtime GC frees the programmer from needing to know at compile-time when the object is freed, regardless of whether references need to be nulled in some semantic cases.

This is wrong. If you don't null it, you will leak the memory at runtime.

You seem to think the consequences of not freeing an object are somehow worse? They are not. You don't have to free any object: when a program terminates, all its memory is returned to the system, whether the object was freed or not. In 'C' you do not really need to free anything; the only cost is that the program will use more memory, just like if you forget to null a reference with GC. If you have enough RAM you don't have to free any memory at all.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354400076
Original Date: Dec 28, 2017, 11:30 PM CST


Incorrect from a programmer effort cost perspective. You're ignoring the other part of what I wrote quoted as follows:

@shelby3 wrote:

The automation of the runtime GC frees the programmer from needing to know at compile-time when the object is freed, regardless of whether references need to be nulled in some semantic cases.

This is wrong. If you don't null it, you will leak the memory at runtime.

No, you're wrong, and you still do not grasp the point I'm making.

With automated GC, nulling a reference does not necessarily free the object, because there might still be other references to the same object existing. With automated GC, the programmer doesn't have to know whether those other references exist or not. Whereas, when you compare this to freeing the object by calling free() in C (or the equivalent in other languages), I'm pointing out that to free the object, the programmer must know whether any other such references exist.

The point was that both manual (i.e. compile-time) and automatic (i.e. runtime) memory allocation management require the programmer to take actions for semantic cases where references need to be nulled (e.g. the circular buffer of references example you provided). But the automated GC has the advantage that the programmer doesn't have to know the lifetimes of all the references to the same object at compile-time. Thus, per some examples I showed upthread for Rust, the algorithmic flexibility can be greater, and the coding less meticulous, with automated GC as compared to Rust's compile-time lifetimes.
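A minimal TypeScript illustration of the distinction being drawn (illustrative only): nulling one reference does not free the object while another reference still reaches it, and with automated GC the programmer does not need to know whether that other reference exists.

```typescript
// With automated GC, dropping one reference does not free the object if another
// reference still reaches it, and the programmer need not know whether it exists.
interface Buffer { data: number[] }

let a: Buffer | null = { data: [1, 2, 3] };
const registry: Buffer[] = [];
registry.push(a);        // a second reference, perhaps created far away in the code

a = null;                // semantically "I'm done with it here"
// The object is NOT collected: registry[0] still reaches it. With manual
// management, calling free() here would require proving no such reference exists,
// or the program gets a use-after-free.
```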

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354403743
Original Date: Dec 29, 2017, 12:19 AM CST


I'm pointing out that to free the object, the programmer must know if there are any other such existing references.

And I am pointing out you do not care about other references if you never call free. Just leak the memory. If you don't null the reference you leak the memory, so it is no better than never calling free, so just don't call it.

In C++ you just use a "shared_ptr" that implements RC for cases where the semantics of freeing are complex.
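For consistency with the other examples in this thread, here is a hedged TypeScript sketch of the same reference-counting idea behind shared_ptr: a hand-rolled handle with explicit clone/drop. The names are illustrative, not an existing API.

```typescript
// Sketch of an explicitly reference-counted handle: the value is freed when the
// last owner drops it, regardless of which owner that turns out to be.
class Rc<T> {
  private count = 1;
  constructor(private value: T, private onFree: (v: T) => void) {}

  get(): T { return this.value; }

  clone(): this {       // another owner: bump the count
    this.count++;
    return this;
  }

  drop(): void {        // an owner is done: decrement, free at zero
    if (--this.count === 0) this.onFree(this.value);
  }
}

const handle = new Rc({ fd: 42 }, v => console.log("closing", v.fd));
const other = handle.clone();
handle.drop();   // still one owner left
other.drop();    // count hits zero -> "closing 42"
```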

Most variables in a program have simple local scope. Very few variables actually have complex multiple references with different lifetimes. In a way GC is optimising for the uncommon case.

Whatever the solution, it has to make debugging memory leaks as easy as possible, as well as deal with the common simple local variable efficiently.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354521258
Original Date: Dec 29, 2017, 7:55 PM CST


@keean wrote:

And I am pointing out you do not care about other references if you never call free. Just leak the memory.

Who wants to design for the cases of promoting memory leaks (which are not semantic in nature or let's say can be dealt with automatically by RC or MS)?

If you don't null the reference you leak the memory, so it is no better than never calling free, so just don't call it.

No better in the case that the programmer desires memory leaks. That makes absolutely no sense.

I explained that in the case of not desiring memory leaks or "use after free", automated GC automates the tracking of the lifetime of an object without complex and inflexible compile-time tsuris.

In C++ you just use a "shared_ptr" that implements RC for cases where the semantics of freeing are complex.

Yup. And I'm proposing the same thing, but merging it with a nursery that thus doesn't need a write-barrier and thus, I posit, is nearly as performant as the tsuris and algorithmic inflexibility of Rust's lifetimes with exclusive mutable borrows, with much greater programmer productivity. I can't believe I am having to write this for the 10th time and you still haven't recognized the potential advantage. It's absolutely exasperating.

Most variables in a program have simple local scope. Very few variables actually have complex multiple references with different lifetimes. In a way GC is optimising for the uncommon case.

Whether or not that is true (and you don't know what the percentage is), the fact is that the other paradigms are algorithmically inflexible. I cited some examples upthread for Rust where the lifetime checker can't detect what is safe code. And the annotations and hassles to prove to the compiler which references are simple RAII and which are more complex but still within the lifetime paradigm, are a significant drain on programmer productivity, maintenance (code readability/simplicity), and algorithmic flexibility, and add complexity to the type system and compiler.

Whatever the solution it has to make debugging memory leaks as easy as possible,

What golden rule is that? And where is the proof? All programming language design decisions are thrown into a mix of tradeoffs.

as well as dealing with the common simple local variable efficiently.

Which my proposal apparently will do, as well as be more efficient in many other ways, such as programmer productivity.

You're a person who wants to overengineer things. You'll never create a friendly, light, fun, mainstream language adopted by millions of programmers. I'll put that out there as a challenge to both of us, and let's see which of us succeeds in doing so.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354532259
Original Date: Dec 30, 2017, 1:30 AM CST


Who wants to design for the cases of promoting memory leaks (which are not semantic in nature or let's say can be dealt with automatically by RC or MS)?

My feeling is this is what GC encourages, especially when combined with closures. Most programs written in scripting languages leak memory, but it's okay because the programs are not generally long lived. If you want stable memory use and GC you might need to get rid of closures.
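The kind of leak being alluded to might look like the following TypeScript sketch (illustrative only): a long-lived closure captures a large object, keeping it reachable for the GC long after it is logically dead.

```typescript
// A long-lived closure keeps a logically-dead object reachable, so the GC can
// never reclaim it.
const listeners: Array<() => void> = [];

function handleRequest(): void {
  const payload = new Uint8Array(10 * 1024 * 1024);   // 10 MB working buffer
  // ... use payload ...
  listeners.push(() => {
    // The closure captures `payload`, so all 10 MB stay reachable for as long as
    // this listener is registered, even though the request finished long ago.
    console.log("request finished, payload size was", payload.length);
  });
}
```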

No better in the case that the programmer desires memory leaks. That makes absolutely no sense.

No better in the worst case makes a lot of sense from a design perspective. Many algorithms are chosen based on the limit of their worst case performance.

"use after free"

Not freeing memory entirely eliminates use-after-free errors, and is faster than freeing. For short-lived programs it is an optimal strategy. This is the simplest case of a region allocator (a single region). Just let the program use the memory it needs and let the OS clean up at the end. Many mathematical problems fit into this class.
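A minimal sketch of that single-region strategy in TypeScript (offsets into an ArrayBuffer stand in for raw pointers; names are illustrative): allocation is a pointer bump, there is no per-object free, and everything is reclaimed at once when the region or the program ends.

```typescript
// Single-region (arena) allocator sketch: allocation bumps a pointer, there is no
// per-object free, and the whole region is dropped at once at the end.
class Region {
  private top = 0;
  readonly memory: ArrayBuffer;

  constructor(capacity: number) {
    this.memory = new ArrayBuffer(capacity);
  }

  alloc(bytes: number): number {
    const aligned = (bytes + 7) & ~7;          // 8-byte alignment
    if (this.top + aligned > this.memory.byteLength) {
      throw new Error("region exhausted");
    }
    const offset = this.top;
    this.top += aligned;                       // "just increment the pointer"
    return offset;
  }

  releaseAll(): void {                         // the only deallocation: drop everything
    this.top = 0;
  }
}
```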

Yup. And I'm proposing the same thing, but merging it with a nursery that thus doesn't need a write-barrier

If you remove longer lived objects from the nursery it is no longer so efficient, as you now have to explicitly defragment it, rather than compacting when copying objects into the next generation. I think you are missing some details by oversimplifying your mental model of what is going on. Actually write the code for this, and you will find it's more complex than you think.

What golden rule is that? And where is the proof? All programming language design decisions are thrown into a mix of tradeoffs.

Debugging is clearly harder than writing a program in the first place. I can write an incorrect program in no time at all. Writing a correct program is hard. The proof is to start measuring the time you spend writing vs debugging and maintaining code. It's something I think a team leader, project manager, or anyone that has had to estimate the development time for writing code should realise.

I cited some examples upthread for Rust where the lifetime checker can't detect what is safe code.

Proving a variable is local is very different from knowing it is local. The majority of variables can be local even if you can't prove they are. This one should be simple: if we do not allow references, then we have to use global variables to maintain state between functions. Simply find a language that does not provide references and count the number of local vs global variables across several different programs.

Actually this is a useful thought - maybe a simple memory management solution that removes the need for GC is simply don't allow references. Force the direct use of variables in an outer scope. This is basically the solution all databases use (as in a database a 'reference' is actually an index).

I can't believe I am having to write this for the 10th time and you still haven't recognized the potential advantage. It's absolutely exasperating

It clearly has some advantages, but it's more of an incremental improvement than a groundbreaking paradigm shift.

programmer productivity.

In my experience, programmer productivity is dominated by debugging time, except in rare instances where complex algorithmic research is required. Most programmers do not spend any time doing this, and it is largely restricted to university research papers.

You're a person who wants to overengineer things.

Maybe, but I also am looking for something that is different enough to drive adoption. Being a better JavaScript is unlikely to get massive usage (look at all the languages out there with low adoption rates).

For amateur programmers you really want something like a spreadsheet. Something like Apple's "HyperCard", and for that there is no question that GC is the correct approach.

I am a professional programmer, so naturally I am interested in a language for professionals, because I want to solve real problems that I have with existing languages.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354559246
Original Date: Dec 30, 2017, 11:52 AM CST


@miguel raised an issue with me in private discussions about heterogeneous collections w.r.t. to my proposal. He was thinking that the reference counting type was on the class (whereas, it's on the reference). But there's a related issue which is that the programmer may wish to add to a collection, objects which are both reference counted and not, because it may be more efficient than promoting all non-RC (i.e. nursery) objects out of the nursery to RC just to be able to put them in some collection mixed with RC objects. Otherwise it's an analogous issue to "what color is your function". To accommodate that case, we could have a third type of reference which doesn't know at compile-time whether it points to a RC or non-RC object. Thus, it reinstates the write-barrier. Sometimes the write-barrier would be more efficient than promoting all nursery objects to RC for a case where the two types have to be intermixed.

Note that in my proposal, RC references can never be assigned to non-RC (i.e. nursery) reference types. GC with write-barriers doesn't have this limitation, because the references all have the same type at compile-time. But that performance cost is paid everywhere. I need to ponder whether this "what color is your function" issue, applied to references, is a significant source of inflexibility.
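A hypothetical sketch of what that third, mixed reference kind might cost at runtime (all names are illustrative, not an existing API): each assignment pays a tag check that statically-typed RC vs nursery references would avoid.

```typescript
// Hypothetical "mixed" reference: not known at compile time to be RC or nursery,
// so every assignment pays a runtime tag check (a write-barrier-like cost).
interface MixedRef<T> {
  kind: "nursery" | "rc";
  target: T;
  rcCount: number;          // only meaningful when kind === "rc"
}

function assignMixed<T>(dst: { ref: MixedRef<T> | null }, src: MixedRef<T>): void {
  // These branches are exactly what statically-typed RC vs nursery references
  // would let the compiler resolve (and mostly elide) at compile time.
  if (src.kind === "rc") src.rcCount++;                  // new owner of an RC object
  const old = dst.ref;
  if (old !== null && old.kind === "rc") old.rcCount--;  // release the old target
                                                         // (freeing at zero elided)
  dst.ref = src;
}
```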

Tangentially, there may be scenarios where the RC cost can be skipped down the call stack when the call stack lineage holds a reference to the object. My idea for not having out-of-order callbacks in single-threaded code makes it easier to reason about which of the said RC increments/decrements can be avoided. The point being that we can perhaps avoid most of the cost of RC in some cases.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354561173
Original Date: Dec 30, 2017, 12:29 PM CST


@keean wrote:

My feeling is this is what GC encourages, especially when combined with closures.

I already mentioned the closures issue upthread and that it would need to be investigated thoroughly to see what could be done.

I don't think GC encourages the type of semantic errors such as your circular buffer example. Thus, I think you're exaggerating. I'm open-minded, if you have hard statistical data indicating otherwise.

No better in the worst case makes a lot of sense from a design perspective. Many algorithms are chosen based on the limit of their worst case performance.

Your application of that truth to this case doesn't make any sense. In the worst case, the two memory allocation strategies can both leak, i.e. they're equivalent in the worst case. In the best case, automated GC automatically tracks which objects can be freed and the programmer need only null the reference to indicate it is no longer in use. Whereas some manual memory allocation scheme has to prove that the object's lifetime is out-of-scope and it can be released. This is the Nth time I have stated this.

I have already stated numerous times the positive and negatives (the tradeoffs) for automated GC and how my proposal posits to improve on them.

Not freeing memory entirely eliminates use-after-free errors, and is faster than freeing. For short-lived programs it is an optimal strategy. This is the simplest case of a region allocator (a single region). Just let the program use the memory it needs and let the OS clean up at the end. Many mathematical problems fit into this class.

You're still trying to justify your logic above. It remains the case that in the worst case they're equivalent. And thus your entire logic from the start on this point fails as explained above.

If you remove longer lived objects from the nursery it is no longer so efficient, as you now have to explicitly defragment it, rather than compacting when copying objects into the next generation.

Absolutely incorrect. I'm not going to re-explain it for the Nth time.

I think you are missing some details by over simplifying your mental model of what is going on. Actually write the code for this, and you will find its more complex than you think.

I read the (first several chapters of the) book on GC recently. You're the one who is apparently lacking understanding, presumably because you haven't read it. It's disingenuous to accuse someone of missing details when you're not stating what those details are specifically.

Debugging is clearly harder than writing a program in the first place.

Your implied claim is that Rust's compile-time provably correct lifetimes and exclusive mutability will never need to be debugged. But this is an extremely myopic point to make, because Rust has to punt to RC and unsafe code in certain cases, and even in provably safe code, the semantic memory leak errors can still occur.

You have failed to prove that my proposal is harder to debug or even that the amount of difficulty isn't offset by the increased productivity, higher readability of the code, lower maintenance, etc.. You're just throwing out subjective opinions without any statistical studies or substantiation.

I'm one of the most prolific debuggers on every major project I've worked on. Please don't try to insinuate I know nothing about debugging. Also, I would not agree that debugging is always more difficult than writing the program. A well written program involves a lot of forethought. Carefully written, well organized code, with forethought as to sources of potential bugs, can alleviate many types of "action at a distance" bugs that (appear to be random and not reproducible and thus) are extremely difficult to track down.

You're relating your experience as a code reviewer for a large team and trying to keep a large team all on the same page (which is a well known Mythical Man Month dilemma). So you would like to force them all into a compile-time enforced structure which ensures more correctness.

We already had the abstract (relativistic physics) debate in the past wherein I explained/argued to you that it's impossible to statically prove that all sources of bugs don't exist. The more static checking that is piled on, the less flexible the programming language becomes. Your infatuation with gobs upon gobs of complex type system cruft is not going to end up being the panacea you dream of. The problem, as Rust's lifetimes model exemplifies (see the examples upthread of safe code which Rust can't prove is safe), is that it's not possible to prove every possible kind of program is safe. The variants of typing required are an exponential explosion.

Actually this is a useful thought - maybe a simple memory management solution that removes the need for GC is simply don't allow references.

When you have a real proposal, I will pay attention. Otherwise I'll just ignore this, as I did the suggestion that immutability everywhere is a performance solution for the write-barrier, when it's known to introduce a log n algorithmic performance degradation.

It clearly has some advantages, it's more of an incremental improvement than a ground breaking paradigm shift though.

I don't yet know if it is a good idea. There are some disadvantages, per the prior post. But I don't acquiesce to your egotistical slant. The design potentially interacts with my other ideas about a single-threaded model for concurrency and avoiding Rust's need to prove exclusive mutability. Taken holistically, it might be a significant, groundbreaking language shift. We'll see...

Note the comment of mine to which you were replying wasn't asking for you to agree the proposal is great, but an exasperation that you hadn't yet grasped the posited advantages and tradeoffs completely as evident by some of your comments. I wasn't looking for a slap on the back (although I apparently received a slap down instead), rather just pushing for mutual coherence in our discussions. The level of noise is discouraging.

In my experience, programmer productivity is dominated by debugging time, except in rare instances where complex algorithmic research is required.

Thus your programmers never read and maintain any other programmer's code.

Also a facet of debugging is being able to understand the code you're debugging.

Meticulous static type system cruft and complexity (as well as too much complex type inference) can be argued to be difficult to conceptualize and read.

Being a better JavaScript is unlikely to get massive usage (look at all the languages out there with low adoption rates).

TypeScript has had the fastest growing adoption of a new language in recent memory.

Python is very popular because it's very easy to express algorithms with. Although I don't think anyone should use Python for a major project with a million lines of code and a large team. Seems to me your argument for greater compiler checking is because you're prioritizing large projects with large teams?

But I'll state with high confidence that most programs will end up being smaller. That the programming world is moving away from large Mythical Man Month morass clusterfucks (even those supported by socialism for example and allowing much slower rates of progress than the free market) and towards small teams, apps, and interoperability standards.

You're insinuating that PureScript, CoffeeScript, etc are better than JavaScript. I disagree. They all suck. Why the heck would I want to use any of those?? I don't want Haskell's abstruseness. I don't want some half-baked syntax sugaring.

For amateur programmers you really want something like a spreadsheet. Something like Apple's "HyperCard", and for that there is no question that GC is the correct approach.

Who is targeting idiots? Not me.

Vitalik Buterin and Eric S. Raymond both prefer/rave about Python. Both of them are probably 150+ IQ.

I am a professional programmer, so naturally I am interested in a language for professionals, because I want to solve real problems that I have with existing languages.

Which real problems? Self-documenting code via complete type annotations? Seems to me you want a type system to prevent team members from violating any and all invariants? But I already explained that's an impossible pipe dream. So which real problems?

The entire point of my proposal is trying to holistically figure out how to get low-level coding and high-level coding into the same language and to deal with the concurrency issues without needing to prove and restrict to exclusive mutable borrows every where. I'm basically looking at the weaknesses of JavaScript (and TypeScript) where it doesn't (they don't) meet my needs. And I've looked at C++, Rust, Java, and Scala and they also have other issues.

Note I do want to spend some time asap thinking about which language I am going to use interim until Lucid might become a reality.

I know typeclasses have been a top priority of yours. I've never seen your holistic rationale on which memory allocation strategy you want. Seems you were checking out Rust and Ada, and contemplating other designs such as regions, and I hadn't yet seen your conclusive remarks.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354568064
Original Date: Dec 30, 2017, 2:59 PM CST


You're still trying to justify your logic above. It remains the case that in the worst case they're equivalent. And thus your entire logic from the start on this point fails as explained above.

No, in the worst case GC fails because it makes memory leaks much harder to debug. It is very difficult to track down at runtime exactly where the leaks are coming from. Whereas tools like "valgrind" make it relatively straightforward for manual/deterministic memory management.

Absolutely incorrect. I'm not going to re-explain it for the Nth time.

You have not explained this once. Point me to the algorithm or to a paper which explains this if you cannot explain it well enough yourself.

Your implied claim is that Rust's compile-time provably correct lifetimes and exclusive mutability will never need to be debugged.

You misunderstand me, I am not saying anything about Rust. All I am saying is that it is easier to debug deterministic and predictable systems, where the programmer has visibility and control of what is going on.

TypeScript has had the fastest growing adoption of a new language in recent memory.

I am using TypeScript.

Python is very popular because it's very easy to express algorithms with.

I am using Python too. You can get pretty good performance from Python using numpy to avoid explicit iteration, operating on vectors and matrices of numbers. Still, for the heavy lifting you have to call out to 'C' using Python modules.

I like Python, I don't need another Python.

So which real problems?

Writing a compiler that is modular and understandable. Writing clean parsers. Monte-Carlo simulations, and AI.

I know typeclasses have been a top priority of yours. I've never seen your holistic rationale on which memory allocation strategy you want. Seems you were checking out Rust and Ada, and contemplating other designs such as regions, and I hadn't yet seen your conclusive remarks.

I don't really have any conclusive remarks. What I can say is that GC is a good baseline, but it's clear that for real performance, manual management like 'C' is needed. This would be the Python approach: high level in Python, low level in C.

As for typeclasses, I could go for some kind of modules instead. The important thing is that it should be based on Category Theory, and avoid lots of boilerplate.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354588337
Original Date: Dec 31, 2017, 12:41 AM CST


@keean wrote:

You have not explained this once.

It's all upthread, numerous times. It does, though, require assimilating the cited references and my comments. I'm not going to repeat myself again at this time for the Nth time.

No, in the worst case GC fails because it makes memory leaks much harder to debug.

Now you're moving the goal posts. You had just argued that the worst case comparison was about allowing them both to leak, because it doesn't matter for short-lived programs. Your logic is discombobulated. Now you're using the phrase "in the worst case" in a point about avoiding the worst case by debugging memory leaks. Thus the sentence is illogical.

What you really are doing is trying to ignore that mistake you made in logic and change the discussion to the relative debugging costs of different memory allocation strategies. This new direction is an interesting area to discuss, but it doesn't absolve you from your earlier mistake, which you apparently refuse to acknowledge. Anyway, I'm not trying to rub your face in it, but please also stop implying I was wrong, when in fact your mistake remains. Clueless readers can be easily misled by your attempts at obfuscating the discussion.

It is very difficult to track down at runtime exactly where the leaks are coming from. Whereas tools like "valgrind" make it relatively straightforward for manual/deterministic memory management.

Afaics, the memory leak detection in valgrind is reporting which allocations were never freed when the program shut down. A similar tool can be made for a GC. Thus I think your assertion is bogus.

All I am saying is that it is easier to debug deterministic and predictable systems, where the programmer has visibility and control of what is going on.

As you see above, the devil is in the details of such claims. I challenge you to prove that C++ or Rust is easier to debug than Java. A Java expert will likely be able to point to tools they use to aid debugging.

I will repeat again: the determinism in Rust's compile-time lifetimes and exclusive mutable borrows has nothing to do with semantic memory errors (or at least not all types of them). I fail to see any determinism or predictability in the occurrence of such errors in any of those systems. Rust models a class of errors at compile-time, but it doesn't model every form of error (not even every memory allocation error), e.g. where Rust must punt to RC references. And the cost of that limited amount of compile-time error checking is a reduction in algorithmic flexibility, an increase in verbosity, complication/cruft of the type system, etc. Probably there are use cases in which it is a desired tradeoff, for a mission critical use case, to restrict to only safe (i.e. absolutely no instances of unsafe) Rust code (with no RC references), so that the memory allocation is provably leak-free at compile-time (which also requires an onerous exclusive mutable borrowing paradigm as well, i.e. the lifetimes are inseparable from the mutability restrictions in Rust). But I don't think those use cases are the next mainstream programming language.

Writing a compiler that is modular and understandable. Writing clean parsers. Monte-Carlo simulations, and AI.

I fail to see how that has anything to do with breaking new ground for a mainstream programming language, other than as a research tool. I'm addressing real issues I have where I can't write all the code (client, server, high and low-level) in one programming language. And where I can avoid the tsuris of Rust or C++. And where I can avoid Java's verbosity, GC pauses, and lack of, for example, a native unsigned type. And where I don't have to write noisy code with braces and semicolons. And where I can later perhaps also get typeclasses. And where I can get some of TypeScript's typing features such as union types and flow-based typing (e.g. TypeScript's typeof and instanceof guards), which leads to more elegant code.

Writing a clean compiler is an admirable goal, but it is not germane to the issue of which real mainstream programming problems we are addressing. And no, I am not trying to write a compiler which programmers can program. In past comments, it seems you're focused on trying to build a stack of compiler layers starting from Prolog unification and up. That is a massive goal which will require years and involve many false starts and restarts.

Your apparent aspiration towards a compiler framework starting from correct first principles is an interesting research goal to explore. Perhaps something for me to dabble in when I am retired. In the meantime, I need to focus on near-term deadlines and needs. I'm also remembering @skaller, who is experimenting with his "kitchen sink" exploration of the programming language design space with his Felix programming language research. He's apparently retired, after having worked on the C++ committee in the past.

I'm trying to solve some (few or several) major design goals in the most efficient way possible. Not to fool myself into believing I can solve the "kitchen sink" of the vast programming language design space in any reasonable time frame.

but it's clear that for real performance, manual management like 'C' is needed. This would be the Python approach: high level in Python, low level in C.

Precisely the segregation and tradeoff my proposal wants to eliminate. Yet you claim it's not potentially groundbreaking. For example, eliminating the marshalling and algorithmic inflexibility costs of the FFI.

As for typeclasses, I could go for some kind of modules instead.

I remember that discussion we had in these issues threads. And I'm also not sure which I want. I decided to focus first on my more immediate needs and then figure that out later when I have more recent coding to analyse and guide my perspective. Most of my coding was done more than a decade ago, and I for example didn't know about the concept of typeclasses then, so I need to have these concepts in mind as I code forward in order to formulate a reasoned perspective.

We had some very interesting discussions in the past.

It's possible that my proposal is lacking in some major way. Yet I'm not going to acquiesce to what I perceive to be incorrect statements. The main problem I see thus far with my proposal is that RC and non-RC references can't interoperate freely, i.e. there's a reduction in degrees-of-freedom in exchange for the benefits the proposal brings.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354592080
Original Date: Dec 31, 2017, 2:36 AM CST


I am not sure what mistake you think I have made?

For server stuff, Go where you need performance, otherwise Python is good enough. At the moment I am using TypeScript client side and Python on the server, and it is fine. I don't really need a replacement for those; I don't need manual memory management because I am not dealing with performance issues. Where I come across problems is in C/C++/Ada/Rust when I am concerned about performance, and I find I cannot use abstraction and have a good architecture without losing performance.

The only time I have seriously hit the limits of JavaScript/TypeScript abstraction is when trying to write a Prolog interpreter or any language compiler in them.

From a support point of view, Java has been the worst for support, as the code quality was poor, and its use of memory (per user) was terrible. The service would chew through memory, crashing every couple of days. Whereas, in comparison, our Python services run for months without intervention. The conclusion is that Java threads use a large amount of per-thread memory to support N concurrent users, and the memory leaks over time.

The two solutions that seem to work well are Python (which reference counts, and uses MS for freeing cycles only) and JavaScript/Go-like concurrency. The former works even though it has threads because it is largely deterministic about memory due to RC; the latter works because it uses asynchronous concurrency with MS GC.

So this little analysis suggests that for a web server you do not want a threaded model and mark-sweep (generational) GC together for this kind of application.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354604837
Original Date: Dec 31, 2017, 7:49 AM CST


There have been at least two major themes I'm trying to improve upon for programming language design, other than the issues around genericity/reuse/typeclasses/modules/HKT/HRT (i.e. higher-order type systems on the Lambda cube):

  • concurrency/parallelism
  • avoidance of an FFI for integrating low-level coding with the higher-level conveniences.

I had explained (in another thread, but I think it's linked from the OP) that asynchronous callbacks are essentially the same as multithreaded re-entrancy in terms of exposing the possibility for race conditions on mutable data (although callbacks at least delineate the race conditions). Rust prevents such race conditions by requiring (at compile-time) exclusive borrowing of mutable references everywhere, which is quite onerous. Go (and JavaScript without callbacks) solves this problem by not allowing shared memory between channels. The Go CSP channels (or separate JavaScript threads) can (or should) only share data by passing messages. This prevents the optimal multi-threaded access of a large shared immutable data structure such as the UTXO of a blockchain (which can be GBs). We had also discussed that lockless design is far superior because the bugs with synchronization are usually unbounded; thus we're frowning on idiomatic Java as a concurrency solution.

My proposal of explicitly declaring which references are RC, coupled with the proposed design decision to give each thread its own non-shared nursery and allow RC references to be shared between threads (and coupled with my proposal to not allow out-of-order asynchronous callbacks1), clearly demarcates the RC references that can lead to mutable data race conditions. Thus the programmer only needs to ensure the RC references are to immutable data structures where necessary to ensure no data races, and/or to queue modifications (including the increments and decrements to the reference counts) that are order independent so they can be batch processed by a single thread to avoid data races involved with multi-threaded update of data.

Giving each nursery its own non-shared thread is also beneficial in that low-level code (each instance running in one of said single threads) can do low-level pointer manipulations without references being moved, because the nursery copy compactor won't run until that thread is stalled (and presumably won't stall a function which is marked as low-level). So this eliminates a marshalling and FFI issue required for typical GC languages to interact with low-level C-like coding. Additionally, the RC references never move (in virtual address space).

I hope with this explanation, readers see there are some holistic design considerations motivating my proposal.

Obviously the fault in my proposal is that mixing nursery and RC referenced objects in a collection could be woesome. And we now note the added complication that we would be combining references that have single-threaded access with those that have multi-threaded access.

@shelby3 wrote:

Tangentially, there may be scenarios where the RC cost can be skipped down the call stack when the call stack lineage holds a reference to the object. My idea for not having out-of-order callbacks in single-threaded code makes it easier to reason about which of the said RC increments/decrements can be avoided. The point being that we can perhaps avoid most of the cost of RC in some cases.

1 This requires further analysis and discussion, especially how we will handle UI code. Functional reactive programming? I'm referring to not having any shared mutable state modified while waiting for any callback. With that rule, exclusivity of mutability of the single-threaded access to nursery objects is guaranteed.


I am not sure what mistake you think I have made?

Can we stop beating a dead horse? IMO, it's clearly articulated in the thread for anyone with sufficient reading comprehension skills. I'm more interested in any insights you have regarding memory allocation strategies than rehashing what I think your error in conceptualization was.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354637004
Original Date: Dec 31, 2017, 11:11 PM CST


This discussion is timely because I'm needing to make a decision pronto on which programming language to start coding our blockchain. The choice of programming language may also impact hiring decisions as well, which I need to make pronto.

@keean wrote:

From a support point of view, Java has been the worst for support, as the code quality was poor, and its use of memory (per user) was terrible. The service would chew through memory, crashing every couple of days. Whereas, in comparison, our Python services run for months without intervention. The conclusion is that Java threads use a large amount of per-thread memory to support N concurrent users, and the memory leaks over time.

The memory leaks were due to errors of your programmers, not due to Java's GC? Or that the MS of the GC could not keep up with the rate of allocation of objects which outlived the generational nursery, thus causing huge "stop the world" pauses to recover the memory? I gather that you're blaming Java's verbosity (e.g. no functions, everything must be a method) and use of synchronization instead of lockless design (which tends to be buggy and thus probably leaky) for promoting errors in coding?

The two solutions that seem to work well are Python (which reference counts, and uses MS for freeing cycles only) and JavaScript/Go-like concurrency. The former works even though it has threads because it is largely deterministic about memory due to RC; the latter works because it uses asynchronous concurrency with MS GC.

MS has very bad asymptotic complexity. RC has better asymptotic complexity as long as there's some low asymptotic complexity mechanism to free cyclical reference leaks (presuming these leaks are less common, so the MS or probing mechanism can run less frequently than a pure MS without RC would).

My proposal recognizes that RC has very low cost when employed only for long-lived objects, and a nursery without a write-barrier has nearly 0 cost for the short-lived objects. So in theory my proposal can combine the performance of Rust with the convenience of automatic GC and with excellent asymptotic complexity.

I'm presuming you have not attempted to use JavaScript/Go extensively on the server?

For server stuff, Go where you need performance, otherwise Python is good enough.

Python is too slow and lacks native low-level coding capability, thus probably isn't acceptable for systems programming such as a blockchain full node. Also, I heard that Python has holes in its type system, such as monkey patching, which will make it impossible to optimize and also leaky in terms of invariants. And Go lacks higher-level abstractions (i.e. no generics), has suboptimal, by-convention-only concurrency safety (by frowning on shared memory), and doesn't have my proposed performance improvements for automated GC. I heard a redesign of Go is being contemplated? I like some of the ideas from Go, such as how they do low overhead green threads at a function boundary. I wonder how Go integrated their low-level pointers with the GC? If the GC moves objects, then how does Go patch up all the pointers, and which threads get stalled and when (i.e. is there only one GC instance for all threads)? Edit: apparently Go doesn't currently move objects, so it's not a well thought out design point. As expected, Go is a suboptimal design in this respect.

So in summary, there doesn't exist a programming language which combines a good abstraction/type system, native low-level and high-level coding, a lockless model for avoiding data races, and a performant automated GC (noting that RC is a form of automated GC). Rust has the first three said items, but complexity is very high due to compile-time lifetimes and exclusive mutability instead of the last said item. Go basically has none of the stated items, although at least it has automated GC, green threads, and CSP channels as an incomplete paradigm for the data race issue. Python has leaky but flexible abstraction and lacks the latter three said items. Java lacks all four said items, although the upcoming Scala 3 (compiled to JVM bytecode) addresses the first said item reasonably well, and perhaps lockless design could be achieved by convention. C++ can do all the stated items except for lockless design (unless by some convention or library), but some facets only via libraries, and the complexity is thus very high and verbose, and the optimal integration is lacking due to the limitations of doing these features in libraries. TypeScript's main flaw derives from JavaScript/ECMAScript, in that there's no way to share native code level access to objects between threads. And the integration of low-level (via ASM.js) and high-level is inflexible and not optimal.

Why is it not clear that I'm trying to address a void in the programming language design space?


EDIT: another issue is the lack of anonymous unions (as opposed to nominal/named unions); apparently Go has neither kind! Go's type system appears to be highly lacking. The only programming languages I know of which have anonymous unions are TypeScript, Scala 3 (aka Dotty/DOT calculus), Ceylon, and PureScript. Also, TypeScript has elegant typeof and instanceof type guards which alleviate the need to explicitly cast the type. Scala provides the elegance and reduced boilerplate of "everything is an expression" that has a value (although with a transpiler we could simulate this in, for example, TypeScript with anonymous functions).
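For reference, the TypeScript features mentioned (anonymous union types and typeof/instanceof type guards that narrow the union without an explicit cast) look like this:

```typescript
// Anonymous union types plus typeof/instanceof guards narrowing them in place.
class ParseError {
  constructor(readonly message: string) {}
}

function parsePort(input: string | number): number | ParseError {
  if (typeof input === "number") return input;          // narrowed to number
  const n = Number.parseInt(input, 10);                 // input narrowed to string here
  return Number.isNaN(n) ? new ParseError(`not a port: ${input}`) : n;
}

const result = parsePort("8080");
if (result instanceof ParseError) {
  console.error(result.message);                         // narrowed to ParseError
} else {
  console.log(result + 1);                               // narrowed to number
}
```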

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354644388
Original Date: Jan 1, 2018, 3:18 AM CST


If you use numpy then python can be pretty fast. Fast enough that recoding in 'C' is not worth the effort for many tasks. The more data you have, the bigger the benefit of working with numpy.

We do use JavaScript on the server side, but it has no equivalent to numpy, which means it cannot match Python for numeric computation, even though it is a faster language in general. JavaScript is most useful for managing high latency operations for many users due to its asynchronous nature.

I have not really used Go yet for any major projects, but it's something I want to try.

Why is it not clear that I'm trying to address a void in the programming language design space?

Everything is possible using existing languages. So the question is, what are you trying to improve? My guess would be programmer productivity for a class of problems. These are not problems where performance is critical, because you are not trying to compete with C/C++/Rust for pure computational speed. It appears your target is more web-services, where currently TypeScript and Go probably offer the best scalability.

Erlang probably deserves a mention in the productivity/scalability category as well. The 'actor'-like model enabled Ericsson to write a new operating system for their routers that is very robust, when a project in a more traditional language ('C' I think) was stuck in debugging. The story I heard was that they were able to write a solution in Erlang that was correct in less time than it took to debug the already written solution in 'C'. So much so that they abandoned 'C' and moved exclusively to Erlang. Core routers require performance and parallelism to get high packet throughputs, and need to be robust and run for long periods of time without operator intervention.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354647910
Original Date: Jan 1, 2018, 5:07 AM CST


@keean wrote:

If you use numpy then python can be pretty fast. Fast enough that recoding in 'C' is not worth the effort for many tasks. The more data you have, the bigger the benefit of working with numpy.

A numeric library is not very relevant to this discussion. Not only is it of limited interest to applications which need such a library, the existence of a certain library doesn't speak to the features and design of the programming language. JavaScript may also have an efficient numeric library available, given the C FFI for JavaScript is ASM.js (meaning any C/C++ library can be compiled for JavaScript).

The story I heard was that they were able to write a solution in Erlang that was correct in less time than it took to debug the already written solution in 'C'.

That's a useless data point, because comparing to C means just about any other language would give better results in terms of paradigms and compiler assistance. The Actor model and Go's CSP are not panaceas. The devil is in the details and also the other features+paradigms these languages provide.

Everything is possible using existing languages.

Everything is possible in assembly language too. That doesn't mean anyone would want to code an entire program in assembly language. Thus your statement is meaningless and sidesteps the points I'm making about programming language features and paradigms.

Seems you totally didn't address the points I made in the prior post!

So the question is, what are you trying to improve? My guess would be programmer productivity for a class of problems. These are not problems where performance is critical, because you are not trying to compete with C/C++/Rust for pure computational speed. It appears your target is more web-services, where currently TypeScript and Go probably offer the best scalability.

Have you not read where I mentioned (several times in fact) that for a blockchain full node I need maximum performance? I have mentioned UTXO numerous times. Have you even googled the term "UTXO"? Do you not know that a blockchain node has to access GBs of unspent transaction outputs and parse cryptographic signatures, as well as perform other performance critical tasks such as processing Merkle trees, etc.?

Did you already ignore the multiple times I have mentioned that TypeScript (JavaScript) can't access a shared UTXO data structure from multiple threads?

@shelby wrote:

This prevents the optimal multi-threaded access of a large shared immutable data structure such as the UTXO of a blockchain (which can be GBs).

How can you recommend TypeScript to me after I have told you this numerous times, including mentioning it to you in a private message recently? The node would have to queue up the multithreaded workload and funnel it through single-threaded UTXO access. There are ways to put shared data behind an FFI API as Memcached does, but then it's not native code and requires some verbose API (although I'm contemplating that a transpiler could hide all this verbosity and make everything appear to be native, so transpilation to TypeScript might still remain my best choice, but I was hoping to be able to choose an existing programming language I could start with immediately without needing to develop a transpiler first).
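A sketch of that funnel in TypeScript (the worker file name and message shapes are hypothetical): the UTXO map lives in one worker, and every lookup from elsewhere is marshalled through postMessage, paying a message round-trip and a structured-clone copy rather than sharing native access.

```typescript
// Main-thread side of the "funnel": the UTXO set is owned by a single worker, so
// other code must queue lookups through message passing instead of reading a
// shared native data structure directly.
const utxoWorker = new Worker("utxo-worker.js");   // hypothetical worker owning the UTXO map

let nextId = 0;
const pending = new Map<number, (entry: unknown) => void>();

utxoWorker.onmessage = (e: MessageEvent<{ id: number; entry: unknown }>) => {
  const resolve = pending.get(e.data.id);
  pending.delete(e.data.id);
  resolve?.(e.data.entry);
};

// Every UTXO lookup pays a message round-trip plus a copy of the result.
function lookupUtxo(outpoint: string): Promise<unknown> {
  return new Promise(resolve => {
    const id = nextId++;
    pending.set(id, resolve);
    utxoWorker.postMessage({ id, outpoint });      // data is copied (structured clone)
  });
}
```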

How can you recommend Go given it has no generics and not even an enum? Also, Go does not provide an absolute guarantee that it won't stop the world for more than milliseconds, which is not ideal for a full node which must respond to transaction requests within milliseconds. Also, I had linked for you upthread to the blog essay written by Mike Hearn (a former key Bitcoin developer) pointing out that Go had not introduced any new research for its GC, and thus the only way it achieves millisecond pause times is by sacrificing overall performance and memory consumption (and the millisecond pause goal is not a hard guarantee).


So far my analysis is that TypeScript offers the best type system features I want (such as anonymous unions and type guards), assuming I don't need HKT and (more than simulated, first-order) typeclasses, and most low-level features could be simulated (at a loss in performance) on TypeScript, so that might be the most seamless way to proceed if my ultimate goal is to create Lucid. (Note the lack of operator overloading in TypeScript is an annoying handicap.) Otherwise, Rust offers excellent low-level control, typeclasses (but no anonymous unions), and perhaps also HKT by now or eventually, but it can't simulate the nursery concept I'm proposing, so I'd have to use its lifetimes+exclusive mutability, else make everything RC and declare unsafe everywhere (and I presume there's currently no solution for eliminating circular reference leaks?). For the client side, I don't know if Rust runs on all smartphones and how many different distributables and how much complex platform detection logic would be needed, as opposed to JavaScript/TypeScript which runs on computers with a browser installed. Scala 3 offers anonymous unions, an implicit form of typeclasses, "everything as an expression", multithreaded access to shared data structures, and some native low-level data types lacking in TypeScript/JavaScript but not as complete as Rust, and the JVM GC means I could simulate the nursery (but RC isn't possible, so these would just be handled by the GC). So coding in (or transpiling to) Scala 3 has some advantages over TypeScript, but for the client side the JVM doesn't run everywhere (and is even actively disabled on some platforms). Also, Scala 3 is only in alpha or beta and probably not suitable for real world use yet. Another disadvantage for Scala is that many libraries are in Java, and unlike Emscripten -> ASM.js, there's no way to compile C/C++ libraries for Java. Also, the Scala compiler had a reputation of being extremely slow and riddled with bugs (although the new DOT is apparently a more well thought out compiler design).

P.S. Have you seen that in version 2.6, TypeScript added support for a strict contravariance mode, to avoid the unsoundness of its "bivariance" heuristics?
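For readers unfamiliar with it, the unsoundness in question is the bivariant checking of function parameter types; under TypeScript 2.6's --strictFunctionTypes flag, parameters of function types are checked contravariantly instead. A small example:

```typescript
// What --strictFunctionTypes (TypeScript 2.6) changes: function parameter types are
// checked contravariantly instead of bivariantly (methods declared with method
// syntax keep the old bivariant behaviour for compatibility).
interface Animal { name: string }
interface Dog extends Animal { bark(): void }

let handleAnimal: (a: Animal) => void = a => console.log(a.name);
let handleDog: (d: Dog) => void = d => d.bark();

handleDog = handleAnimal;   // OK: an Animal handler can safely accept any Dog

// handleAnimal = handleDog; // Error under --strictFunctionTypes; the old bivariance
//                           // heuristic allowed it, and then calling
//                           // handleAnimal({ name: "cat" }) would invoke bark() on
//                           // an object that has no bark method.
```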

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354648164
Original Date: Jan 1, 2018, 5:15 AM CST