keean / zenscript

A trait based language that compiles to JavaScript
MIT License

new wd-40 test #51

Open NodixBlockchain opened 3 years ago

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-315910415
Original Date: Jul 17, 2017, 6:06 PM CDT


not shared_ptr as the default over GC

inferior except prompt/deterministic reference counting smart pointer paradigm

deterministically destruct instances of a type that has a destructor¹ which is a major feature lacking from GC languages … noting that such types cannot be used with GC references directly or indirectly as a member of a data structure that is GC referenced

We need optional GC and reference counting. The latter excels at deterministic, prompt finalization (which is required for finalizing resources other than main memory) but has the trade-offs quoted below:

@shelby3 wrote:

However, reference counting deallocation is prompt and more deterministic than GC mark-and-sweep (although I rebutted there, “But please note that reference counting can’t break cyclical references but GC can. And reference counting can cause a cascade/domino-effect pause that would not be limited to a maximum pause, as V8’s incremental mark-and-sweep GC hypothetically is.”). Incremental GC schemes, when not overloaded with allocation that escapes the generational GC, decrease high-latency stalls.

Has it ever been tried to combine reference counting with GC? Apparently so:

Edaqa Mortoray wrote:

Reference counting with cycle detection adds an interesting twist. I believe both D and Python do this. In order to eliminate memory loops objects are “sometimes” scanned for reference loops. This involves a lot more work than simply touching a reference count — though still not nearly as much as scanning all allocated memory. The impact of this loop detection depends heavily on how often it is scanned, and what the triggers are.

Given our single-threaded model, we do not need atomics on the reference counting, and given the proposal of the OP to emphasize borrowing and RAII, we would not get the benefits of a generational GC scheme. So reference counting would give us determinism except in the case of cyclical references, and then GC would kick in to break cycles. However, when we know cyclical references are probable (e.g. any graph that is not acyclic, such as the DOM or a doubly-linked list), then it is wasted overhead to employ reference counting, although that overhead could be very tiny (i.e. infrequent tracing) if we only want to catch unexpected cycles (i.e. as an insurance policy, and we’d want to log these and consider them bugs).

I suppose we could rate-limit reference counting finalization, to prevent the rare aberrant high latency stall.
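
For illustration, here is a minimal TypeScript sketch (not from the original thread) of one way such rate-limiting could look: instead of cascading frees the moment a count reaches zero, dead objects go into a queue and only a bounded number are finalized per event-loop tick. The names and the budget constant are hypothetical.

```ts
// Sketch of rate-limited reference-counting finalization: deferring the
// cascade so no single tick pays for freeing a huge object graph.
interface RcObject {
  refs: number;
  children: RcObject[];      // outgoing references this object owns
  finalize?: () => void;     // optional destructor
}

const deadQueue: RcObject[] = [];

function release(obj: RcObject): void {
  if (--obj.refs === 0) deadQueue.push(obj);   // defer, don't cascade now
}

// Called from the event loop; FREE_BUDGET bounds the finalization work per tick.
const FREE_BUDGET = 1000;
function drainDeadQueue(): void {
  for (let n = 0; n < FREE_BUDGET && deadQueue.length > 0; n++) {
    const obj = deadQueue.pop()!;
    obj.finalize?.();
    for (const child of obj.children) release(child); // may enqueue more work
  }
}
```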

@fauigerzigerk wrote:

@rbehrends wrote:

It is well-known that the amortized cost of naive reference counting is considerably higher than that of a modern tracing garbage collector (regardless of whether you also support cycle collection). It is also well-known that non-naive RC-based approaches generally make it easier to achieve lower pause times (though cycle collection may throw a spanner in the works).

True, but I think these comparisons are often done under the assumption that memory utilisation remains fairly low. Tracing GCs react very badly to growing memory utilisation in my experience. This is of course ultimately an economic question and maybe it should be modelled that way.

@sklogic wrote:

@jules wrote:

Tracing GC is not always faster. GC time is proportional to the size of the allocated memory, reference counting time is not.

It's not about running out of memory altogether, it's about the heap filling up with dead objects while the GC doesn't yet know that they are dead. Of course things are great when they are great and the GC can keep up with the mutator. Yet failure does happen whether or not you think it should happen, hence all the efforts that I mentioned that are trying to fix the issues.

As I said, in such a case RC would behave even worse.

@david-given wrote:

Regarding latency spikes: that's not entirely true. If you drop the last reference to an object, then freeing that object can cause the last reference to other objects to be dropped, etc.

So it's possible that dropping a reference can cause a large amount of work, which if it happens during a critical path can cause a latency spike. Particularly if finalisers are involved. If your object ownership graph is straightforward this isn't a problem, but if you have large objects with shared ownership it's very easy to be surprised here.

@sklogic wrote:

It is really hard to get real time guarantees with RC - given that time for cleaning up depends on a size of a structure.

Just imagine it freeing a 10Gb linked list.

Real time pure mark and sweep GCs are easier.

@rbehrends wrote:

Just keep in mind the trade-offs that you are making.

For starters, you are adding a considerable performance overhead to local pointer assignments; you're replacing a register-register move (and one that can be optimized away in many situations by a modern compiler through register renaming, turning it into a no-op) with a register-register move plus two memory updates plus a branch. Even where you have cache hits and the branch can be predicted correctly almost always, you're dealing with cache and BTB pollution; these effects may not fully show up in micro-benchmarks, but only completely manifest in large programs.

@barrkel wrote:

There's no discussion of thread safety and how that affects reference counting. That's probably the biggest problem; you need to use memory barriers to make reference counting safe with multiple threads, and that can kill performance.

@vardump wrote:

Reference counting is fast with one CPU core. But it gets worse and worse as you add cores. Inter-CPU synchronization is slow and the buses can saturate. With enough cores, at some point that'll be all the system is doing.

@iofj wrote:

@rbehrends wrote:

It is well-known that…

I submit this is only true for an equivalent number of things to keep track of. In practice this is not the case. Languages with GC [typically] go completely overboard with it, using it everywhere, even when not strictly necessary, and Java certainly does.

In C++, if you have a map with items, you use move semantics and you have either 0 or 1 refcounts to keep track of, the one for the map itself. The rest is still "refcounted" but without ever touching any integer, by the normal scoping rules. Same goes for Rust code. That's ignoring the fact that a Java object is, at minimum, 11 bytes larger than a C++ object. Given Java's boxing rules and string encoding, the difference in object sizes becomes bigger with bigger objects. Because out-of-stackframe RTTI is a basic necessity of tracing garbage collectors, this is unavoidable and cannot be fixed in another language. Bigger object sizes also mean more memory needed, more memory bandwidth needed, ... And Java's constant safety checks also mean …

In Java, the same map will give the GC 3 items to keep track of (minimum) per entry in the map, plus half a dozen for the map itself: one for the object that holds the key and the value, one for the key, and one for the value. That's assuming both key and value are boxed primitives, not actual Java objects; in that case, it'll be more.

@majke wrote:

Having that in mind, there are a couple of problems with refcnt:

  • Cache spill. Incrementing/decrementing counter values in random places in memory kills the cache. I remember one time I separated the "string object" data structure in Python away from the immutable (!) string value (Python usually has the C struct object followed in memory by the value blob). The Python program suddenly went way faster: immutable (read-only) string values in memory were close to each other, and the mutated refcnts were close to one another. Everyone was happier.

For this last quoted issue, I am contemplating whether employing the unused bits of 64-bit pointers (64-bit also provides a huge virtual memory space) would enable storing an index into a contiguous array which has the reference counts as its elements, so the counts would be in contiguous pages. If the number of available unused bits is insufficient to count all allocated objects, then perhaps each compiled object type could employ a separate array (i.e. a zero-runtime-cost compile-time array selection), presuming the cache thrashing issue is due to intra-page fragmentation, not lack of inter-page contiguity. Also, then we do not need to create the array element until the second reference. This is applicable when mostly passing around objects and not accessing all the data of the object as frequently.
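
As a rough illustration of the contiguous-count idea, here is a hedged TypeScript sketch in which the reference counts live in one typed array and each object carries only a small integer handle (standing in for the index that would be packed into the unused pointer bits). All names and sizes are hypothetical.

```ts
// Reference counts stored out-of-line in a single contiguous typed array,
// so the frequently mutated counts share cache lines/pages instead of being
// scattered across the heap next to each object.
class CountTable {
  private counts = new Uint32Array(1 << 16); // a real impl would grow this
  private freeSlots: number[] = [];
  private next = 0;

  // Allocate a count slot lazily, e.g. only when a second reference appears.
  acquireSlot(): number {
    const slot = this.freeSlots.pop() ?? this.next++;
    this.counts[slot] = 1;
    return slot;
  }
  increment(slot: number): void { this.counts[slot]++; }
  decrement(slot: number): boolean {        // true when the count reaches zero
    if (--this.counts[slot] === 0) { this.freeSlots.push(slot); return true; }
    return false;
  }
}
```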

Weighted reference counting can be employed to reduce the accesses to the reference count by half (since the source pointer has to be accessed anyway), but it requires some unused bits in the pointer to store the weight.
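
A minimal sketch of weighted reference counting, assuming the weight can be stored alongside each reference (in a real implementation it would occupy unused pointer bits): copying a reference only splits the local weight, and only dropping a reference touches the shared total.

```ts
// The shared object carries a total weight; each reference carries a weight.
class Weighted<T> {
  constructor(public value: T, public totalWeight = 0) {}
}

class WeightedRef<T> {
  private constructor(private target: Weighted<T>, private weight: number) {}

  static create<T>(value: T, initialWeight = 1 << 16): WeightedRef<T> {
    const target = new Weighted(value, initialWeight);
    return new WeightedRef(target, initialWeight);
  }

  clone(): WeightedRef<T> {          // no access to the target: split our weight
    const half = this.weight >> 1;   // (a real impl must handle weight === 1)
    this.weight -= half;
    return new WeightedRef(this.target, half);
  }

  drop(): void {                     // only now do we touch the shared count
    this.target.totalWeight -= this.weight;
    if (this.target.totalWeight === 0) {
      // object is unreachable; finalize/free it here
    }
  }
}
```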


David Ireland wrote:

Don’t get me wrong, Apple must have had a good reason to choose reference counting for memory management. That reason is actually pretty obvious: Objective-C is not a safe language, so you can’t move objects. If you can’t move objects, you have to use something like malloc to minimise fragmentation: you pay a big up-front cost to allocate an object in the right place. In most GC schemes, allocation is free – you just increment a pointer. This is fine, because you are going to iterate through all your objects every now and again, and it doesn’t cost a whole lot more to move some of them around at the same time. You get to compact away all the fragmentation this results in. This makes GC fairly miserly with memory too. In iOS, if you want to allocate something bigger than the largest free fragment, you die. With GC, you have to wait a while for the GC to compact everything, after which memory will be compacted 100% efficiently and there will be a single memory extent as big as all available memory. Objective-C can’t have any of these benefits, so Apple chose a different route for iOS, and put the best possible spin on it.

Mobile apps won’t be fast until browsers are re-written in a fast, safe programming language that can share a good GC with ECMAScript. Of course browsers won’t be secure until then either, but I’m not holding my breath. There is no such language that is sufficiently popular, and Google fluffed its lines with Go. Maybe ECMAScript implementations will gain so much in performance that all code that manipulates the DOM will be written in ECMAScript, and we can finally use a high performance GC.


Edit Nov. 18, 2017 Distinguish between non-deterministic reference counting and deterministic RAII.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-315958730
Original Date: Jul 18, 2017, 12:02 AM CDT


@anonymous wrote in private:

Progress on the language design is great to see.

I need progress on code, not language design, but I am trying to have a paradigm I can be thinking about when structuring my code in TypeScript + C, so that it can be an easy transition if ever we complete a new PL, and also to drive my insights on that potential language.

Concretely, I think it means I will write wrapper classes with getters and setters in TypeScript that take an ArrayBuffer for data and do my malloc in Emscripten and eliminate marshalling that way. Which will be a precursor to what a compiler can do automatically (and later with better integration by outputting ASM.js or WASM directly and maybe one day our own VM since the browser VMs have limitations).
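
Something like the following hedged TypeScript sketch is what is meant; the field layout and the plain ArrayBuffer are placeholders, since in practice the buffer would be the Emscripten heap and the offset would come from malloc on the C side.

```ts
// A wrapper class whose getters/setters read and write raw bytes in an
// ArrayBuffer, so no per-field JavaScript objects need to be marshalled.
class Point {
  static readonly SIZE = 8;                 // two 32-bit fields (hypothetical layout)
  private view: DataView;

  constructor(buffer: ArrayBuffer, offset: number) {
    this.view = new DataView(buffer, offset, Point.SIZE);
  }
  get x(): number { return this.view.getInt32(0, true); }   // little-endian
  set x(v: number) { this.view.setInt32(0, v, true); }
  get y(): number { return this.view.getInt32(4, true); }
  set y(v: number) { this.view.setInt32(4, v, true); }
}

// Usage: TypeScript and (hypothetically) C code compiled with Emscripten
// would read and write the same bytes, eliminating marshalling.
const heap = new ArrayBuffer(1024);
const p = new Point(heap, 0);
p.x = 3; p.y = 4;
```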

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-325094328
Original Date: Aug 26, 2017, 1:40 AM CDT


Rust’s leadership may be falling into the political hell hole.


Look what Eric wrote about Rust:

http://esr.ibiblio.org/?p=7711&cpage=1#comment-1913931
http://esr.ibiblio.org/?p=7724#comment-1916909
http://esr.ibiblio.org/?p=7303
http://esr.ibiblio.org/?p=7294
http://esr.ibiblio.org/?p=7294#comment-1797560
http://esr.ibiblio.org/?p=7294#comment-1797743
http://esr.ibiblio.org/?p=7711&cpage=1#comment-1911853

They also argue (similar to a point I made to @keean) that type system complexity can kill the economics of a language (this complexity budget has to be minimized):

http://esr.ibiblio.org/?p=7724#comment-1912884

And everyone agrees OOP (virtual inheritance and subtyping) are anti-patterns:

http://esr.ibiblio.org/?p=7724#comment-1912640

Comments about where Rust has overdrawn its complexity budget:

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-345494394
Original Date: Nov 18, 2017, 11:58 PM CST


I wrote:

We need optional GC and reference counting. The latter excels at deterministic, prompt finalization (which is required for finalizing resources other than main memory) but has the trade-offs quoted below:

[…]

Edit Nov. 18, 2017 Distinguish between non-deterministic reference counting and deterministic RAII.

Re-reading the prior linked comments…

First realization is that reference counting is non-deterministic and is inferior to GC in every facet (for the applicable use cases of non-deterministic resource management and thus the inherent lack of deterministic finalization) except for interop with non-GC objects as explained below (but potentially at the cost of loss of compaction and loss of automatic cyclical references memory leak detection).

Afaics, a useful facility to add in addition to GC is the zero-cost1 abstraction of deterministic RAII block scope allocated/deallocated on the stack, because it (probably improves performance and) adds determinism of finalization (aka destructors) and interacts with the optimization of low-level performance as explained below.

To achieve such zero-cost resource management abstractions requires typing/tracking the lifetimes of borrows of references so the compiler can prove a resource has no live borrow when it goes out of block scope. I had proposed the compile-time enforced “is stored” annotation on function inputs for when an input will be (even transitively) borrowed beyond the lifetime of the function application. This is a separate concern from the issue of tracking borrows (a la Rust’s total ordering which assumes multithreaded preemption everywhere, or my proposed partial ordering in conjunction with a single-threaded event model) for the purpose of preventing concurrent race-condition access (which afair we discussed in depth in the Concurrency thread). This granular typing of the context of side-effects seems more straightforward than some grand concept of typing everything as a (cordoned context of an imperative) side-effect?2

One of the key advantages/requirements of GC is the ability to move objects so that memory can be periodically compacted (with the compaction being amortized over many implicit/delayed deallocation events which would otherwise be explicit/immediate/non-deterministic-latency with reference counting), which enables allocation to be as low-cost as incrementing a pointer to the end of allocated objects in a contiguous memory address space.
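
For concreteness, a toy TypeScript sketch of that bump-pointer fast path (names are illustrative; a real collector would trigger a collection/compaction instead of throwing):

```ts
// Why allocation under a moving/compacting GC is cheap: the nursery is one
// contiguous region and allocating is just a pointer increment.
class BumpAllocator {
  private top = 0;
  constructor(private heap: ArrayBuffer) {}

  alloc(size: number): number {            // returns an offset into the heap
    if (this.top + size > this.heap.byteLength) {
      throw new Error("nursery full: a real GC would collect/compact here");
    }
    const offset = this.top;
    this.top += size;                      // the entire allocation fast path
    return offset;
  }
}
```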

As an aside, for highly optimized low-level code to run at maximum performance, it must be possible to freeze the position of objects in the virtual memory address space, else the objects must be copied. The former has a concurrency live-lock and dead-lock issue w.r.t. the said compaction.

For RAII-managed data structures that contain pointers to data (i.e. unboxed structures) which is not RAII managed at the current or containing block scope (i.e. for GC pointers, since pointers to data RAII managed in a contained block scope would exhibit use-after-free errors), these pointers have to be mutable by the compaction engine and have to be recorded in the list of references which the GC must trace. To have full interoperability between GC instances and RAII (stack allocated data structure) instances without copying data incurs either a performance penalty of reloading pointers on each access (in case they’ve been moved by the compactor) or else the aforementioned concurrency live-lock and dead-lock issue w.r.t. the said compaction (which is an onerous, undesirable can of worms).

Thus, I’m leaning towards stack allocated data structures not interoperating with GC instances, except via copying, because (other than deterministic finalization) performance is their primary raison d'être. And interop with GC would (as GC does) also encourage boxed data structures, which are known to thrash caches and degrade performance. Additionally, I think it should be allowed to have GC instances which are unboxed data structures (e.g. transpiled to ArrayBuffer in JavaScript) so they can be most efficiently copied (i.e. no transformation overhead). One might ponder whether to also have reference counted allocation on the heap, so that these immovable objects would not need to be copied when accessed by low-level performant algorithms (and note that the reason ASM.js didn’t allow very small ArrayBuffers is because there’s no presumption that what was compiled to ASM.js did bounds checking, although that is not a hindrance because our compiler could do static bounds checking in many cases and insert dynamic checks in the others), but the loss of compaction and the lack of automatic cyclical reference detection are onerous. Any attempt to overcome these onerous trade-offs by “stopping the world” of the code that is operating on non-GC instances would suffer from “a concurrency live-lock and dead-lock issue” as well (because, think about it, just stopping the threads is not sufficient since the state can be invalid when the threads are restarted; instead they must be stopped at points where the current thread state has no such dependencies).

However, is memory compaction even necessary on 64-bit virtual address spaces, wherein the hardware can map the virtual address pages to compacted, available physical pages? Well, I guess yes, unless most objects are much larger than the ~4KB page size.

However, afaics perhaps the compaction issue could be significantly mitigated by allocating same-size objects from a contiguous virtual address space significantly larger than ~4KB in a reference counting algorithm. Thus allocations would first be applied to deallocated slots in the address space, if any, per the standard reference counting allocation and deallocation algorithm, which employs a linked list of LIFO free slots.3
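
A hedged sketch of that per-size-class scheme in TypeScript, with freed slots recycled LIFO from a large contiguous region before the region is grown; the sizes and names are illustrative only.

```ts
// All objects of one size class come from a single contiguous region; freed
// slots are pushed on a LIFO free list and reused before the region grows,
// limiting fragmentation without a moving collector.
class SlotPool {
  private top = 0;
  private freeList: number[] = [];          // LIFO stack of freed offsets

  constructor(private slotSize: number, private region: ArrayBuffer) {}

  alloc(): number {
    const reused = this.freeList.pop();
    if (reused !== undefined) return reused;      // reuse a freed slot first
    if (this.top + this.slotSize > this.region.byteLength) {
      throw new Error("region exhausted: grow or reserve more address space");
    }
    const offset = this.top;
    this.top += this.slotSize;
    return offset;
  }
  free(offset: number): void { this.freeList.push(offset); }
}
```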

If compaction is required, then I contemplate whether unsafe low-level code which manually employs malloc and free should still be allowed for cases where there’s no other way to achieve performance with dynamic memory allocation.

1 Zero runtime cost because, due to the compile-time determinism, we know at compile time when the deallocation will occur.

2 How does Oleg’s insight conflict with @keean’s prior stance (in our discussions) against AST macros? I vaguely remember that he was against macros because they obfuscated debugging and the type system.

Is this the paper? http://okmij.org/ftp/Computation/having-effect.html#HOPE

I’m unable to readily grok Oleg’s work because he communicates in type theory and Haskell (in neither of which am I highly fluent). Afaics he explains with details/examples instead of conceptually and generatively from ground-level concepts. And I do not have enough time to apply arduous effort. So he limits his audience to those who are fluent in every facet of the prior art he builds on and/or who can justify the time and effort.

Essentially all side-effects distill to the generative essence of shared state. If state is not shared, then there are no interactions of effects between those processes.

Afaics, a Monad controls the threading of access to an orthogonal context across function application. That orthogonal context can be state in the State monad. Correct? That is why Monads can’t generally compose because they cordon orthogonal contexts.

But what is the goal to achieve? If we model higher-level concepts as imperative building blocks with contexts, we can do unlimited paradigms, but not only will our code become impossible to read because of the overhead of the necessary syntax and type system to allow for such generality (which was essentially my complaint against trying to model everything, e.g. even subtyping, as typeclasses as you were proposing before), but we have an explosion of issues with the interaction of too many paradigms. Afaics, this isn’t headed towards a simple language.

There will not be a perfect language. Type theory will never be complete and consistent. Design is driven by goals.

Thoughts?

3 Richard Jones. Garbage Collection Algorithms For Automatic Dynamic Memory Management, §2.1 The Reference Counting Algorithm

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-345981613
Original Date: Nov 21, 2017, 4:20 AM CST


If i may :)

For me the main problem I would have with garbage collectors is that they are harder to debug and test.

The nightmare I have is that you test the functionality on a simple program, it works, then you keep adding functions and race conditions, and two months later you realize there is a memory leak or memory corruption somewhere, and you're in for days or weeks of debugging to trace the problem.

With reference counting, it's very easy to trace problems, because you know where the freeing must happen; the point where the last reference to an object is released can be tracked easily, which makes it very easy to track problems, whether they are memory corruption due to memory being released prematurely, or memory leaks.

With a garbage collector, which will scan the whole tree every now and then, it will traverse thousands of pointers allocated over a few minutes of running, and it can become much harder to track where the problem is if it frees memory that it shouldn't, or doesn't free memory that it should.

Especially if you go for multi-layered scopes, with closures and asynchronous code execution, it can become very hard to make sure the GC will catch everything right. And bugs will generally appear when the program starts to become more complex, rather than in easy vanilla cases when the program is small and simple.

And if you need a plugin system, i.e. code loaded at runtime that can manipulate references itself, it can also become impossible to track the reference tree deterministically at compile time, and the host program might have no clue at all what the plugin is doing with the references passed to it. And it's the same problem with distributed applications.

For me now, when I want to add a new feature, especially for memory management, my first concern is not only 'how to make it work', but also how easy it will be to detect problems and track what causes them.

With reference counting, it's easy to set a breakpoint when a reference gets freed, and have the debugger pause everything and inspect all the context at the moment it happens.

With GC, the analysis is delayed, and all the context of execution that changed the state of the object is lost, which makes it very hard to debug.

For me, to make GC programming easy, it needs something like the Android SDK, where the whole application is structured at a macro level with activities, intents, and special classes for asynchronous/background execution, making sure that the lifetimes of all objects are predictable due to the structuring of the application, declaration of resources in XML, etc.

https://developer.android.com/guide/components/activities/process-lifecycle.html

They have a whole lot of classes to build the application on, to make resource management and object lifecycles more deterministic.

It imposes a certain structure on all application classes and components to make memory management fit into cases that the GC can easily deal with.

Making a general-purpose GC deal with a complex language with multi-layered scopes and asynchronous execution, without giving any hint of the pattern an object is likely to follow by inheriting some class and structuring the application with built-in classes with predictable behavior, seems very hard.

Reference counting is much easier to deal with for the general case.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346013299
Original Date: Nov 21, 2017, 6:34 AM CST


My preference is to manage as much as possible from the stack as you can. There are some simple tricks that can make this work. If all allocation is RAII and you have procedures not functions, then to return some data you simply pass a collection (set/map/array etc.) into the procedure.

Edit: where this gets complicated is you then end up with two kinds of pointers: "owning pointers", which must form a Directed Acyclic Graph, and non-owning (or reference) pointers (and links between data-structures). The simple solution is to make all 'references' weak references, so that you have to null-test them when dereferencing to check if the resource they point to has gone away. Then we simply delete things when they go out of scope, removing all resources pointed to by owning pointers in the DAG. No GC, no reference counting, and a simple non-zero check on dereference. Of course 'nullable' pointers are not considered a good thing, so we can then look for static techniques where we do not allow references to outlive the object they reference. This is where Rust's lifetimes come from.
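
For illustration, a minimal TypeScript sketch of that scheme (not @keean's actual code): since TypeScript has no destructors, leaving scope is modelled by an explicit dispose, and a weak reference is just an accessor that must be null-tested.

```ts
// Owning handles drop the resource when their scope ends; every non-owning
// reference is weak and must be null-tested on dereference.
class Owned<T> {
  private resource: T | null;
  constructor(value: T) { this.resource = value; }

  weakRef(): () => T | null {                // a weak reference: may return null
    return () => this.resource;
  }
  dispose(): void { this.resource = null; }  // owner left scope: resource is gone
}

// Usage: the weak reference must be checked before use.
const owner = new Owned({ name: "config" });
const weak = owner.weakRef();
owner.dispose();
const target = weak();
if (target !== null) { /* safe to use */ } else { /* resource has gone away */ }
```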

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346107470
Original Date: Nov 21, 2017, 11:51 AM CST


@NodixBlockchain (aka IadixDev @ BCT) wrote:

If i may :)

I can’t speak for @keean (and this is his project area but this is my thread), but personally I’d prefer “please don’t” (or post once for every 10 posts by others in this thread, so as to keep the S/N ratio of the thread higher), i.e. that you would comment in your thread “Saluations! :)” instead (or at least not in this/my thread), unless you have something new+important+accurate+well-researched to point out which you have not written already in the past.

If you quote from this thread, and post in your thread, I will probably read (presuming I have time) and perhaps even reply there. I’m trying to avoid this thread becoming another 300+ posts that I can’t wade through when I want to refresh my own memory of what my (and @keean’s) analyses were/are.

@keean and I have been doing backtalking on LinkedIn to reduce clutter in these issues thread. You’re welcome to add me there.

With reference counting, it's very easy to trace problems, because you know where the freeing must happen; the point where the last reference to an object is released can be tracked easily, which makes it very easy to track problems, whether they are memory corruption due to memory being released prematurely, or memory leaks.

I’m wondering whether that is a superstitious, anecdotal belief (i.e. aliasing error) or an assertion you actually studied, measured, and verified to be the case with in-depth effort comparing reference counting and tracing garbage collection. Because reference counting inherently doesn’t track the directed graph of memory allocation (i.e. all we have are reference counts and no back pointers to the references that comprise those counts), there’s no feasible way in a reference counting GC system to see which data structures have leaked because they’re no longer reachable from the untraced roots of the said graph, for example due to cyclical references. Whereas, in theory a tracing garbage collector could provide such a graph for debugging.

With reference counting, it's easy to set a breakpoint when a reference gets freed, and have the debugger pause everything and inspect all the context at the moment it happens.

With GC, the analysis is delayed, and all the context of execution that changed the state of the object is lost, which makes it very hard to debug.

You can’t see which/where are all the references to the shared object that had its reference count decremented, because it’s the implicit nature of non-deterministic memory allocation to not know which pointers would be the candidates, and the reference counting scheme provides no back pointers from the said object to the said sharing pointers (as aforementioned).

I agree it wouldn’t normally be possible to set a runtime breakpoint on the change in number of references to a specific object with a tracing GC, because the assignment of pointers doesn’t touch the referenced object (although write-guards for generational GC might get you as close as the virtual page, but that might not be helpful). Thus, there’s no way to know which assignment to set a breakpoint on for testing the address of the said shared object. Yet comparably, the more optimized variants of reference counting also have the same debugging problem, since they don’t even update the said common object (as explained below) because they keep the reference count in the pointers (and additionally there is Weighted Reference Counting). In both cases, a sophisticated debugger and language could in theory be designed to set such conditional breakpoints on all pointer assignments (conditionally breaking only if the said shared object address is encountered), if such a facility is helpful for debugging memory leaks. And the tracing GC would have more information about the memory allocation graph.

And if you need a plugin system, i.e. code loaded at runtime that can manipulate references itself, it can also become impossible to track the reference tree deterministically at compile time

It seems you’re presuming that reference counting models deterministic memory allocation, but that is not generally the case and (per my reply to @keean below), if that were generally the case, then in theory it could be replaced by zero-runtime-cost, compile-time deterministic allocation mgmt (albeit with the tsuris of a complex compile-time typing system). Afaics, you continue to repeat this belief system of yours (presumably confirmation-biased by your presumed preference for your low-level framework you discussed in your thread “Saluations! :)” and per my comments to @keean about C/C++ below) without refuting the points that have been made to you numerous times previously. I’ll recap and augment… (and I hope you’ll keep the not-so-thoroughly-contemplated/studied noise to a minimum in this/my thread please).

https://developer.android.com/guide/components/activities/process-lifecycle.html

They have a whole lot of classes to build the application on, to make resource management and object lifecycles more deterministic.

I don’t see how the process lifecycle has anything to do with the generalized problem that in general memory allocation is non-deterministic.

It imposes a certain structure on all application classes and components to make memory management fit into cases that the GC can easily deal with.

Programming paradigms for avoiding memory leaks are useful regardless of the memory allocation scheme chosen.

I previously mentioned the following disadvantages for reference counting to you (either in the Concurrency thread and/or the other thread you created). And note the first one below is a form of memory leak inherent in reference counting.

Reference counting (aka ARC, which technically is a form of “direct”1 garbage collection) can’t garbage collect cyclical references. And according to an expert book I’m reading there’s no known means of weak pointers or other paradigm which can reliably solve the problem of cyclical references in all scenarios.2 This makes sense because reference counting (RC) doesn’t contemplate the entire directed graph of objects holistically. Thus only “indirect, tracing”1 garbage collection which deals with the entire directed graph will reliably detect and garbage collect cyclical references. However, note that probabilistic, localized tracing combined with RC decrements is comparable to mark-sweep (MS)3 for (even apparently asymptotic) throughput computational complexity costs, yet with an upper bound (“6 ms”) on pause times due to locality of search. Yet presumably pause times are eliminated with a concurrent MS (and as cited below, RC has less throughput than BG-MS).

According to this expert book4, reference counting has significantly worse throughput (except compared to the pathological asymptotic huge-heap case for tracing GC, which can be avoided15, and perhaps multi-threaded/multi-core/parallelism, but I think I need to study that more) because of the significant additional overhead for every pointer assignment. This overhead as a percentage is worse for highly optimized, low-level languages (e.g. C/C++) compared to interpreted languages which have high overhead anyway. Note that a claimed throughput parity for RC with MS5 (and it’s just culling young-generation overhead similar to prior art16) is not parity with generational GC combined with MS (BG-MS).6 Additionally the adjustment of the reference counts on each pointer assignment thrashes the cache7, further reducing performance, except for optimizations which generally have to be combined with a tracing GC anyway (and they diminish the afaics mostly irrelevant claimed deterministic finalization benefit).8 Additionally for multithreaded code, there is additional overhead due to contention over the race condition when updating reference counts, although there are potential amortization optimizations which are also potentially superior for parallelized and/or distributed paradigms9 (but they diminish the afaics mostly irrelevant claimed deterministic finalization benefit). Additionally memory allocation management on the heap is non-deterministic, so the determinism of the immediacy of deallocation for reference counting is afaics more or less irrelevant. Additionally reference counting (especially when executing destructors) can cause a dominoes cascade of latency (aka “stop the world”), and the amortization optimizations10 again diminish the afaics mostly irrelevant claimed deterministic finalization benefit.

However, in my next post in this thread, I will propose a variant of deferred reference counting11 in conjunction with a clever generational GC scheme (apparently similar to prior art16), for avoiding the pathological asymptotic case of tracing GC. See my further comments about this in my reply to @keean below.

Especially if you go for multi-layered scopes, with closures and asynchronous code execution, it can become very hard to make sure the GC will catch everything right. And bugs will generally appear when the program starts to become more complex, rather than in easy vanilla cases when the program is small and simple.

In general, more complexity in the code will increase the probability of semantic memory leaks, even semantic leaks of the form that automatic garbage collection can’t fix because the error is semantic. Reference counting (a form of automated GC) will also suffer these semantic memory leaks. The issue of complexity of semantics, is more or less not peculiar to the type of automatic GC scheme chosen. Thus I don’t view this as a valid complaint against tracing GC in favor of reference counting.

1 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §1 Introduction, pg. 2.

2 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.5 Cyclic reference counting, pg. 56 – 62, §3.6 Issues to consider: Cyclic data structures, pg. 71.

3 D. F. Bacon and V. T. Rajan (2001). Concurrent cycle collection in reference counted systems. In J. L. Knudsen, editor, Proceedings of 15th European Conference on Object-Oriented Programming, ECOOP 2001, Budapest, Hungary, June 18-22, volume 2072 of Lecture Notes in Computer Science, pp. 207–235. Springer-Verlag, 2001.

4 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §2.1 The Reference Counting Algorithm, pg. 19 – 25, §3 Reference Counting, pp 43 – 74.

5 R. Shahriyar, S. M. Blackburn, X. Yang, and K. S. McKinley (2012), "Taking Off the Gloves with Reference Counting Immix," in OOPSLA ‘13: Proceeding of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, 2013. Originally found on Steve Blackburn’s website.

6 Stephen Blackburn; Kathryn McKinley (2003). "Ulterior Reference Counting: Fast Garbage Collection without a Long Wait". Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications. OOPSLA 2003, Table 3 on pg. 9.

7 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §2.1 The Reference Counting Algorithm: The strengths and weakness of reference counting, pg. 22.

8 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.3 Limited-field reference counts, pp. 50 – 55, §3.3 Limited-field reference counts: One-bit reference counts, pg. 52, §3.6 Issues to consider: Space for reference counts & Locality of references, pg. 71.

9 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §8.4 Concurrent Reference Counting, pp. 200 – 201, §3.7 Notes, pg. 74, §4.1 Comparisons with reference counting, pg. 77.

10 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.1 Non-recursive freeing, pg. 44.

11 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.2 Deferred reference counting: The Deutsch-Bobrow algorithm, pg. 46.


@keean wrote:

My preference is to manage as much as possible from the stack as you can. There are some simple tricks that can make this work. If all allocation is RAII and you have procedures not functions then to return some data you simply pass a collection (set/map/array etc) into the procedure.

This is conceptually broadly analogous/related to deferred reference counting12 schemes which rely on compile-time deterministic tracing of the deallocation, such as via linear (aka uniqueness) types.

Although I was obviously also contemplating similarly to you in my prior posts in this thread (after you had reminded me about RAII in our previous discussions and my recollection of programming in C++ in the late 1990s), after reading a book on garbage collection and thinking more about the issues, I’m going to further argue against the compile-time deterministic memory mgmt language design strategy in my upcoming detailed post, because I believe it’s not as generally flexible, generally sound, nor as highly comprehensible a solution.

These stack memory mgmt paradigms you’re referring to, have narrow applicability because they don’t handle the general case of non-deterministic heap management.

Afaics, it’s a special-case methodology that leads to a very complex set of corner cases, such as for C/C++ which tries to pile on a bunch of different sorts of compile-time optimizations and keep as much memory mgmt on the stack as possible, but leads to a brittle clusterfuck of a language with a 1500-page manual that virtually no one understands entirely (not even the C++ creator Bjarne Stroustrup nor the STL creator Alexander Stepanov, as they both admitted).

And those stack paradigms don’t marry well holistically with a bolt-on tracing garbage collection appendage13 (i.e. one not designed holistically with the language), although I have not yet read a more recent research paper on combining tracing GC with Rust.14 To even attempt to apply this compile-time deterministic memory management holistically requires a complex typing tsuris akin to Rust’s total ordering of borrow lifetimes with lifetime annotation parameters for transitivity of borrows from inputs to outputs, and of course the programmer must violate the invariants and create unsafe code in order to handle non-deterministic (i.e. runtime randomized) memory mgmt, which defeats the entire point, as then the total order is no longer checked and the unsafety can leak back into the code the compiler thinks is safe.

I apply the 80/20 or 90/10 Pareto principle to prioritization. Although all that special case tsuris might be able to obtain an extra 10% of performance, it has the cost of 90% more effort on debugging, readability of the code, maintenance of the code, complexity of the code, etc.. When you really need that 10% boost (or in the case where the more generalized GC paradigm is much slower for some reason), then go write a module in C++ or Rust. I see no need to invest my effort in trying to recreate (or improve upon) those languages, because there are millions of man-hours already in those ecosystems. Our only prayer of creating something useful in this decade with our resources, is to 90/10% prioritize on clean abstractions that are elegant and generalized and 90% performant with 10% of the effort and complexity.

EDIT: on the coding for readability point, “Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.”

Additionally, perhaps my GC design ideas (to be outlined in my next post) might possibly be compelling (if I haven’t made some errors in my thought process, which is entirely possible given I need to eat dark chocolate to even ephemerally coax my brain out of the 105 IQ brain-fog gutter and function non-lethargically, presumably due to an ongoing TB infection). Note however, that the prior art for GC is apparently vast and seemingly exhaustive in terms of the low-hanging fruit of obvious, simple variants16 (i.e. fertile ground presumably only in the realm of the esoteric, high-complexity, and/or abstruse). Thus, it is unlikely I discovered something (after a day of reading and contemplation) that hadn’t been considered previously in the prior art, although there appears to be potential prior art similar to my ideas16 that appeared later than the book I read. Note I independently arrived at my ideas (after about a day of contemplating the 1996 book) and have not yet read that 2003 research paper16 to compare.

Total ordering compile-time paradigms (including attempting to type check everything) are afaics brittle paradigms that break because the programmer must necessarily break out of them because the entropy of the universe is not bounded. I commented on this previously numerous times in these issues threads. The non-determinism (i.e. unbounded randomness and entropy) of the universe (i.e. the real world in I/O) can’t be statically modeled.

My thought is we need to be circumspect about the benefits versus trade-offs of every paradigm and feature we choose in a programming language. And the target use cases (and acumen/preferences of the target programmers demographics) of the programming language will also factor into the decision process.

For example, Haskell programmers favor brevity and mathematical expression, with trade-offs such as hidden (2nd, 3rd, and 4th order) details in the form of implicit typing (which the mathematically minded model in their mind without reading them explicitly in the code), which drastically reduces the demographics of programmers who can read the code and participate. Some snobbishly claim this is a good filter to keep the less intelligent programmers away. Haskell has other trade-offs (necessary to enable the mathematical modelling) which we’ve discussed elsewhere and which are outside the scope of this concise summary, including but not limited to (and my descriptions may not be entirely accurate given my limited Haskell fluency) the debugging non-determinism tsuris of lazy evaluation (although the requirement for lazy evaluation is disputed), the bottom type populating all types, a total ordering (or algebra) on the implementation of each data type for each typeclass interface, a total ordering on the sequential (monadic) threading of I/O processing in the code (requiring unsafe code to fully accommodate the non-determinism of I/O), etc.

Edit: where this gets complicated is you then end up with two kinds of pointers: "owning pointers", which must form a Directed Acyclic Graph, and non-owning (or reference) pointers (and links between data-structures). The simple solution is to make all 'references' weak references, so that you have to null-test them when dereferencing to check if the resource they point to has gone away. Then we simply delete things when they go out of scope, removing all resources pointed to by owning pointers in the DAG. No GC, no reference counting, and a simple non-zero check on dereference. Of course 'nullable' pointers are not considered a good thing, so we can then look for static techniques where we do not allow references to outlive the object they reference. This is where Rust's lifetimes come from.

Human managed weak references are an incomplete and human-error prone strategy. Also as you noted then you leak the non-determinism from deallocation to how to handle error conditions of null pointers, leaking the non-determinism into program logic! That is horrible. Whereas, a provably sound weak pointer algorithm has exponential asymptotics and thus is no better or worse than tracing GC mark-sweep.2

12 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.2 Deferred reference counting, pg. 45.

13 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §9 Garbage Collection for C, pg. 227.

14 Y. Lin, S. M. Blackburn, A. L. Hosking, and M. Norrish (2016), "Rust as a Language for High Performance GC Implementation," in Proceedings of the Sixteenth ACM SIGPLAN International Symposium on Memory Management, ISMM ‘16, Santa Barbara, CA, June 13, 2016, 2016.

15 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, Preface: The audience, pg. xxiv, Preface: The Bibliography and the World-Wide Web, pg. xxv.

16 Stephen Blackburn; Kathryn McKinley (2003). "Ulterior Reference Counting: Fast Garbage Collection without a Long Wait". Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications. OOPSLA 2003. I originally found this on Wikipedia’s page for Reference Counting.


Kindle version of the book is available on Amazon for less than $20 if you don’t want to turn your monitor sideways to read the “free” (copyright violation theft) pdf linked above.

EDIT: a more concise overview resource:

@NodixBlockchain wrote:

https://openresearch-repository.anu.edu.au/bitstream/1885/9053/1/02WholeThesis_Garner.pdf

This paper covers it all :D

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346114141
Original Date: Nov 21, 2017, 12:14 PM CST


The claims made about GC efficiency in those kind of books never turn out to be true in reality. Look at how JavaScript is slow, yet emscripten can get within a factor of 2 of native 'C', and both run on the same VM. The difference is that asm.js and WebASM are strongly typed and manually manage the memory. In theory the JIT can make well-typed JS code as fast as strongly typed code, so well-typed TypeScript should be as fast as the compiled asm.js/WebASM, but it's not. The remaining difference is mostly down to memory management in my opinion.

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346116255
Original Date: Nov 21, 2017, 12:21 PM CST


Edit: where this gets complicated is you then end up with two kinds of pointers: "owning pointers", which must form a Directed Acyclic Graph, and non-owning (or reference) pointers (and links between data-structures). The simple solution is to make all 'references' weak references, so that you have to null-test them when dereferencing to check if the resource they point to has gone away. Then we simply delete things when they go out of scope, removing all resources pointed to by owning pointers in the DAG. No GC, no reference counting, and a simple non-zero check on dereference. Of course 'nullable' pointers are not considered a good thing, so we can then look for static techniques where we do not allow references to outlive the object they reference. This is where Rust's lifetimes come from.

I wonder if this solution would work if the object instance referenced can change during the lifetime of the reference.

Because it would mean you might need to delete the referenced object before the reference goes out of scope, in case a new object is assigned to it during its lifetime.

NodixBlockchain commented 3 years ago

Original Author: @barrkel
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346119442
Original Date: Nov 21, 2017, 12:33 PM CST


NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346122168
Original Date: Nov 21, 2017, 12:43 PM CST


TypeScript can force you to stick to classes for objects, allowing the JIT to efficiently optimise these. Really for JS where there are no allocations going on it is as fast as asm.js. GC hides all sorts of costs, like the extra level of pointer indirection.

Probably a major factor is that GC hides the cost of allocation and deallocation, so the programmer does not realise they are thrashing object creation.

What is the fastest garbage collected language?

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346126086
Original Date: Nov 21, 2017, 12:56 PM CST


GC is even worse with multiple threads, because it has to stop all the threads to do the collection. This leads to GC pauses, bad user-interface jank, and programs crashing due to fragmentation...

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346126913
Original Date: Nov 21, 2017, 12:59 PM CST


The cache issue, for me, I don't think it's a problem, as the pointer and reference count are supposed to always be in the same cache line

The reference count needs to be in the object referenced, not the pointer, so it won't be in the same cache line as the pointer. When you have multiple pointers to the same object they all need to increment/decrement the same counter.
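
A trivial sketch of that point in TypeScript (illustrative only): the count is a header field of the shared object, so every retain/release from any "pointer" writes to that same object rather than to the pointer's own cache line.

```ts
// The count lives in the object, not in the references to it.
class RefCounted {
  refs = 1;                                   // header field in the shared object
}

function retain<T extends RefCounted>(obj: T): T {
  obj.refs++;                                 // touches the object, not the pointer
  return obj;
}

function release(obj: RefCounted): void {
  if (--obj.refs === 0) {
    // last reference gone: finalize/free the object here
  }
}

// Usage: two "pointers" to the same object both mutate the same counter.
const shared = new RefCounted();
const alias = retain(shared);                 // copying the reference writes to `shared`
release(alias);
release(shared);
```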

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346128500
Original Date: Nov 21, 2017, 1:05 PM CST


See: https://softwareengineering.stackexchange.com/questions/203745/demonstration-of-garbage-collection-being-faster-than-manual-memory-management

The key fact for me is that managed memory usage is always significantly worse than unmanaged. Yes, GC can be fast providing you have twice the RAM available. Typically on a multi-tasking machine this means less RAM available for other processes. This results in real slowdown of things like the disk-caching layer because it's losing pages to the application.

The end result is that the user experience of using a system with GC is significantly worse than using a system with manual allocation - providing the software does not crash due to bad memory management :-)

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346131548
Original Date: Nov 21, 2017, 1:15 PM CST


It cannot be with the pointer, because when you create a new pointer to the array you need to increment the reference count.

NodixBlockchain commented 3 years ago

Original Author: @barrkel
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346132506
Original Date: Nov 21, 2017, 1:19 PM CST


NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346135291
Original Date: Nov 21, 2017, 1:29 PM CST


I mostly just butted in because accusing GC for JS / TypeScript being slower than webasm seemed a bit outlandish. GC overhead is directly measurable; it's trivially disproved for anyone inclined to measure.

Actually the difference in performance between JS and WebASM is only about 10-20% anyway, so that is in the 2%-10% region you claimed for the GC, so for some algorithms it could account for 100% of the difference...

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346138363
Original Date: Nov 21, 2017, 1:40 PM CST


Interestingly I am about to port a bunch of asm.js to TypeScript, as the memory allocation for asm.js / webasm does not work well with the main JS garbage collector, and allocating the heap for the webasm is causing fragmentation of the JS heap. In the end it turns out that the better memory usage, keeping all the memory managed by the GC, is better for application stability than the small performance gain of having the code in asm.js.

Following this argument to its logical conclusion, the better memory usage of manually managed memory for the whole application would be better overall.

Edit: it's probably worth pointing out that this application uses a lot of memory, and has some periods when it is performing rapid allocations. This leads to memory usage for a single tab of about 1 GB (even though memory buffers are 16 MB, so with manual allocation it would get nowhere near that level). Frequently when usage gets to more than 1 GB the web browser crashes, even though most of that memory is probably free, and just waiting to be collected.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346186551
Original Date: Nov 21, 2017, 4:47 PM CST


@NodixBlockchain wrote:

What makes me think that cyclic dependencies are not a problem with the way I handle them is that I have a case of cyclic dependency like this.

This is off-topic. Explicit, unchecked (i.e. “unsafe”) manual mgmt is not what we are interested in here. The topic was about automatic (or compiler checked deterministic) mgmt. As you know, both @keean and I are significantly into compiler-assisted checking.

I understand you want to participate. But you really have not wrapped your mind around all the concepts, and so you should be more circumspect and lurk more. I said if you want to post to your thread, then I will try to participate occasionally.

Please be respectful and recognize your limitations.

I'm aware of the potential downsides, but for me they remain benign for the use case I have for the moment.

After briefly learning about it, I’m no longer interested in your low-level framework. Your monotheistic intent on pushing your low-level preferences diverges the focus of my thread, because you don’t have an open-mind attempting to be objective about research results. Your low-level framework focus is irrelevant to the topic of this thread. Please talk about that in your thread which you entitled “Saluations! :)”. This/my thread is not for trying to convince yourself and/or the world that your low-level framework paradigm is great. Your thread is for that. I wouldn’t ban you even if this were my repository, but I believe curation (not absolute censorship) is valuable. Trust me, if you write anything in your thread which I happen to read, that I think is important enough to be in this thread, I will quote it over in this thread. Also I presented a guideline given that I find your posts have so many errors and low informational value, that you could post after every 10 posts by others in the thread (so as to force you to focus yourself more and keep the S/N ratio here reasonable).

I considered the low-level implications as even more evident by the post which follows this one. And realized that your preferences are a dinosaur. See my next post.

I am discussing here about general purpose, high-level programming language design that employs compiler checking and/or automated runtime GC to facilitate lower costs of development and guard against human error. With some thought given to performance optimization and some perhaps C-like low-level capabilities as long as the result is clean, simple, and elegant and not explicit, manual, bugs-prone human mgmt of memory.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346199127
Original Date: Nov 21, 2017, 5:53 PM CST


@keean wrote:

The difference is that asm.js and WebASM are strongly typed and manually manage the memory. In theory the JIT can make well typed JS code as fast as strongly typed code, so well typed TypeScript should be as fast as the compiled asm.js/WebASM, but it's not. The remaining difference is mostly down to memory management in my opinion.

TypeScript transpiles to JavaScript (ES6), not ASM.js nor WASM.

(As you know, TypeScript aims to be round-trip compatible with and a typed superset of JavaScript— meaning it should pass through ES6 code more or less unchanged.)

Also, C code which is transpiled to ASM.js/WASM employs different abstractions, such as not boxing everything (e.g. every element of every array).

The VM can’t always statically type objects, so method dispatch and type coercions can require dynamic overhead.

Also I believe JS may be passing all function arguments on the heap (the prototype chain of objects which form the closures we previously discussed). The throughput and locality (i.e. contiguity for minimizing cache and virtual paging thrashing) are similar, since contiguous allocation on the stack versus on the generational heap costs only an extra end-of-area pointer increment per generational allocation, compared to one SP adjustment per function call. However, the volume of objects accumulated before a collection (i.e. before the end-of-area pointer is reduced; although presumably all of a function’s arguments could be allocated as one data structure at equivalent cost) is much greater than for stack-passed arguments, and thus perhaps causes additional thrashing.

C code rarely employs hashmap lookup, yet JS literally encourages it since all object properties and non-numeric array indices employ it.
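
To make the boxing and hashmap points concrete, here is an illustrative TypeScript sketch (my own example, not taken from any benchmark): the first layout is what idiomatic JS encourages, the second is the unboxed, contiguous layout a C programmer would reach for.

```typescript
// Boxed layout: every element is a separate heap object; reading p.x follows
// a pointer, and an object used as a map of arbitrary string keys can degrade
// to a dictionary (hash) lookup.
interface Point { x: number; y: number }
const boxed: Point[] = Array.from({ length: 1000 }, (_, i) => ({ x: i, y: i * 2 }));

// Unboxed layout: contiguous typed arrays, closer to how C would lay out the data.
const xs = new Float64Array(1000);
const ys = new Float64Array(1000);
for (let i = 0; i < 1000; i++) { xs[i] = i; ys[i] = i * 2; }

// Same computation over both layouts; the second touches contiguous memory
// with no per-element pointer chase.
let sumBoxed = 0;
for (const p of boxed) sumBoxed += p.x + p.y;

let sumFlat = 0;
for (let i = 0; i < xs.length; i++) sumFlat += xs[i] + ys[i];
```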

There are many more details and we would need to study all the reasons:

https://news.ycombinator.com/item?id=7943303
https://www.html5rocks.com/en/tutorials/speed/v8/
https://blog.sessionstack.com/how-javascript-works-inside-the-v8-engine-5-tips-on-how-to-write-optimized-code-ac089e62b12e

This thread is in part my attempt to get the best of low-level ASM.js/WASM married to a better GC scheme that is more compatible with (some of) those lower-level abstractions!

[…] GC hides all sorts of costs, like the extra level of pointer indirection.

Probably a major factor is that GC hides the cost of allocation and deallocation, so the programmer does not realise they are thrashing object creation.

Agreed, e.g. they do not realize they’re boxing everything, forcing an extra pointer indirection for every read/write of, say, an array element.
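
For example, a hypothetical sketch of that pitfall (names are mine): the allocating version looks free because the GC hides the cost, yet it creates one short-lived heap object per call.

```typescript
// Allocating version: a fresh heap object on every call, which a GC quietly
// absorbs until collection pressure shows up as pauses.
function lengthAllocating(x: number, y: number): number {
  const v = { x, y };
  return Math.sqrt(v.x * v.x + v.y * v.y);
}

// Reusing a pre-allocated scratch object keeps the hot path allocation-free.
const scratch = { x: 0, y: 0 };
function lengthReusing(x: number, y: number): number {
  scratch.x = x;
  scratch.y = y;
  return Math.sqrt(scratch.x * scratch.x + scratch.y * scratch.y);
}
```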

Actually the difference in performance between JS and WebASM[ASM.js] is only about 10-20% anyway, so that is in the 2%-10% region you claimed for the GC, so for some algorithms it could account for 100% of the difference...

Huh? JS is several times slower than C, and ASM.js afair is purportedly within about 40% of native C speed:

https://julialang.org/benchmarks/
https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=node&lang2=gpp

Perhaps you’re referring to JS highly optimized to avoid features which impair aforelinked V8’s optimization?

Yes the choice is between static or runtime checking. The best is to use the stack, so there is no need for checking, but that does not work for all scenarios.

Generational GC is nearly as efficient as the stack (as I explained), and I am going to propose an algorithm that I think might make it even more efficient.
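
A minimal sketch (my own illustration, not the proposed algorithm) of why young-generation allocation approaches stack speed: both are a single pointer bump, and reclaiming the whole region is a single pointer reset.

```typescript
// A nursery allocation is just a pointer bump; a minor collection that
// evacuates survivors lets the whole region be reused by resetting one pointer.
class Nursery {
  private top = 0;
  constructor(private readonly capacity: number) {}

  // Returns a slot offset, or -1 when the nursery is full and a minor
  // collection would be triggered.
  alloc(size: number): number {
    if (this.top + size > this.capacity) return -1;
    const offset = this.top;
    this.top += size;
    return offset;
  }

  // After survivors are copied out, reclaiming the nursery is O(1),
  // comparable to popping a stack frame.
  reset(): void {
    this.top = 0;
  }
}

const nursery = new Nursery(1 << 20);
const slot = nursery.alloc(64); // bump; no free list, no per-object bookkeeping
```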

What he fails to realise is that 2-10% is only true when you have enough memory.

It is true that the asymptotic computational complexity (i.e. performance) of tracing GC is pathological both as live memory usage approaches/exceeds the bound of physical memory and as total memory consumption increases for computers with more GBs (or TBs) of DRAM. That is why I cited that research for footnote16 in my prior post—which btw my independently derived idea seems to be roughly analogous to—for removing the pathological asymptotics by leveraging an optimized form of RC for the older generations. I will write my detailed post about this soon.

And that is why I allude in my response to @barrkel to the time vs. space trade-off approaching parity between automated GC and manual/stack/compile-time deterministic paradigms for memory management. The reason that the deterministic paradigms you espouse will not exceed the space vs. time trade-offs for the throughput (nor maximum latency pauses) of state-of-the-art automated GC in general is because I repeat that memory allocation mgmt is inherently non-deterministic (thus requires runtime management)!

Also there is research1 (which I haven’t read yet) which seems to indicate that as the number of (perhaps also non-deterministic) factors (e.g. power and energy management issues) that need to be managed increases, then automatic algorithms may become the most plausible (reasonable cost, time-to-market, stability, maintenance) for most applications with hand-tuning being further, further relegated to the esoteric that demand extreme tuning (e.g. the NASA space shuttle).

Yes GC can be fast providing you have twice the RAM available.

Even manual memory mgmt will have virtual memory paging out to disk as the physical memory is exhausted.

That half memory deal is afaik due to the “semi-space” of a copy collector that is not generational. Afaics, there are better alternatives now. See footnote16 in my prior post. I need to study this more thoroughly though.

Of course manual (both unchecked and compiler-checked) mgmt can be a bit faster or correct some pathological cases (as well as potentially create other ones ;), but at great cost as I wrote in my prior post as follows:

I apply the 80/20 or 90/10 Pareto principle to prioritization. Although all that special case tsuris might be able to obtain an extra 10% of performance, it has the cost of 90% more effort on debugging, readability of the code, maintenance of the code, complexity of the code, etc. When you really need that 10% boost (or in the case where the more generalized GC paradigm is much slower for some reason), then go write a module in C++ or Rust. I see no need to invest my effort in trying to recreate (or improve upon) those languages, because there are millions of man-hours already in those ecosystems. Our only prayer of creating something useful in this decade with our resources, is to 90/10% prioritize on clean abstractions that are elegant and generalized and 90% performant with 10% of the effort and complexity.

And you alluded to the great cost in terms of bugs and/or tsuris:

using a system with manual allocation - providing the software does not crash due to bad memory management :-)

In my prior post, I had alluded to how attempting to have the compiler check memory mgmt is forcing it into a deterministic, total order, which will necessarily cause the programmer to violate the compiler checks and leads to bugs (unless perhaps that compile-time paradigm is somehow limited to only deterministic instances, but from my study thus far it appears that the cross-pollution/interopt between young and older generation allocation makes non-deterministic, runtime automated algorithm interop with a deterministic young generation have worse characteristics than a runtime, automated generational non-deterministic collector for the younger generation):

To even attempt to apply these compile-time deterministic memory management holistically requires a complex typing tsuris akin to Rust’s total ordering of borrow lifetimes with lifetime annotation parameters for transitivity of borrows from inputs to outputs, and of course the programmer must violate the invariants and create unsafe code in order to handle non-deterministic (i.e. runtime randomized) memory mgmt, which defeats the entire point as then the total order is no longer checked and the unsafety can leak back into the code the compiler thinks is safe.

[…]

Total ordering compile-time paradigms (including attempting to type check everything) are afaics brittle paradigms that break because the programmer must necessarily break out of them because the entropy of the universe is not bounded. I commented on this previously numerous times in these issues threads. The non-determinism (i.e. unbounded randomness and entropy) of the universe (i.e. the real world in I/O) can’t be statically modeled.

1 T. Cao, S. M. Blackburn, T. Gao, and K. S. McKinley, "The Yin and Yang of Power and Performance for Asymmetric Hardware and Managed Software," in ISCA ‘12: The 39th International Symposium on Computer Architecture, 2012. Originally found on Steve Blackburn’s website. Also T. Cao’s master’s thesis prior art.


@barrkel wrote:

The argument in favour of GC isn't speed of execution, it's speed of
development at very low costs - an extra 2 to 10% or so. Speed of
development in memory safe languages is significantly higher

+1.

Cited:

Hertz and Berger [2005] explore the tradeoff between automatic and manual memory management in an artificial setting, using a memory trace from a previous run of the benchmark to determine when objects become unreachable and inserting explicit calls to free objects at the moment they become unreachable. This analysis, while interesting in itself, ignores the enormous programming effort and overhead (as in talloc above) required to track the ownership of objects and determine when they are no longer used. The most frequently cited study is Rovner [1985], who found that Mesa programmers spent 40% of their time implementing memory management procedures and finding errors related to explicit storage management.

Also it is about consistency of memory mgmt, i.e. avoiding the pitfalls of the average programmer or rushed development team.

Yes, GC scales because it trades space for time. If you're running on a
small embedded device, it can be a poor tradeoff; if you're trying to
maximize compute throughput across multiple machines, it can cost real
dollars. But these are extremes, and most people aren't on the extremes.

Well yes, but as I mentioned above in my reply to @keean, and I presume you’re also alluding to, that the differences in space vs. time tradeoffs between automatic and manual memory mgmt are getting closer to low double digit (or even single digit) percentages (i.e. “extremes” of optimization) and thus heavily favoring automatic GC for most use cases. Note I’m not conflating this metric with the 2 - 10% figure for throughput differences that you asserted.

I mostly just butted in because accusing GC for JS / Typescript being
slower than webasm seemed a bit outlandish. GC overhead is directly
measurable; it's trivially disproved for anyone inclined to measure.

I'm gonna unsubscribe from this thread now.

Note though that the metric you propose (although significantly representative given all other factors equivalent) doesn’t account for all impacts of the GC scheme chosen, such as cache and virtual paging locality (contiguity) to minimize thrashing which can significantly impact performance. Note however, that deterministic compile-time (manual explicit or for stack argument conventions and such thus assisted partially implicit) memory mgmt doesn’t necessarily achieve a panacea for this ancillary factor and could actually degrade it because generational and copy GC increase locality.

I appreciate your assistance in helping to explain. And I agree that is an important clarification to emphasize. Please feel free to participate here as often or infrequently as you want.

Btw, @keean is expert in other areas such as C/C++, Haskell, Prolog, type theory, and library algebras. Afaik, he has only been developing seriously with JS/TypeScript for a relatively much shorter period of time. So that might explain it. Or it could also be rushing and trying to answer late in the evening after a long day of managing his company.

@barrkel wrote:

Types are part of it, but it's more about
abstraction level. Higher abstraction means the runtime needs to perform
more work to get to machine code, and will be using generalisations (aka
baked in assumptions about trade-offs) to get there - and on a JIT time
budget too. Whereas webasm is designed to be efficiently and unambiguously
converted to machine code, so there will be fewer trade-offs, and the
compiler targeting webasm can spend much more time analysing the trade-offs
involved in reducing abstraction level.

[…]

And that's without getting into the costs of semantic fidelity. You can't
simply pretend JS's prototype inheritance doesn't exist given typescript
source; all the work needs to be done anyhow, just in case, since the web
is a distributed system and thus non-deterministic.

+1.

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346301603
Original Date: Nov 22, 2017, 4:01 AM CST


I’m wondering whether that is a superstitious, anecdotal belief (i.e. aliasing error) or an assertion you actually studied, measured, and verified to be the case with in depth effort for comparing reference counting and tracing garbage collection. Because inherently reference counting doesn’t track the directed graph of memory allocation (i.e. all we have are reference counts and no back pointers to the references that comprise those counts); thus, there’s no feasible way to see which data structures have leaked because they’re no longer reachable from the untraced roots of the said graph in a reference counting GC system— for example due to cyclical references. Whereas, in theory a tracing garbage collector could provide such a graph for debugging.

If you consider references as objects with a delete operator rather than only a pointer with a free, the destructor can release the references to children before releasing the reference itself.
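
Concretely, that pattern looks something like the following sketch (an illustrative TypeScript reconstruction, not the actual framework code), where release() acts as the destructor and drops child references first:

```typescript
// A reference-counted node whose "destructor" releases its children before
// the node itself goes away.
class RcNode {
  private refCount = 1;
  private children: RcNode[] = [];

  addChild(child: RcNode): void {
    child.retain();
    this.children.push(child);
  }

  retain(): void {
    this.refCount++;
  }

  release(): void {
    if (--this.refCount === 0) {
      // Destructor behaviour: drop child references first.
      for (const child of this.children) child.release();
      this.children.length = 0;
    }
  }
}
// Caveat (the point raised below): if two nodes retain each other, neither
// count ever reaches zero, so someone must still break the cycle manually.
```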

It's not anecdotal. I have a fully working application server with its own memory management and object-oriented script language, which deals with thousands of objects, and I never spent more than an hour tracking a memory leak in the whole development cycle because I have a clear methodology to track them.

The script system is actually close to JavaScript, with a tree of objects on several layers, so I could make a generational GC very easily, but there are too many corner cases with multi-threading and plugins for me to really consider it over simple reference counting, which works for every single case possible with a very small code base.

I've been doing tons of experiments with different models of memory management.

And I also have much experience with GC languages such as AS3, the Android SDK and JavaScript, and there are always many instances where I wish there was manual memory management, because the garbage collector won't really release the memory when it 'should'. It's very often macro-managed, with the GC only releasing things once in a while, which is far from optimal if there is a need for lots of dynamic object creation inside a component that needs to run for a long time allocating lots of new objects.

And it's not even multi-threaded, and there are already these kinds of problems recurring in every GC language out there.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346319285
Original Date: Nov 22, 2017, 5:09 AM CST


I’m trying to preempt another 1000+ post thread. We already had extensive discussions of your ideas, framework, perspective in the Concurrency and your “Saluations! :)” threads. We don’t need to repeat that again. I’m not trying to be dictatorial or unappreciative of discussion. I’m very interested right now in automated GC, for the reasons I’m explaining. I want to see if we can make it work optimally with the lower-level abstractions and resolve the pathological asymptotics issue.

@NodixBlockchain wrote:

If you consider references as objects with a delete operator rather than only a pointer with a free, the destructor can release the references to children before releasing the reference itself.

I already told you that explicitly asking the programmer to remember to break cycles is off-topic (i.e. the destructor will not be automatically fired if there is a cyclic reference preventing the parent-most object's reference count from going to 0):

This is off-topic. Explicit, unchecked (i.e. “unsafe”) manual mgmt is not what we are interested in here. The topic was about automatic (or compiler checked deterministic) mgmt. As you know, both @keean and I are significantly into compiler-assisted checking.

Although you’ve deleted (or moved to your thread?) those numerous prior off-topic posts as I asked you to (thank you), you want to repeat the same points of what I perceive to be off-topic noise w.r.t. the focus I want in this thread. I understand you think you have something important to say, but you apparently did not even pay attention or grok that I had already stated that it (i.e. the same point you’re repeating again) was/is off-topic.

@NodixBlockchain wrote:

but there are too many corner cases with multi-threading

We already discussed in the prior threads that (your and Rust’s paradigm of allowing/presuming) willy-nilly multithreading everywhere is not the paradigm I’m interested in. I’m looking at the single-threaded event model. I will be discussing GC and multithreading in this thread.

@NodixBlockchain wrote:

It's not anecdotal. I have a fully working application server with its own memory management and object-oriented script language, which deals with thousands of objects, and I never spent more than an hour tracking a memory leak in the whole development cycle because I have a clear methodology to track them.

How many times am I going to have to repeat that I am not interested in discussing further the anecdotal claims of your low-level framework? You’re apparently overlooking the more holistic point that:

  • We’re (or at least I’m) not interested in your manual “unsafe” (i.e. neither checked by the compiler nor enforced at runtime) paradigm. Been there, done that in the 1980s and early 1990s. In the late 1990s I adopted simplistic reference counting with C++ for CoolPage. Now I’m advancing further. IMO, you’re a dinosaur.

  • My reasons for not being interested are detailed in my prior response to @keean and @barrkel.

    Your preferences and experience with your framework are not indicative of the development costs and other trade-offs for the community-at-large with such a paradigm, i.e. your arguments are anecdotal. When you have a large community and a lot of research/surveys/metrics amongst many general purpose applications refuting the exposition I have made based on more extensive sources, then we can discuss yours. No doubt that if you’re personally very dedicated and expert in hand-tuning with your framework specifically designed for the blockchain application you’re building, then you can probably get outstanding metrics as an anecdotal example application, but that isn’t provably (and IMO unlikely to be) relevant in the larger scheme of things w.r.t. general applications and general developer productivity, maintenance, stability, bugginess, etc. Again please refer to my prior response to @keean and @barrkel for my reasoning on why manual, human-error prone management of resource factors is a computer science cul-de-sac. Whether my perspective is correct or not, afaics it’s the current focus of this thread. Even the owner of this repository, @keean, who expressed some preference for compile-time deterministic memory mgmt, afaics doesn’t want to allow cyclic-reference memory leaks that require human intervention, as you’re apparently proposing.

    Additionally having your own methodology for your own handcrafted framework is not a valid comparison to what is possible but has not yet been so handcrafted for tracing GC (so as to be able to compare apples-to-apples for a diverse ecosystem of users and use cases), as I had already explained in a prior response as follows:

    In both cases, a sophisticated debugger and language could in theory be designed to set such conditional breakpoints on all pointer assignments (conditionally breaking only if the said shared object address is encountered), if such a facility is helpful for debugging memory leaks. And the tracing GC would have more information about the memory allocation graph.

If you’re interested in your paradigm in spite of what I perceive to be the convincing exposition in my prior posts, then please feel free of course to discuss it in a different thread. I may occasionally read there. If you make your best arguments there (not in this thread) and if I see anything I agree with, then I will quote it over here (and invite/entice you to discuss here). Please do not pollute my thread (at least respect the one post for every 10 others request, or 6 if all the intervening posts are lengthy and infrequent) with what I currently perceive to be off-topic discussion. I do not want an interactive chat style discussion in this thread; we can chat back-and-forth in realtime in private or in another thread. I want focused, carefully thought out discussion in this thread that is easy to re-read and condensed enough for others to follow. Because this is my thread wherein I am trying to sort out what language paradigms I need, and this is a very serious issue I am in a rush to complete, I do not want the distraction of what I perceive to be a boisterous (repetitively arguing the same points over and over) dinosaur who is entirely incorrect.

Essentially you’re trying to route me away from extensive sources to your pet project. That is very myopic.

You’re apparently a prolific and very productive programmer (unlike my pitiful self reduced to rubble because still apparently suffering from a perhaps resistant strain of Tuberculosis). You and I aren’t able to collaborate much because you and I are ostensibly headed in different directions w.r.t. programming paradigms, while maintaining very low overhead on throughput.

@NodixBlockchain wrote:

And I also have much experience with GC languages such as AS3, the Android SDK and JavaScript, and there are always many instances where I wish there was manual memory management, because the garbage collector won't really release the memory when it 'should'. It's very often macro-managed, with the GC only releasing things once in a while, which is far from optimal if there is a need for lots of dynamic object creation inside a component that needs to run for a long time allocating lots of new objects.

It “should” only do so in a balance of throughput, asymptotics, pause latency, and (time & space) locality. If you’re preferring priorities of time locality (immediacy), then not only are you attempting the impossible of defying the non-determinism of memory mgmt (i.e. immediacy is irrelevant when the period of allocation is non-deterministic), you’re also not optimizing the entire equation for memory mgmt. This sort of myopia might not manifest as measured problems in your handcrafted project due to the confirmation bias of handcrafted design decisions, but may in more diverse and general ecosystem of use cases and developers. Your exposition has the hallmarks of an amateur lacking academic study.

You’re referring to automated GC implementations from a decade or more ago. Even Google’s JavaScript V8’s GC is only incremental, generational with a standard pathological asymptotics mark-sweep for the older generation.

I wrote already in this thread that new research has drastically improved this issue:

What he fails to realise is that 2-10% is only true when you have enough memory.

It is true that the asymptotic computational complexity (i.e. performance) of tracing GC is pathological both as live memory usage approaches/exceeds the bound of physical memory and as total memory consumption increases for computers with more GBs (or TBs) of DRAM. That is why I cited that research for footnote16 in my prior post—which btw my independently derived idea seems to be roughly analogous to—for removing the pathological asymptotics by leveraging an optimized form of RC for the older generations. I will write my detailed post about this soon.

And that is why I allude in my response to @barrkel to the time vs. space trade-off approaching parity between automated GC and manual/stack/compile-time deterministic paradigms for memory management. The reason that the deterministic paradigms you espouse will not exceed the space vs. time trade-offs for the throughput (nor maximum latency pauses) of state-of-the-art automated GC in general is because I repeat that memory allocation mgmt is inherently non-deterministic (thus requires runtime management)!

You make me repeat myself because you don’t assimilate all the details I write. Presumably you’re so glued to your mental model of programming that you don’t notice subtleties that paradigm-shift the presumptions.


@keean wrote:

The reference count needs to be in the object referenced, not the pointer, so it won't be in the same cache line as the pointer. When you have multiple pointers to the same object they all need to increment/decrement the same counter.

Agreed, but note in my prior post with the many footnotes about GC, I cited some research (Weighted Reference Counting) that places the count in the pointers for allocation and only has to decrement the shared count on deallocation. I also cited research about employing reference counting with only a single bit stored in the pointer in a reference-counting generation scheme. Also in an earlier post I suggested pulling all the counts closer together in contiguous locality to minimize cache line and virtual page faults. There’s also the concept of buffering the increments and decrements to amortize.
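
As a rough illustration only (assuming a plain 32-bit object index rather than actual pointer bit-stealing), the contiguous side table with buffered increments/decrements could look like this:

```typescript
// All reference counts live in one contiguous typed array, and count updates
// are buffered and applied in batches at a safe point to amortize cache misses.
class SideTableRc {
  private counts = new Uint32Array(1 << 16);           // one slot per object index
  private pending: Array<[index: number, delta: number]> = [];

  adjust(index: number, delta: 1 | -1): void {
    this.pending.push([index, delta]);                  // no shared count touched yet
  }

  // Flush at a safe point; every update hits the same contiguous pages.
  flush(): number[] {
    const dead: number[] = [];
    for (const [index, delta] of this.pending) {
      this.counts[index] += delta;
      if (this.counts[index] === 0) dead.push(index);   // candidate for reclamation
    }
    this.pending.length = 0;
    return dead;
  }
}
```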

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346323307
Original Date: Nov 22, 2017, 5:27 AM CST


The thing is, my paradigm is the basis for a high-level language, which is fully working with JS/JSON and can generate binary data for WebGL apps or other kinds of dynamic applications.

Even Node.js or a JavaScript VM uses this kind of code internally. You can't have any kind of high level language without this kind of code underlying it.

And for the moment it gives more features than JavaScript, as it works in a multi-threaded environment and memory can be freed safely when it's not used directly.

The script language can already process HTML data, including form variables and query variables, and interact 100% with JS apps, and I think it's still cleaner than PHP.

for example https://github.com/NodixBlockchain/nodix/blob/master/export/web/nodix.site#L555

It can parse POST variables, including files, and generate JS variables and code with live variables from the node, etc.

The low-level mechanism is only to be seen as a minimalistic VM to run the script language, which either handles network events or generates HTML pages with dynamic data.

Any kind of advanced high-level language will need this sort of code to manage memory, references, multiple threads, etc.

The lexer/VM can be very simple because the low-level code already has advanced features to deal with objects, memory, scope and evaluation of object data.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346336143
Original Date: Nov 22, 2017, 6:27 AM CST


@NodixBlockchain, you’re repeating the same points I already discussed with you in the Concurrency and your “Saluations! :)” threads. I refuse to repeat my explanations again to you as to why what you think is applicable, is not applicable to what I am doing. It’s not my job to repeat myself over and over, just because you think I didn’t already refute you before in those other threads.

Please stop. Please.

You can't have any kind of high level language without this kind of code underlying it.

Any kind of advanced high-level language will need this sort of code to manage memory, references, multiple threads, etc.

Incorrect. And I do not want to explain why again.

And for the moment it gives more features than JavaScript, as it works in a multi-threaded environment

Your paradigm for multithreading is not applicable to the direction I’m headed, as I will repeat what I wrote before quoted as follows:

but there are too many corner cases with multi-threading

We already discussed in the prior threads that (your and Rust’s paradigm of allowing/presuming) willy-nilly multithreading everywhere is not the paradigm I’m interested in. I’m looking at the single-threaded event model. I will be discussing GC and multithreading in this thread.

I’m not trying to be a jerk. I have clear reasons for the focus that I’m pursuing. I hope you also understand my available time is very limited (and no that is not a valid reason that I should employ your framework).

Sometimes I ponder if the trolls at BCT paid you to come and disrupt me. I’m having enough difficulty as it is making forward progress, and I need to be taken off on repetitive tangents like I need another hole in my head in addition to the one I already have from a hammer.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346367257
Original Date: Nov 22, 2017, 8:34 AM CST


@keean wrote:

Interestingly I am about to port a bunch of asm.js to TypeScript, as the memory allocation for asm.js / WebASM does not work well with the main JS garbage collector, and allocating the heap for the WebASM is causing fragmentation of the JS heap. In the end it turns out that the better memory usage of keeping all the memory managed by the GC is better for application stability than the small performance gain of having the code in asm.js.

Edit: it's probably worth pointing out that this application uses a lot of memory, and has some periods when it is performing rapid allocations. This leads to memory usage for a single tab of about 1GB (even though the memory buffers are 16MB, so with manual allocation it would get nowhere near that level). Frequently when usage exceeds 1GB the web browser crashes, even though most of that memory is probably free and just waiting to be collected.

Agreed, afaik you appear to be making a valid and important point, which dovetails with my initial peek into analyses about GC for proper integration with low-level coding.

The problem you’re experiencing perhaps has something to do with the fact that most browsers are limited to presuming a least common denominator of a 32-bit virtual address space and this gets very complicated as to how they split this up between processes. They also have to protect against writing outside of bounds efficiently. I dug a little bit into the issues (c.f. the last 2 links in quoted text below) but did not completely climb down the rabbit hole.

@shelby3 wrote:

For this last quoted issue, I am contemplating that the ability to employ the unused bits (c.f. also) of 64-bit pointers (64-bit also provides a huge virtual memory space, c.f. also) would enable storing an index into a contiguous array which has the reference counts as array elements, so they would be in contiguous pages.

I had mentioned to you in private discussion my idea about storing older generation objects in same sized objects memory pages in order to minimize fragmentation with a 64-bit virtual address space. This would rely on less frequent compaction pauses for old generation objects, and would reuse old generation slots instead of waiting to compact them (i.e. not primarily a copy collector).
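
A minimal sketch of that same-sized-slots idea (parameters and names are illustrative, not the actual design): each page serves one size class, so freed slots are reused in place and the page never fragments internally.

```typescript
// Each page holds slots of a single size class; a freed slot is reused in
// place, so the page's internal layout never fragments.
class SlabPage {
  readonly buffer: ArrayBuffer;
  private freeList: number[] = [];

  constructor(readonly slotSize: number, readonly slotCount: number) {
    this.buffer = new ArrayBuffer(slotSize * slotCount);
    for (let i = slotCount - 1; i >= 0; i--) this.freeList.push(i * slotSize);
  }

  alloc(): number | undefined {
    return this.freeList.pop();      // byte offset of a free slot, if any remain
  }

  free(offset: number): void {
    this.freeList.push(offset);      // slot is immediately reusable
  }
}

// Usage: route each allocation to the page whose size class fits the object.
const page64 = new SlabPage(64, 1024);
const slotOffset = page64.alloc();
if (slotOffset !== undefined) page64.free(slotOffset);
```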

Also note that very large or small objects are typically handled differently by GC algorithms, so there may be an issue there w.r.t. the 16MB objects you’re allocating.

I agree that the memory allocation seems to have corner cases around the interaction of the GC in JS and the allocation of the ArrayBuffer for ASM.js/WASM. Also of course objects have to be copied from one to the other; they’re not interoperable.

However, as I pointed out in prior posts, just because automated GC has a corner issue with a particular scenario, doesn’t enable the presumption that manual memory mgmt is a panacea in terms of the holistic equation of memory mgmt optimization. You gain some control in one area and lose something in other axis of the optimization space (e.g. more human-error prone, less throughput, or worse asymptotics, etc).

@shelby3 wrote:

It “should” only do so in a balance of throughput, asymptotics, pause latency, and (time & space) locality. If you’re preferring priorities of time locality (immediacy), then not only are you attempting the impossible of defying the non-determinism of memory mgmt (i.e. immediacy is irrelevant when the period of allocation is non-deterministic), you’re also not optimizing the entire equation for memory mgmt. This sort of myopia might not manifest as measured problems in your handcrafted project due to the confirmation bias of handcrafted design decisions, but may in more diverse and general ecosystem of use cases and developers. Your exposition has the hallmarks of an amateur lacking academic study.

@shelby3 wrote:

Note though that the metric you propose (although significantly representative given all other factors equivalent) doesn’t account for all impacts of the GC scheme chosen, such as cache and virtual paging locality (contiguity) to minimize thrashing which can significantly impact performance. Note however, that deterministic compile-time (manual explicit or for stack argument conventions and such thus assisted partially implicit) memory mgmt doesn’t necessarily achieve a panacea for this ancillary factor and could actually degrade it because generational and copy GC increase locality.

I’m contemplating whether I can design my own programming language which models what I want, will tolerably transpile to JS/TypeScript (or something else) easily enough to allow me to use it now, and then rewrite the VM and compiler later to the optimum. This is what I’m analysing now and the reason I’m doing research on GC.

@keean wrote:

Following this argument to its logical conclusion, the better memory usage of manually managed memory for the whole application would be better overall.

How do you arrive at this conclusion? Have the posts I have made since you wrote that, changed your mind?

Afaics, the optimum solution is to better design the GC and VM with the low-level coding in mind. JavaScript was designed by Brendan Eich in 10 days. I presume that since then backward compatibility demands have prevented any holistic overhaul.

Also I’m prepared to deprecate 32-bit OSes as a viable target. I’m future directed because anyway, we’re not going to get something popularized within a couple of years at best. Afaik, even mobile is moving towards 64-bit rapidly. Microsoft Windoze may be a bit resistant, but 64-bit Home is available for those who care when they buy a machine. And anyone still running that spyware without dual booting or running Linux in a VirtualBox is really behind the curve.

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346379971
Original Date: Nov 22, 2017, 9:15 AM CST


Sometimes I ponder if the trolls at BCT paid you to come and disrupt me.

Just have to comment on this, you're paranoiac dude lol I don't have anything to do with anyone on BCT lol

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346390097
Original Date: Nov 22, 2017, 9:47 AM CST


How do you arrive at this conclusion? Have the posts I have made since you wrote that, changed your mind?

Because I am good enough to get the manual memory management right, and it will use a lot less memory, which will result in less fragmentation, and less crashes and therefore a better user experience.

I am still convinced that static analysis and region based memory management is a sweet spot for high performance languages.

I think GC is useful for simple coding, but there are some big problems:

  1. it encourages object thrashing
  2. it suffers from fragmentation
  3. it causes user-interface jank due to the 'stop-the-world' nature of GC.

If you have a system with all three of the above problems, there is nothing you can do as a programmer within the language to solve the problem.

You can however solve problems (2) and (3) if you can avoid (1) by limiting object creation. If you can force pre-allocation of resources, and then use effects to restrict new object creation, so that the compiler will prevent you using idioms that thrash the GC then it could be acceptable. The problem is that you need all the library APIs to respect this too, and provide a way for the user to pass the result buffer into the function/procedure. As we know if you give people multiple ways to solve a problem they will probably pick the wrong way.
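
For example, the buffer-passing style of API would look something like this illustrative sketch (names are mine, not an existing library):

```typescript
// Allocating style: returns a fresh array on every call.
function scaleAllocating(input: Float64Array, factor: number): Float64Array {
  const out = new Float64Array(input.length);
  for (let i = 0; i < input.length; i++) out[i] = input[i] * factor;
  return out;
}

// Buffer-passing style: the caller owns and pre-allocates the result buffer,
// so the library function never allocates in the hot path.
function scaleInto(input: Float64Array, factor: number, out: Float64Array): void {
  for (let i = 0; i < input.length; i++) out[i] = input[i] * factor;
}

// The caller allocates once and reuses the buffer across many calls.
const samples = new Float64Array(1024);
const result = new Float64Array(1024);
scaleInto(samples, 2.0, result);
```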

This is why I am interested in a language that forces people to do things the right way, so that then library APIs will always be written the correct way. Of course this might not make the language very popular, but I think it is an interesting approach to explore.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346426095
Original Date: Nov 22, 2017, 11:49 AM CST


@NodixBlockchain wrote:

Sometimes I ponder if the trolls at BCT paid you to come and disrupt me.

Just have to comment on this, you're paranoiac dude lol I don't have anything to do with anyone on BCT lol

It’s half joke and half an observation that the trolling of BCT seems to follow me over here. However, it must be stated that you’re not insincere. You truly believe (or believed) your work is applicable to what I want (or to the balanced presentation for other readers/participants here). I just don’t know how much verbiage I have to put out in order to get you to understand my side. Seems like perhaps you understand now. Again I am very much willing to read anything you write in the thread you created on @keean’s repository, if I have time. But I hope you will understand my point in this thread. I suggest re-reading the thread, as I made so many edits to my posts over the past 8+ hours, including my prior responses to you in this thread.

Again thank you for cooperating on my request. That was very honorable. I’ll try to be mutually respectful in your thread.


@keean wrote:

Because I am good enough to get the manual memory management right, and it will use a lot less memory, which will result in less fragmentation, and less crashes and therefore a better user experience.

That appears to ignore the arguments @barrkel and I have made about the 80/20 rule of nature. Those who ignore that rule perpetually fail in business and life. Of course you can do anything with infinite programming man-hour resources (assuming it doesn’t end up in a maintenance clusterfucked hairball), but that is not a valid argument.

That appears to be the hubris of anecdotal experience not backed by a sound analysis of the factors involved, which I will revisit below since apparently my prior writings in this thread are being ignored.

No one should dictate to you which programming language you should use. Likewise, you can’t dictate to the rest of the world what it prefers in terms of trade-offs in costs and outcomes for choice of paradigms in programming languages. Java, JavaScript, Python, PHP, and Haskell are very popular programming languages which all use tracing GC (and apparently not even state-of-the-art GC):

@shelby3 wrote:

Even Google’s JavaScript V8’s GC is only incremental, generational with a standard pathological asymptotics mark-sweep for the older generation.

C/C++ are also popular, but afaik they do not match the aggregate market share of the others. As well, C/C++ these days is typically limited to programs which require low-level control.

My quest in @keean’s repository has been to get the advantage of GC and perhaps some other aspects of FP such as first-class functions and closures in a simple language that can also do some low-level things such as unboxed data structures. The aim is to further erode the use cases where C/C++ would be chosen, to relegate that clusterfucked C++ language tsuris (and perhaps Rust also, but the jury is still out on that) more and more to the extreme fringe of use cases. For example, it is ridiculous that I must employ a cumbersome Node.js Buffer library to build an unboxed data structure.

Also I think you may be underestimating the trade-offs (at many different levels such as development time, maintenance, worst case failure modes, scalability, etc) of avoiding the assistance of runtime algorithms. Essentially since non-deterministic resource mgmt is an invariant of programming, you’ll end up writing a runtime GC algorithm to handle it, so I can’t see how your stance makes any sense whatsoever:

@shelby3 wrote:

Yes the choice is between static or runtime checking. The best is to use the stack, so there is no need for checking, but that does not work for all scenarios.

Generational GC is nearly as efficient as the stack (as I explained), and I am going to propose an algorithm that I think might make it even more efficient.

What he fails to realise is that 2-10% is only true when you have enough memory.

It is true that the asymptotic computational complexity (i.e. performance) of tracing GC is pathological both as live memory usage approaches/exceeds the bound of physical memory and as total memory consumption increases for computers with more GBs (or TBs) of DRAM. That is why I cited that research for footnote16 in my prior post—which btw my independently derived idea seems to be roughly analogous to—for removing the pathological asymptotics by leveraging an optimized form of RC for the older generations. I will write my detailed post about this soon.

And that is why I allude in my response to @barrkel to the time vs. space trade-off approaching parity between automated GC and manual/stack/compile-time deterministic paradigms for memory management. The reason that the deterministic paradigms you espouse will not exceed the space vs. time trade-offs for the throughput (nor maximum latency pauses) of state-of-the-art automated GC in general is because I repeat that memory allocation mgmt is inherently non-deterministic (thus requires runtime management)!

Also there is research1 (which I haven’t read yet) which seems to indicate that as the number of (perhaps also non-deterministic) factors (e.g. power and energy management issues) that need to be managed increases, then automatic algorithms may become the most plausible (reasonable cost, time-to-market, stability, maintenance) for most applications with hand-tuning being further, further relegated to the esoteric that demand extreme tuning (e.g. the NASA space shuttle).

[…]

1 T. Cao, S. M. Blackburn, T. Gao, and K. S. McKinley, "The Yin and Yang of Power and Performance for Asymmetric Hardware and Managed Software," in ISCA ‘12: The 39th International Symposium on Computer Architecture, 2012. Originally found on Steve Blackburn’s website. Also T. Cao’s master’s thesis prior art.

@barrkel wrote:

The argument in favour of GC isn't speed of execution, it's speed of
development at very low costs - an extra 2 to 10% or so. Speed of
development in memory safe languages is significantly higher

+1.

Also it is about consistency of memory mgmt, i.e. avoiding the pitfalls of the average programmer or rushed development team.

Yes, GC scales because it trades space for time. If you're running on a
small embedded device, it can be a poor tradeoff; if you're trying to
maximize compute throughput across multiple machines, it can cost real
dollars. But these are extremes, and most people aren't on the extremes.

Well yes, but as I mentioned above in my reply to @keean, and I presume you’re also alluding to, that the differences in space vs. time tradeoffs between automatic and manual memory mgmt are getting closer to low double digit (or even single digit) percentages (i.e. “extremes” of optimization) and thus heavily favoring automatic GC for most use cases. Note I’m not conflating this metric with the 2 - 10% figure for throughput differences that you asserted.

In my prior post, I had alluded to how attempting to have the compiler check memory mgmt is forcing it into a deterministic, total order, which will necessarily cause the programmer to violate the compiler checks and leads to bugs (unless perhaps that compile-time paradigm is somehow limited to only deterministic instances, but from my study thus far it appears that the cross-pollution/interopt between young and older generation allocation makes non-deterministic, runtime automated algorithm interop with a deterministic young generation have worse characteristics than a runtime, automated generational non-deterministic collector for the younger generation):

To even attempt to apply these compile-time deterministic memory management holistically requires a complex typing tsuris akin to Rust’s total ordering of borrow lifetimes with lifetime annotation parameters for transitivity of borrows from inputs to outputs, and of course the programmer must violate the invariants and create unsafe code in order to handle non-deterministic (i.e. runtime randomized) memory mgmt, which defeats the entire point as then the total order is no longer checked and the unsafety can leak back into the code the compiler thinks is safe.

[…]

Total ordering compile-time paradigms (including attempting to type check everything) are afaics brittle paradigms that break because the programmer must necessarily break out of them because the entropy of the universe is not bounded. I commented on this previously numerous times in these issues threads. The non-determinism (i.e. unbounded randomness and entropy) of the universe (i.e. the real world in I/O) can’t be statically modeled.


@keean wrote:

I am still convinced that static analysis and region based memory management is a sweet spot for high performance languages.

For the highest quality software (i.e. mission critical stuff) perhaps yes (although I think algorithmically managed non-determinism will perhaps become more reliable than handcrafted code), but I already see from my study of the research that gradually (even suddenly) state-of-the-art GC will continuously improve and cannibalize most of it, until you’re relegated to programming languages that most people have no interest in.1 You always knew I was interested in mainstream programming languages, so if you remain glued to that stance then we're likely headed in different directions. I thought I was going to be supporting static memory allocation analysis for Lucid, until I realized that generational allocation can be as performant as compile-time stack-based allocation for the frequent young objects (and I have a new algorithm in mind to make it more so by eliminating most of the copying, i.e. the “sudden” may be about to happen in a few hours).

Btw, I’m known for writing prophetic predictions. There’s some hubris back at ya. :kissing_heart:

I think GC is useful for simple coding, but there are some big problems:

  1. it encourages object thrashing
  2. it suffers from fragmentation
  3. it causes user-interface jank due to the 'stop-the-world' nature of GC.

This has all been solved as of today. I will be writing down the details shortly.

This is why I am interested in a language that forces people to do things the right way, so that then library APIs will always be written the correct way. Of course this might not make the language very popular, but I think it is an interesting approach to explore.

You mean @keean’s way. I will prove to you that is not the right way w.r.t. this issue, although I do believe you have many important insights into programming.

1 However, I think it may be correct to allow some form of manual control over memory allocation for programmers to work around corner case issues with the automated GC.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346441236
Original Date: Nov 22, 2017, 12:48 PM CST


If we are targeting JavaScript, this is irrelevant, as we have to live with the web-browsers GC. Mozilla's work with their Rust browser has shown you get better performance by bringing the DOM into the JavaScript GC, rather than having it reference counted.

I think if you have GC the best model is to have everything managed by the GC, otherwise there seem to be more corner cases. For example mixing JavaScript and WebASM seems to be bad for memory performance.

So either we compile to web-assembly and manage the memory ourselves, or we should use the web-browser's GC. If we have any interaction with GC managed systems we are probably better off using the web-browser's GC.

However when compiling to native, it would be nice to have a static analysis so you don't need a runtime for the binary. I would really like it if a program to add two numbers consisted of a couple of loads, an add, and a store, and nothing else.

As stated above, the benefit of GC is easy development, and freedom from some errors. Unfortunately people don't seem to realise you can still leak memory with GC; it eliminates some simple error cases, but you can still write programs that leak memory like a sieve until they crash.

In my experience it is much harder to debug memory leaks with GC than it is with manual allocation, because it is invisible to the programmer. So GC makes some easy coding easier, and some hard coding harder, but the easy cases are common and the hard ones rarer.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346452322
Original Date: Nov 22, 2017, 1:33 PM CST


@keean wrote:

If we are targeting JavaScript, this is irrelevant, as we have to live with the web-browsers GC.

I’m considering abandoning JS because of its crappy GC which doesn’t interop with low-level programming and appears to have poor asymptotics and some corner cases around fragmentation. Working on the detailed analyses now.

Originally I was advocating targeting JS because I thought it could become the mainstream VM, but there appear to be severe fundamental flaws, which is ostensibly why WASM has appeared on the scene. Java’s VM is also flawed, such as the lack of native support for unsigned data types.

I think if you have GC the best model is to have everything managed by the GC, otherwise there seem to be more corner cases. For example mixing JavaScript and WebASM seems to be bad for memory performance.

That seems to agree with what I’ve learned thus far.

So either we compile to web-assembly and manage the memory ourselves

This is the direction I’m more strongly considering now based on the aforementioned realization. But I would want to use an automated GC running on top of either WASM or perhaps native.

If we have any interaction with GC managed systems we are probably better off using the web-browser's GC.

No. JS’s GC sucks (doesn’t interop with low-level and isn’t even a state-of-the-art design). Must be replaced.

Web browsers are dinosaurs being disintermediated by apps anyway, opening the door to replace JS with WASM as the universal VM, unless WASM is poorly designed as well, in which case something else might rise up. WASM apparently doesn’t support a 64-bit virtual address space yet.

However when compiling to native, it would be nice to have a static analysis so you don't need a runtime for the binary.

That has no relevance in the mass market use cases I’m targeting. I’m not targeting for example the most stringent, realtime, embedded market that Rust/C++ is.

As stated above, the benefit of GC is easy development, and freedom from some errors.

And avoiding maintenance hairballs, and many other things I have already explained. You’re understating the massive, pervasive benefits.

Unfortunately people don't seem to realise you can still leak memory with GC; it eliminates some simple error cases, but you can still write programs that leak memory like a sieve until they crash.

Irrelevant. “Programming can have bugs” is not an argument.

GC addresses a fundamental and massive problem in programming, notwithstanding that it does not eliminate all programming bugs.

In my experience it is much harder to debug memory leaks with GC than it is with manual allocation, because it is invisible to the programmer.

Create better analysis tools then. Refer to the response I already made to @NodixBlockchain on that issue. I don’t think your assumption will remain true. The state-of-the-art GC algorithm employs RC for the older generation objects anyway. Everything you can analyse with manual mgmt now can be analysed with GC too; we just need the right tools.

It’s a confirmation bias to find excuses to argue for an inferior programming paradigm.

EDIT: remember we have a complexity budget. I would rather expend that static typing budget on more useful typing than wasting it on what a state-of-the-art GC can handle well. TypeScript demonstrates that a modicum of static typing can be very popular. Simplification is nearly always a win for popularity.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346458297
Original Date: Nov 22, 2017, 1:59 PM CST


I think you missed my point. I was suggesting that exactly the same program could be compiled for a GC target or a statically managed target. There is a clear subset where this is easy to do (stack-based RAII), and if you change target and something cannot be solved without GC then you would get a compile-time error.
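
To illustrate that subset (my own example, not actual compiler output): in the first function every allocation dies with the call, so a statically managed target could free it at return; in the second the object escapes via the returned closure, so only the GC target could compile it.

```typescript
// Every allocation's lifetime is bounded by the call, so a statically managed
// target could place it in a region freed at return (stack-like RAII).
function sumOfSquares(values: number[]): number {
  const squares = values.map((x) => x * x); // lifetime bounded by this call
  return squares.reduce((a, b) => a + b, 0);
}

// Here the object escapes through the returned closure, so its lifetime is
// unknown at compile time; only the GC (or refcounted) target can accept it,
// and the static target would report a compile-time error.
function makeCounter(): () => number {
  const state = { n: 0 };                   // escapes via capture
  return () => ++state.n;
}
```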

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346482439
Original Date: Nov 22, 2017, 3:49 PM CST


@keean wrote:

GC is even worse with multiple threads, because it has to stop all the threads to do the collection. This leads to GC pauses and bad user-interface jank, plus programs crashing due to fragmentation...

Quick brain dump… (very sleepy, 6am, no sleep yet; actually I was lying in bed about to sleep and realized I had the answer for this)

It’s understandable why you might think this, but in the context of the new BG-RC design which that 2003 paper describes (and which my independently conceived ideas were very similar to, plus I have some additional innovations I can propose to add to theirs), I realize that afaics multithreading can be accommodated very efficiently if we adopt the significant advantages of JavaScript’s single-threaded event queue model, i.e. each thread is more or less isolated except for read-only shared object accesses (not the copying/messages forced for WebWorkers intercommunication). (Which is why I was admonishing @NodixBlockchain that his multithreading everywhere did not apply.)

JS’s current BG-MS garbage collector has huge asymptotic pause times for the MS collection of the old generation objects, thus they must not allow even read-only shared references between old generations.

But with BG-RC the pause times are very low because most of the old generation is reference counted (due to very low mutation rates), so the read-only shared objects have very low contention: they are rarely collected, and collected incrementally as garbage arises.

The BG (generational young generation) portion runs separately in each thread and the non-RC portion of the old generation (which is a small minority of the old generation for highly mutated objects) is separate for each thread.

So all threads STW (stop-the-world) only for very low pause times and with very low contention, as they coordinate their very minimal coalesced updates from each thread’s BG to the shared RC.

It's all about clever amortization and segregation with a hybrid young generational tracing and old RC (with a modicum of MS for the highly mutated old) algorithm.

BG-RC changes everything. I have no idea why Java 9’s new G1 collector is apparently not BG-RC. Perhaps constraints of the JVM (HotSpot et al) preclude it.

I will explain all of this in more detail soon. Need to get some sleep.

JS’s BG-MS is analogous to the old parallel GC for Java, although JS’s version adds some incremental/concurrent improvement.

Anyway, I am targeting a model on which we can base a language syntax and semantics. It will probably transpile to JS initially for an MVP prototype to get rolling with. I will not be coding a BG-RC garbage collector immediately; perfecting a GC implementation is a massive undertaking in itself.

I wrote:

@NodixBlockchain wrote:

It need to be multithread, and as lock less as possible.

I’m going to repeat what I explained to you multiple times in the Concurrency thread. Multithreaded contention is a can of worms in numerous facets: it requires a deterministic total ordering on borrowing (because non-deterministic locking will always have corner-case livelock bugs which are impossible to statically analyse, since you can’t enumerate a priori the unbounded non-determinism in the I/O environment!), and it even makes the GC less robust and slower.

Instead I’m aiming for a separate heap for each thread, and copying objects between threads. The alternative would be allowing the reference-counted heap section (or an RC/MS hybrid such as in the BG-RC design) to be multithreaded, so objects could be shared by exclusive borrowing of references.
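
As a minimal illustration of the “separate heap per thread” alternative, the following Rust sketch uses standard channels; Rust moves ownership rather than deep-copying, but the isolation property (no heap object is ever shared between threads) is the same as the copying model described above.

use std::sync::mpsc;
use std::thread;

// Each thread owns its data; sending it over a channel transfers ownership,
// so no heap object is ever shared or traced across threads.
fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    let worker = thread::spawn(move || {
        let buffer = vec![0u8; 1024]; // allocated in the worker's own heap usage
        tx.send(buffer).unwrap();     // ownership moves to the receiver
    });

    let received = rx.recv().unwrap(); // the main thread now exclusively owns it
    println!("received {} bytes", received.len());
    worker.join().unwrap();
}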

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346489276
Original Date: Nov 22, 2017, 4:24 PM CST


The results are never as good in reality. See https://goo.gl/images/vi6pce: as you can see, BG-RC performed worse than BG-MS, and that in turn was worse than simple mark-sweep. For general use nothing beats the mark-sweep garbage collector, but it still has all the problems we discussed. You can make the worst cases for GC less bad, but that always makes the common cases less good. There's no such thing as a free lunch, as they say.

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346510644
Original Date: Nov 22, 2017, 6:50 PM CST


NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346516777
Original Date: Nov 22, 2017, 7:54 PM CST


@keean wrote:

The results are never as good in reality. See https://goo.gl/images/vi6pce: as you can see, BG-RC performed worse than BG-MS, and that in turn was worse than simple mark-sweep.

Why are you rushing to make statements which are not substantiated by metrics?1 That slideshow doesn’t conclude that BG-MS outperforms BG-RC on either throughput or maximum pause times.

All it says is that when heap size declines, BG-RC performance degrades faster than BG-MS. I presume the results are consistent with the graphs in the 2003 paper.

Thinking about it more, the relative degradation is probably due to the fact that the MS part of BG-MS must process the entire old generation for every mark-sweep, and thus processes less as the heap shrinks, whereas the computational complexity of the RC part of BG-RC is independent of the size of the heap. If my interpretation is correct, the relative degradation is simply the effect of the horrible asymptotics of the MS in BG-MS.

But of course there is no way to violate the laws of physics: keeping the maximum pause times very small even when the heap is almost full will of course incur more throughput overhead for the concurrently running cycle collection.

The improvements in automated GC are about improving the distribution of the amortization of costs to hold maximum pause times down for all scenarios, while designing for efficiency that keeps overall throughput overhead reasonable.

Thus the real slam dunk against your thought process is, as I postulated upthread (although I do not know how to measure it to prove it, because how does one analyse “manual memory mgmt” within the context of maintenance-hairball outcomes, manpower costs, the cost of human errors while fixing all the bugs, etc.), that even “manual memory mgmt” will not be able to do significantly better (on average and in general) on the time vs. space tradeoff:

@shelby3 wrote:

@barrkel wrote:

Yes, GC scales because it trades space for time. If you're running on a small embedded device, it can be a poor tradeoff; if you're trying to maximize compute throughput across multiple machines, it can cost real dollars. But these are extremes, and most people aren't on the extremes.

Well yes, but as I mentioned above in my reply to @keean (and I presume you’re also alluding to this), the differences in the space vs. time tradeoffs between automatic and manual memory mgmt are getting down to low double-digit (or even single-digit) percentages (i.e. “extremes” of optimization), and thus heavily favor automatic GC for most use cases. Note I’m not conflating this metric with the 2 - 10% figure for throughput differences that you asserted.

No one can invalidate the laws of physics by employing computer science. When the heap is full, it is full.

Of course, with infinite available development resources (and presuming the complexity doesn’t get mired in a maintenance hairball), manual coding of every detail can be perfect; digging canals with spoons is also plausible. But that is not an argument against using tools (e.g. a backhoe, or automated GC), because it lacks any economic (i.e. cost-analysis) context. The hyperbolic analogy: why not argue for designing your own customized integrated-circuit hardware for each program you create, in order to optimize the hardware for each specific use case? IOW, manual perfection requires huge economies-of-scale.

1 You also made such unsubstantiated claims upthread, such as the ostentatious claim that GC was responsible for virtually all the slowdown of JS relative to near-native ASM.js/WASM, implying that JS was within 20% of native, although you clarified that maybe what you meant was that GC encourages other design habits/choices/abstractions in JS that promote inefficient programming. Religious design preferences and superstition shouldn‘t be employed to degrade our discussions here. Clear metrics should be cited and not misrepresented, which I think was the generative essence of @barrkel’s point. If you’re intending to just make non-objective, anecdotal arguments, perhaps you can make it clear you know you’re doing so.


@NodixBlockchain wrote:

But all the thinking goes for solving the hard cases, even if they are rare, there will always be some in any program, and they are the ones that will cause all the headaches.

But there is nothing perfect, not even manual memory mgmt. At least a reproducible automated algorithm that includes all the wisdom and profiling of a huge investment in perfecting automated GC, is going to get us as close as possible to optimum in general and on average. Of course there will always be corner cases in any programming system.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346547009
Original Date: Nov 23, 2017, 1:33 AM CST


Another perspective, which matches my experience of fighting the garbage collector: http://prog21.dadgum.com/134.html

Also I re-read the paper, and I did get those performance metrics backwards. If it's as good as the paper says, I wonder why nobody is using it.

NodixBlockchain commented 3 years ago

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346594564
Original Date: Nov 23, 2017, 5:32 AM CST


But there is nothing perfect, not even manual memory mgmt. At least a reproducible automated algorithm that includes all the wisdom and profiling of a huge investment in perfecting automated GC, is going to get us as close as possible to optimum in general and on average. Of course there will always be corner cases in any programming system.

Not to do my Linus impression, but in a way there is no 'gray area': there is no easy case, hard case, or corner case, there are just cases. If the GC can't deal with all the cases properly then it's broken, period.

And this article gets quite close to what I see with most GC languages. If the program is small and simple it will work fine, but once you need to handle loads of data and big lists of potentially complex references, you still need some sort of assisted memory management to get things right.

https://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector

It's not like you can go all out allocating millions of objects with references without a care in the world about how to organize memory so that short-lived objects can easily be detected as such, and expect the GC to always function in an optimal manner without some form of manual optimization.

In a way GC just makes the manual optimization that will always be needed somehow more tricky and more complex. It's not like you can completely forget about memory allocation and rely on the GC for everything without any care for how it will work, and without some form of manual management of object lifetimes and how they are represented to the GC, etc.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346872669
Original Date: Nov 24, 2017, 11:24 AM CST


@keean wrote:

http://prog21.dadgum.com/134.html

Writing a concurrent garbage collector to handle gigabytes is a difficult engineering feat, but any student project GC will tear through a 100K heap fast enough to be worthy of a "soft real-time" label. While it should be obvious that keeping data sizes down is the first step in reducing garbage collection issues

I don’t understand how a generational garbage collector is going to have a problem with the volume of data allocations in young generations, unless it exceeds what any form of memory mgmt could handle with a non-deterministic allocation order.

That guy is blowing hot air out of his ass (or at least not making clear that he’s comparing apples-to-oranges).

There are going to be cases where one will have to try to devise a deterministic allocation scheme in order to get the performance they want. When deterministic is feasible, then yes higher performance than a non-deterministic algorithm is possible. But compare apples-to-apples. Don’t compare deterministic to non-deterministic allocation, because they’re not the same code base, algorithm, and problem set.

The reason I propose not to provide facilities in the compiler for static linear typing and region analysis (a la Rust) is because (so far) I think the use cases are exceptional, and in those cases the programmer should devise some very specific/customized deterministic allocation solution for his needs. And because for a programming language design we have to stay within a complexity budget. However, I am open to changing my mind; see the posts that follow this one.


@NodixBlockchain wrote:

https://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector

How many times am I going to have to cite that the BG-RC of the 2003 research paper eliminates the long collection pauses because it doesn’t have to mark-sweep (MS) the entire old generation? You’re citing a BG-MS-type collector, which is inferior. (That you do not even realize it is yet another in an endless series of examples of you not paying attention and being clueless about the topics of discussion.)

I asked you politely numerous times to please STFU with your useless noise IN MY THREAD. You’re being disrespectful to my thread. I asked you to wait at least 6 long posts by others between your posts. And 10 if any of the others are short.

If the GC can't deal with all the cases properly then it's broken, period.

Nothing, not even manual mgmt with finite programmer man-hours (and hairball rigor-mortis maintenance limits), can deal with all possible cases. You’re talking hare-brained BS as usual. I will not be respectful to you when you are writing nonsense and also disrespecting my reasonable request.

The entire point is the GC can do about as well (on average) as manual memory mgmt can do with the same (or less) programmer and computer resources.

Even “manual memory mgmt” will have corner cases and break. No one has infinite programmer resources. Broken sometimes means it stays broken, because there are not enough resources and/or the maintenance hairball has become too unwieldy.

If you want to expend all that extra effort on “manual memory mgmt” for an abstruse case, then go ahead. No one is stopping you. In any case, please stop shitting on my thread. I hope this is the last time I have to ask you.

It's not like you can go all out allocating millions of objects with references without a care in the world about how to organize memory so that short-lived objects can easily be detected as such, and expect the GC to always function in an optimal manner without some form of manual optimization.

Whether you use a GC or “manual memory mgmt”, you still need to think about tuning your code.

But the point is there is no such thing as “manual memory mgmt”. You end up writing a customized automated collector, because truly manual memory management can’t exist given the non-determinism of memory allocation: the programmer can’t actually click a button on the computer every time a running program needs memory deallocated.

What you mean by “manual memory mgmt” is something like reference counting and then attempting to avoid cyclical references by carefully designing your program to use weak pointers. Yet that is an automatic GC scheme (albeit a very inefficient one that can’t collect cyclical references), and you rely on humans not to err, so human error will litter the code with memory leaks via unavoidable instances of cyclical references (in the general case). If you prefer that mgmt paradigm, then feel free to use it. But I do not.
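
For what that paradigm looks like in practice, here is a small Rust sketch using Rc/Weak (Rust only for illustration): the programmer must remember to make the back-reference weak, and forgetting to do so produces exactly the cyclical-reference leak described above.

use std::cell::RefCell;
use std::rc::{Rc, Weak};

// A parent/child pair with a back-pointer: the child points back at the parent
// with a Weak pointer, because a strong Rc in both directions would form a cycle
// that plain reference counting can never free.
struct Parent {
    child: RefCell<Option<Rc<Child>>>,
}

struct Child {
    parent: RefCell<Weak<Parent>>,
}

fn main() {
    let parent = Rc::new(Parent { child: RefCell::new(None) });
    let child = Rc::new(Child { parent: RefCell::new(Rc::downgrade(&parent)) });
    *parent.child.borrow_mut() = Some(child);

    // The weak back-reference can still be followed while the parent is alive.
    let back = parent.child.borrow().as_ref().unwrap().parent.borrow().upgrade();
    println!("child can still reach parent: {}", back.is_some());

    // When `parent` goes out of scope its strong count drops to zero (the child only
    // holds a Weak back-reference), so both objects are freed. Had the back-reference
    // been a strong Rc, the pair would leak -- the human error described above.
    println!("parent strong count = {}", Rc::strong_count(&parent));
}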

I am repeating what I had already written in my prior posts in this thread, yet you do not assimilate the information, thus I am forced to repeat myself over and over again.

PLEASE LEAVE MY THREAD. This is the 3rd time I’ve asked you. I thought you were going to respect my request, but you just can’t contain your exuberance. And I do not care what your opinion is of me, so do not waste your time telling me. JUST LEAVE PLEASE.

You (probably unintentionally) wasted an enormous amount of my time on BCT (albeit mea culpa that I baited the trolls over there, and you weren’t intentionally trolling me there nor here) and you continue to do so here. There may be nuggets of interaction with you which have been fruitful, but the signal-to-noise ratio is very low. I told you that I’m willing to endure that low S/N ratio, but not in this thread, which I intend to be concise.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346916421
Original Date: Nov 24, 2017, 9:40 PM CST


I wrote:

I don’t understand how a generational garbage collector is going to have a problem with the volume of data allocations in young generations, unless it exceeds what any form of memory mgmt could handle with a non-deterministic allocation order.

[…]

There are going to be cases where one will have to try to devise a deterministic allocation scheme in order to get the performance they want. When deterministic is feasible, then yes higher performance than a non-deterministic algorithm is possible. But compare apples-to-apples. Don’t compare deterministic to non-deterministic allocation, because they’re not the same code base, algorithm, and problem set.

The reason I propose to not provide facilities in the compiler for static linear-typing and region analysis (a la Rust), is because I think the use cases are exceptional and in that case the programmer should devise some very specific/customized deterministic allocation solution for his needs. And because for a programming language design we have to stay within a complexity budget.

Examples of deterministic allocation that would unnecessarily tax the automated non-deterministic GC are the assignment of a new instance exclusively to a block-scoped pointer (i.e. a pointer stored locally on the stack) in a loop, or, as in @keean’s image-buffering example, allocating a new 16 MB buffer each time his code needs one:

p = new Object

So in the first case, the young-generation allocator is slammed with a new allocation for each iteration of the loop, and in the latter case the old-generation allocator accumulates 16 MB allocations which have to be collected (and with an MS collector that is going to be very costly).

Both of those can be structured as deterministic allocation strategies and made much more efficient. In the first case, the compiler’s static analysis could potentially determine that the newly allocated object is only ever referenced (beyond a single iteration of the loop) by the one pointer it is assigned to, and thus the allocated space for the object can be reused (i.e. the newly allocated object can replace the previously allocated one in the same memory location). In the latter case, assuming the 2003-invented BG-RC allocator is employed (which apparently no language yet employs), the programmer can always reuse the same pointer for the buffer, so the compiler can check that the reference count on the previously allocated 16 MB object/buffer goes to zero when the newly allocated object/buffer is assigned, and thus the compiler+allocator can place the new object/buffer in the same memory location as the one being freed. Afaik, @keean wasn’t recycling his 16 MB buffers instead of reallocating them because of some flaw in the standard web APIs. Inferior GC collectors (e.g. BG-MS) and inferior APIs on top of them combine to contribute to the maligning of automated GC, but afaics critics aren’t conceptualizing this issue properly. I keep repeating that the demarcation in improvement over automated GC always lies in what can be deterministically deallocated.
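
A hedged Rust sketch of the two deterministic-reuse cases just described (Rust purely for illustration; the 16 MB figure is taken from the example above, and nothing here is @keean’s actual code or any specific web API):

// Case 1: a loop that would otherwise allocate a fresh object per iteration can
// reuse one slot, which is what the proposed escape analysis would do automatically.
fn sum_of_squares(inputs: &[u64]) -> u64 {
    let mut scratch = Vec::with_capacity(16); // allocated once, outside the loop
    let mut total = 0;
    for &x in inputs {
        scratch.clear();                      // reuse the same storage each iteration
        scratch.extend((0..x).map(|i| i * i));
        total += scratch.iter().sum::<u64>();
    }
    total
}

// Case 2: a large buffer reused across frames instead of reallocating 16 MB each time.
struct FrameBuffer {
    pixels: Vec<u8>,
}

impl FrameBuffer {
    fn new() -> Self {
        FrameBuffer { pixels: vec![0; 16 * 1024 * 1024] } // one 16 MB allocation
    }
    fn render(&mut self) {
        self.pixels.iter_mut().for_each(|p| *p = 0); // overwrite in place, no new allocation
    }
}

fn main() {
    println!("{}", sum_of_squares(&[1, 2, 3]));
    let mut fb = FrameBuffer::new();
    fb.render();
}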

The problem with deterministic deallocation is that either it requires trusting the human to correctly analyse which deallocations are deterministic (and human errors will arise), or, afaik, the state of the art so far is the tsuris of adding the necessary annotations and compiler enforcement of certain invariants in order to achieve some provable, compiler-checked deterministic cases. And the compiler will not be able to prove all possible deterministic cases, regardless of how much complexity and tsuris we add to the programming language (a la Rust).

I would like to investigate why others have concluded that the compiler can’t do the static analysis of deterministic optimization opportunities w.r.t. memory allocation without assistance from the programmer. I remember @keean mentioned in past discussion in the Concurrency thread that, in general, for the compiler to perform all optimization is an NP-hard search problem, and he was also concerned about compiler optimizations changing between compiler versions and introducing new bugs into code. However, we may have structure here that reduces the search for many deterministic optimizations to a tractable computational class, removing the tsuris entirely. I believe humans build tools to make humans more efficient, e.g. backhoes instead of spoons for digging. Automated GC algorithms are runtime tools for non-deterministic resource allocation mgmt; we also need automated algorithms for static-time analysis. Compilers already do automated static-time analysis for many very localized optimizations such as register allocation.

Let’s actually improve the programming language state-of-the-art and do something important!

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346948187
Original Date: Nov 25, 2017, 9:43 AM CST


So Rust has lifetimes which apply both to mutable memory safety (which also covers multithreaded data races) and to escape-analysis optimization, which prevents use-after-free while enabling deterministic resource (not just memory) deallocation (thus proving safety for stack-based memory allocation). And non-deterministic resource deallocation in Rust is handled by explicit reference counting, which is an explicit pointer type, thus complicating the type system with 3 different kinds of pointers (complexity that @keean and I agreed is not desirable).

All the non-deterministic younger-generation deallocation is inefficiently pushed into explicit reference counting. So an improvement might be to combine Rust’s lifetimes with the aforementioned BG-RC automated garbage collector invented in 2003, possibly to gain the efficiency of fast generational collection of the young objects which escape deterministic lifetimes (if any significant quantity still do) and potentially eliminate the need to statically declare the type of each pointer. The compiler could, I presume, detect from lifetime analysis which pointers can be deterministically allocated on the stack. Pointers that escape lifetime analysis could thus be delegated to the generational GC, which for older-generation objects employs efficient RC plus cyclical-reference deallocation with low pause times.

My prior post in this thread gave an example of the sort of deterministic optimization I hoped the compiler could do, but I realize such optimizations require escape analysis and the compiler would still need to require annotation of lifetime parameters because otherwise:

Tangentially, how can the Rust lifetime parameter be declared for a function input which is also an output, where the input state has a different lifetime than the output state? I presume this is not currently supported by Rust, because afaics non-reference-counted references in Rust are always immutable (only the object they point to can be mutated). So I presume Rust forces the use of a reference-counted pointer for mutable pointers.

P.S. All cited references which are only links are archived.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346959635
Original Date: Nov 25, 2017, 1:04 PM CST


@keean wrote:

Ada has statically safe pointers (called access types) without all the complex lifetimes of Rust. This is the kind of model I was thinking about.

So access typed pointers are presumably deallocated when the block scope in which the type was defined is exited (because external scopes can’t access pointers of that type).

I wonder if that is less flexible and less optimized than generalized escape analysis. The contained block scopes are lumped together, instead of fine-grained escape analysis tracing the lifetimes flowing into and out of the scopes. The genre of redundant allocations I alluded to may not be optimized in as many cases. Note it wouldn’t be any worse in an example where Rust’s lifetime analysis fails to be optimal.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346993895
Original Date: Nov 26, 2017, 3:01 AM CST


Upthread, while studying GC, I essentially independently thought of the design for BG-RC before discovering the year-2003 prior art for it. I had thought of some potential improvements to the prior art. The two of my ideas which have survived thus far both pertain to a generational collector:

  • The option of promoting younger-generation objects to the mature space instantly when the write barrier detects their assignment to a mature object’s pointer, instead of deferring them to a young-generation copy-collection phase. This I suppose eliminates the overhead of tracking which objects to promote at the copy-collection phase, and I would think it simplifies integration with (when promotion is to) an RC-collected mature space. Perhaps the reason this isn’t done is to retain economies-of-scale and locality optimizations. EDIT: duh, as I realized downthread, this isn't possible because not all of the pointers to the same object can be updated instantly.

  • Generational copy-collectors trade off the duration of the period before promotion to the mature space (which impacts how many objects leak into mature collection when they could have been collected more efficiently had the period been longer than their allocation lifespan), the frequency of copy-collection of the young generation (for space and locality efficiency), and the amount of duplicate copying due to copy-collecting objects numerous times before promotion in order to increase said frequency.

    Instead of copy-collecting the entire younger generation monolithically and tagging each object with the number of times it has been copy-collected (with promotion occurring when said count reaches a threshold), I propose to divide the young generation into separate copy-collected regions, one for each copy-collection count required for said threshold. Then perform each copy-collection only twice on each region: once for its first count and again on the last count before promotion. After each copy-collection phase, promote the surviving objects in the oldest region and bump the count of each region. Afaics, this eliminates most of the duplicate copying while maintaining the high frequency for the youngest objects (those that die within the lifespan period of one count). A loose sketch of this scheme follows below.
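
The following Rust sketch is only a loose approximation of the region scheme proposed above, with a boolean live flag standing in for the reachability that a real collector would establish by tracing; the point it illustrates is that each region is processed only on its first and last count, rather than on every cycle.

use std::collections::VecDeque;

// `Obj` stands in for a young-generation object; `live` is a stand-in for
// "still reachable", which a real collector would determine by tracing roots.
struct Obj { live: bool }

struct YoungGen {
    regions: VecDeque<Vec<Obj>>, // front = newest region, back = oldest region
    mature: Vec<Obj>,            // promotion target (the RC space in the proposal)
}

impl YoungGen {
    fn new(threshold: usize) -> Self {
        YoungGen { regions: (0..threshold).map(|_| Vec::new()).collect(), mature: Vec::new() }
    }

    fn alloc(&mut self, o: Obj) {
        self.regions.front_mut().unwrap().push(o); // new objects enter the newest region
    }

    // One minor collection: each region is touched only on its first and last
    // count, instead of on every cycle as a monolithic young generation would be.
    fn collect(&mut self) {
        // Last count: survivors of the oldest region are promoted to the mature space.
        let oldest = self.regions.pop_back().unwrap();
        self.mature.extend(oldest.into_iter().filter(|o| o.live));

        // First count: the newest region is collected once to drop objects that
        // died almost immediately; survivors then age without further copying.
        self.regions.front_mut().unwrap().retain(|o| o.live);

        // All remaining regions age by one count; a fresh newest region is added.
        self.regions.push_front(Vec::new());
    }
}

fn main() {
    let mut gen = YoungGen::new(3);
    gen.alloc(Obj { live: false }); // dies young: removed at its region's first count
    gen.alloc(Obj { live: true });  // survives: touched again only at promotion
    gen.collect();
    gen.collect();
    gen.collect();
    println!("promoted {} objects", gen.mature.len());
}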

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-347577770
Original Date: Nov 28, 2017, 10:21 AM CST


@keean wrote:

"All objects die young" directly leads to heap fragmentation especially if you also have "everything is an object".

We can make things better by having value semantics, and storing values directly on the stack (providing they are under a certain size to prevent stack overflows), and by having object pools that re-use objects, and mutating object in-place.

As I have developed my understanding in this thread, I have come to the conclusion that the language I want needs:

  • Static analysis of lifetimes with lifetime-parameter annotations, but without the unnecessary tsuris of Rust’s exclusive mutable borrowing. Lifetimes which do not escape static/compile-time analysis can be managed deterministically, with small enough objects on the stack and larger ones on the heap. This probably eliminates the need for a generational GC. Ada’s access types are much less powerful.

  • The lifetime analysis will aid an optimizing compiler in automatically mutating objects in-place, because the compiler can detect which lifetimes don’t overlap and thus which space can be reused. There would need to be some other, manual facility for reusing object space for objects that escape compile-time lifetime analysis.

    Lifetime analysis can be applied to non-memory resources also a la Rust.

    Escape analysis may also apply to my controversial idea of having the compiler automatically detect implicit concurrency dependencies, so programmers don’t always have to explicitly structure concurrent code around Promises.

    In addition to zero-overhead performance, these are some of the reasons I have favored the slight tsuris of lifetime annotations over the ease of a generational GC.

  • Reference counting (with the Bacon trial-deletion method of avoiding cyclical-reference memory leaks) could be employed for all objects which escape static lifetime analysis.

    Future work: not sure if we’d need some (generational and/or) mark-sweep heap for objects that would otherwise have their reference counts modified too frequently, and whether the runtime could automatically and dynamically manage which objects to put in the (BG-)MS heap or whether the programmer would need to manually indicate which objects need this optimization.

  • Free-list memory allocators fragment the memory space too much and exhibit poor locality, hurting cache and memory-paging performance. Allocating in blocks which are multiples of the virtual-memory page size enables less fragmentation, because the virtual fragmentation can be mapped to contiguous physical memory; this is especially viable with a 64-bit address space, so that virtual address space isn’t depleted by fragmentation. For small objects the optimum solution appears to be bump-pointer-style monotonic allocation within page-size-multiple blocks, wherein none of the freed space in a block is reused and the entire block is not freed until every object in the block is freed (cf. “§2.1.3.5 Mark Region”). That enables fast allocation by bumping a pointer and fast deallocation by decrementing a counter. Fragmentation induced by the delayed freeing of blocks is in virtual memory space and thus mostly irrelevant given a 64-bit address space. The MMTk segregated free-space allocator seems to employ some variant of this strategy. A minimal sketch of such a block follows this list.
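
A minimal Rust sketch of such a block, assuming a 4 KiB page size and purely illustrative types (this is not MMTk’s actual implementation):

// Hypothetical page-multiple block with bump-pointer allocation and a live-object
// counter, roughly in the spirit of the mark-region style described above.
const BLOCK_SIZE: usize = 4 * 4096; // a multiple of the (assumed 4 KiB) page size

struct Block {
    memory: Box<[u8; BLOCK_SIZE]>,
    bump: usize, // offset of the next free byte; only ever moves forward
    live: usize, // number of objects in this block not yet freed
}

impl Block {
    fn new() -> Self {
        Block { memory: Box::new([0u8; BLOCK_SIZE]), bump: 0, live: 0 }
    }

    // Allocation is just a pointer bump; returns an offset, or None if the block is full.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        if self.bump + size > BLOCK_SIZE {
            return None; // the caller would grab a fresh block
        }
        let offset = self.bump;
        self.memory[offset..offset + size].fill(0); // pretend object initialization
        self.bump += size;
        self.live += 1;
        Some(offset)
    }

    // Freeing an object never reuses its space; it only decrements the counter.
    // When the counter hits zero the whole block can be returned to the OS at once.
    fn free_one(&mut self) -> bool {
        self.live -= 1;
        self.live == 0
    }
}

fn main() {
    let mut block = Block::new();
    let a = block.alloc(64).unwrap();
    let b = block.alloc(128).unwrap();
    println!("allocated at offsets {} and {}", a, b);
    block.free_one();
    println!("block reclaimable: {}", block.free_one());
}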


I had been averse to the tsuris and noise of Rust’s lifetime-parameter annotations. But perhaps this can be mitigated significantly, not by a different focus for the rules on eliding the annotations, but simply by a less obtrusive syntax for the annotations.

Specifically, I’m contemplating Unicode superscripts for annotations that apply to the lifetime of the type, and subscripts for annotations that apply to the fields of the type (if any), with indicating to skip/elide annotation of a field. I intend to fully embrace Unicode characters in the syntax of Lucid. It’s time to move forward and stop making excuses about keyboards. Any decent programmer should have a $100 keyboard which is programmable with custom keys and keystrokes. And IDEs should provide facilities to easily type the necessary Unicode characters.

So compared to the following Rust example:

fn f<'a, 'b, 'c>(x: &'a Foo<'c>, y: &'b Foo<'c>) -> &'a Foo<'c>

Lucid could have something more elegant such as:

f(x, y)                                 : (Foo¹₃, Foo²₃) → Foo¹₃

In the above Lucid example, I elided the &. Rust requires the programmer to explicitly declare which references are (passed as) pointers and which are (passed by) value. Ditto in data structures: explicitly declare whether the name represents a pointer to the type or the value of the type in-place (not quite the same concept as unboxed).

I contemplate that I prefer the default to be pass-by-reference for function arguments and in-place for fields of data structures, except that primitive types would always be passed by value. Thus function calls would not be littered with superfluous & annotations on arguments, as is the case for Rust. The annotation for declaring pass-by-value for functions would be *, and the annotation for a pointer in data structures would be &.
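
For comparison, this is the call-site noise in today’s Rust that the proposed pass-by-reference default would elide (a trivial illustrative example, not Lucid syntax):

// In Rust, by-reference passing is explicit at both the definition and every call site.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    let a = String::from("first");
    let b = String::from("second");
    let winner = longest(&a, &b); // the `&a`, `&b` noise the proposed default would remove
    println!("{}", winner);
}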


Everything about the above design could be transpiled to native JavaScript and its GC (albeit it would run slower than optimal). I think even pointers into data structures and arrays can be supported, so that a C-like syntax can be transpiled. I’m not even getting into typeclasses above, as that is not my highest near-term priority; they can be added later.

Thus the lifetime annotation and analysis features could be delayed if the goal is to get rolling asap with a syntax and JavaScript transpiler for Lucid, so that code which would otherwise be written in, for example, TypeScript and C/Emscripten would not have to be rewritten later in Lucid. More optimal compile targets such as WebAssembly could be developed later, leveraging the full power of the design laid out above, including more optimal compilation of the C-like code. One of my test use cases would be to translate my Blake C code to Lucid.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-349153820
Original Date: Dec 4, 2017, 6:27 PM CST


Today I added an edit to my significant motivations for creating a new programming language (Lucid) to reflect my direction in this thread. Then someone reminded me to go read ESR’s recent blogs. That recent edit at the aforementioned linked comment corresponds nearly precisely with some recent blog posts by Eric S. Raymond:

ESR wrote:

My last post (The long goodbye to C) elicited a comment from a C++ expert I was friends with long ago, recommending C++ as the language to replace C. Which ain’t gonna happen; if that were a viable future, Go and Rust would never have been conceived.

[…]

My problem with the [C++] language, starkly revealed by that adventure, is that it piles complexity on complexity upon chrome upon gingerbread in an attempt to address problems that cannot actually be solved because the foundational abstractions are leaky. It’s all very well to say “well, don’t do that” about things like bare pointers, and for small-scale single-developer projects (like my eqn upgrade) it is realistic to expect the discipline can be enforced.

Not so on projects with larger scale or multiple devs at varying skill levels (the case I normally deal with). With probability asymptotically approaching one over time and increasing LOC, someone is inadvertently going to poke through one of the leaks. At which point you have a bug which, because of over-layers of gnarly complexity such as STL, is much more difficult to characterize and fix than the equivalent defect in C.

[…]

There were two directions a successor language to C might have gone. One was to do what C++ did – accept C’s leaky abstractions, bare pointers and all, for backward compatibility, then try to build a state-of-the-art language on top of them. The other would have been to attack C’s problems at their root – fix the leaky abstractions. That would break backward compatibility, but it would foreclose the class of problems that dominate C/C++ defects.

The first serious attempt at the second path was Java in 1995. It wasn’t a bad try, but the choice to build it over a j-code interpreter made it unsuitable for systems programming. That left a huge hole in the options for systems programming that wouldn’t be properly addressed for another 15 years, until Rust and Go.

[…]

This is in many ways a bad situation. It was hard to really see this because of the lack of viable alternatives, but C/C++ has not scaled well. Most of us take for granted the escalating rate of defects and security compromises in infrastructure software without really thinking about how much of that is due to really fundamental language problems like buffer-overrun vulnerabilities.

So, why did it take so long to address that? It was 37 years from C (1972) to Go (2009)

[…]

Over time, there’s been a gradual shift from languages that require manual memory management to languages with automatic memory management and garbage collection (GC). This shift corresponds to the Moore’s Law effect of decreasing hardware costs making programmer time relatively more expensive. But there are at least two other relevant dimensions.

One is distance from the bare metal. Inefficiency low in the software stack (kernels and service code) ripples multiplicatively up the stack. Thus, we see machine-centric languages down low and programmer-centric languages higher up, most often in user-facing software that only has to respond at human speed (time scale 0.1 sec).

Another is project scale. Every language also has an expected rate of induced defects per thousand lines of code due to programmers tripping over leaks and flaws in its abstractions. This rate runs higher in machine-centric languages, much lower in programmer-centric ones with GC. As project scale goes up, therefore, languages with GC become more and more important as a strategy against unacceptable defect rates.

When we view language deployments along these three dimensions, the observed pattern today – C down below, an increasing gallimaufry of languages with GC above – almost makes sense. Almost. But there is something else going on. C is stickier than it ought to be, and used way further up the stack than actually makes sense.

[…]

Not so the new wave of contending systems-programming languages, though. Rust and Go are both explicitly responses to increasing project scale. Where scripting languages got started as an effective way to write small programs and gradually scaled up, Rust and Go were positioned from the start as ways to reduce defect rates in really large projects. Like, Google’s search service and Facebook’s real-time-chat multiplexer.

I think this is the answer to the “why not sooner” question. Rust and Go aren’t actually late at all, they’re relatively prompt responses to a cost driver that was underweighted until recently.

OK, so much for theory. What predictions does this one generate? What does it tell us about what comes after C?

Here’s the big one. The largest trend driving development towards GC languages hasn’t reversed, and there’s no reason to expect it will. Therefore: eventually we will have GC techniques with low enough latency overhead to be usable in kernels and low-level firmware, and those will ship in language implementations. Those are the languages that will truly end C’s long reign.

There are broad hints in the working papers from the Go development group that they’re headed in this direction – references to academic work on concurrent garbage collectors that never have stop-the-world pauses. If Go itself doesn’t pick up this option, other language designers will. But I think they will – the business case for Google to push them there is obvious (can you say “Android development”?).

Well before we get to GC that good, I’m putting my bet on Go to replace C anywhere that the GC it has now is affordable – which means not just applications but most systems work outside of kernels and embedded. The reason is simple: there is no path out of C’s defect rates with lower transition costs.

ESR is conflating and/or oversimplifying a bit here though. Buffer overruns and some other forms of programmer errors aren’t related to GC (aka memory allocation mgmt), but rather to static or runtime checking.

He’s really intending to drive at the general concept of the sweet spot in balancing programmer productivity, lower defect rates, easy adoption, and adequate performance for the use case (including large projects and some systems programming), as he explained further in his follow-up blog:

ESR wrote:

An increase in praxeological understanding of technology can reframe it, leading to tremendous increases in human productivity and satisfaction, not so much because of changes in our tools but because of changes in the way we grasp them.

In this, the third of my unplanned series of posts about the twilight of C and the huge changes coming as we actually begin to see forward into a new era of systems programming, I’m going to try to cash that general insight out into some more specific and generative ideas about the design of computer languages, why they succeed, and why they fail.

In my last post I noted that every computer language is an embodiment of a relative-value claim, an assertion about the optimal tradeoff between spending machine resources and spending programmer time, all of this in a context where the cost of computing power steadily falls over time while programmer-time costs remain relatively stable or may even rise. I also highlighted the additional role of transition costs in pinning old tradeoff assertions into place. I described what language designers do as seeking a new optimum for present and near-future conditions.

Now I’m going to focus on that last concept. A language designer has lots of possible moves in language-design space from where the state of the art is now. What kind of type system? GC or manual allocation? What mix of imperative, functional, or OO approaches? But in praxeological terms his choice is, I think, usually much simpler: attack a near problem or a far problem?

“Near” and “far” are measured along the curves of falling hardware costs, rising software complexity, and increasing transition costs from existing languages. A near problem is one the designer can see right in front of him; a far problem is a set of conditions that can be seen coming but won’t necessarily arrive for some time. A near solution can be deployed immediately, to great practical effect, but may age badly as conditions change. A far solution is a bold bet that may smother under the weight of its own overhead before its future arrives, or never be adopted at all because moving to it is too expensive.

[…]

The failure modes of near designs are uglier. The best outcome to hope for is a graceful death and transition to a newer design. If they hang on (most likely to happen when transition costs out are high) features often get piled on them to keep them relevant, increasing complexity until they become teetering piles of cruft. Yes, C++, I’m looking at you. You too, Javascript.

[…]

That said, there is a grand strategy expressed in the Go design that I think is right. To understand it, we need to review what the near problem for a C replacement is. As I noted in the prequels, it is rising defect rates as systems projects scale up – and specifically memory-management bugs because that category so dominates crash bugs and security exploits.

[…]

Of course, the proboscid in the room when I say that is Rust. Which is, in fact, positioning itself as the better far bet. I’ve explained in previous installments why I don’t think it’s really ready to compete yet.

[…]

Where Rust will be in five years is a different question, of course. My advice to the Rust community, if they care, is to pay some serious attention to the transition-cost problem. My personal experience says the C to Rust energy barrier is nasty. Code-lifting tools like Corrode won’t solve it if all they do is map C to unsafe Rust, and if there were an easy way to automate ownership/lifetime annotations they wouldn’t be needed at all – the compiler would just do that for you. I don’t know what a solution would look like, here, but I think they better find one.

Well, in this thread we discovered that Go’s design strategy may be sub-optimal, and that perhaps some static checking similar to Rust’s lifetimes could be unbundled from the 99% of the tsuris which is Rust’s total order on mutable borrowing, providing more optimal static memory allocation at very low tsuris and enabling automatic reference counting (RC) with automatic circular-reference probing to deliver a more performant result. Mark-sweep GC (concurrent or otherwise) suffers from a forced tradeoff between low pause times, throughput, asymptotic degradation as the total memory allocated increases, and accelerated degradation as the amount of free physical memory shrinks. Runtime GC is never as performant as static GC.

One key insight of mine (and others) is that the total-order aspect of exclusive mutable borrowing is probably an Achilles heel of Rust. The requirement for exclusive mutability throughout all the code (or otherwise losing lifetime tracking by littering the code with unsafe annotations) is not the only paradigm for preventing logic errors due to shared mutability, especially in synchronous single-threaded code (i.e. separated threads/channels, which btw ESR pointed out is Go’s strategy). Although asynchronous callbacks (Promise) in single-threaded code create effectively out-of-order access, they’re not preemptive race conditions requiring mutexes, and at least they are demarcated (because they’re not preemptive). Whereas Rust’s total order on mutability presumes (even preemptive multitasking via) multithreading races everywhere, anytime. Exclusive mutability isn’t even always the desired or even plausible algorithm for multithreading (nor for out-of-order access), such as when mutating a shared resource; although it does provably avoid runtime deadlocks and livelocks (though, as @keean and I discussed somewhere, locking errors due to program interaction with unbounded instances of external locking can’t be statically proven to not exist!). Instead, minimizing the shared mutables between threads (or between out-of-order accesses) is typically the desired design strategy.1

Go also lacks typeclasses, afaik can’t even guard against null-pointer usage, and has other significant faults.

1 Note out-of-order access (not necessarily multithreading data races) may employ logic, data structures, and algorithms which are sound with any order of access (which, in addition to others such as queues, may include immutable data structures or exclusive mutable borrowing). Otherwise, the problem is that any code which waits after calling an asynchronous function (e.g. one that returns a Promise) is out-of-order after that wait w.r.t. other code which preempted itself by waiting for a callback, i.e. the out-of-ordering of callbacks “infects” all parents in the function-call hierarchy. Thus, it might in some cases be better to have synchronous single-threaded zones which prevent out-of-order access because they never wait on more than one callback in the zone (instead a different thread zone runs while waiting on a callback); and said zones mustn’t share (with each other) references to mutables. Such synchronous single-threaded zones for shared mutable access are probably most applicable to replicated tasks that run round-robin in parallel on a thread pool (perhaps with some limited asynchronous capability to launch multiple parallel, isolated sub-tasks that don’t share mutables, while stalling the synchronous parent zone entirely until all the asynchronous sub-tasks are completed). Obviously UI code couldn’t be placed in such a synchronous zone and would instead need to employ the aforementioned logic, data structures, and algorithms which are sound with any order of access.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-349427473
Original Date: Dec 5, 2017, 2:16 PM CST


@shelby3 wrote:

And the requirement for exclusive mutability throughout all the code (or otherwise also losing lifetime tracking by littering the code with unsafe annotations) is not the only paradigm for preventing logic errors due to shared mutability especially in synchronous single-threaded code […] Exclusive mutability isn’t even always the desired or even plausible algorithm for multithreading (nor out-of-order access) such as when mutating a shared resource, although […] Instead minimizing the shared mutables between threads (or between out-of-ordered access) is typically the desired design strategy.1

Rust’s total-order exclusive-mutability borrowing restriction (although it enables pointer-aliasing optimizations) can unnecessarily prevent localized reasoning, and thus afaics is not the one-size-fits-all solution that perhaps some naive Rust proponents think it is. The cost of obtaining that pointer-aliasing optimization can be quite onerous. The new lifetime checker in C++ also admits such “Known Issues”.

For example, in the single-threaded case of shared writable access, even in the asynchronous case where code allows out-of-order preemption by waiting on more than one callback, there are compile-time deterministic cases where there isn’t a critical section, as there would be for multithreading (multithreading preempts at any non-deterministic instant). An example of such a case is reading the value, computing a new value, and writing the result back before providing an opportunity for another callback to run in the same thread. If, for example, single-threaded code has a critical section on an externally shared mutable, it could opt to ensure it doesn’t provide an opportunity for another callback to run within that critical section. Thus the soundness can in some cases be subject only to local reasoning, and doesn’t always require the tsuris+inflexibility of a total (aka potentially the entire program or unbounded universe) order on mutable borrowing.
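
A toy, std-only Rust sketch of that single-threaded callback model (the event loop here is a stand-in, not any real runtime): because each callback runs to completion without yielding, the read-compute-write below is a critical section that needs neither a lock nor a program-wide exclusive-borrow discipline.

use std::cell::Cell;
use std::collections::VecDeque;
use std::rc::Rc;

// A toy single-threaded event loop: callbacks run to completion, one at a time.
fn main() {
    let counter = Rc::new(Cell::new(0u64)); // Rc<Cell<..>> is !Send, i.e. single-threaded
    let mut queue: VecDeque<Box<dyn FnOnce()>> = VecDeque::new();

    for _ in 0..3 {
        let counter = Rc::clone(&counter);
        queue.push_back(Box::new(move || {
            // Read-compute-write with no yield in between: no other callback can
            // interleave here, so this critical section needs no lock and no
            // program-wide exclusive-mutability discipline.
            let current = counter.get();
            counter.set(current + 1);
        }));
    }

    while let Some(callback) = queue.pop_front() {
        callback(); // each callback runs atomically w.r.t. the others on this thread
    }
    println!("counter = {}", counter.get()); // always 3
}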

Naive Rust proponents may imply that exclusive mutable borrowing is necessary even in single-threaded code to prevent data races such as use-after-free, which occur when the code obscures that it erroneously modifies some invariant which the programmer or static lifetime checker assumed, such as iterator invalidation or invalidation of the presumed lifetime.

In the latter linked example, Rust’s lifetime analysis presumes the exclusivity of mutable borrows (i.e. the soundness of either requires the other to be enforced), yet instead a compiler could disallow returning a lifetime-tracked reference that is itself (i.e. not what it references) mutable, so the programmer would need to either make said reference immutable, or have fn get_text return an immutable doubly-indirected pointer (i.e. an immutable reference to the mutable reference)††, or replace fn get_text with a function that performs some operation on the referenced text. Alternatively, to allow the said reference to be mutable yet still avert a total order on exclusive mutable borrowing, for the synchronous single-threaded case that protects the critical section (i.e. no out-of-order callbacks can run within it and there is no shared multithreaded access), the exclusive mutability borrowing could be compiler-enforced but would only need to be enforced within the said critical section, which in that example is the lifetime of the my_text reference (and not the entire lifetime of the editor).

Which are thus provably sound because the invariants are enforced by the compiler, unlike C’s restrict annotation which is reliant on the programmer’s judgement.

Specifically which is mutable during the lifetime (i.e. scope) of the instance of the object (e.g. editor in this case) with which it shares its lifetime.

†† The issue of whether a null pointer could be assigned to the said mutable reference is an orthogonal issue. Presumably the language has an Option or Nullable type, which guards against attempting to read or write from the null memory location.

For the former linked example, iterator invalidation is sometimes not a problem. For example, most iterators need to check bounds on each previous or next action, so even removing items from the collection being iterated wouldn’t cause an out-of-bounds access, except in the case that the iterator accessor is invalidated analogously to the aforementioned fn get_text example for a lifetime analyser (i.e. elements aren’t RC objects) or because the reference to an element stored-in-place was returned by the said accessor. So for the aforementioned problematic cases, the solutions (which avoid Rust’s total ordering on exclusive mutable borrowing) would be the same as in the aforementioned paradigm. Note iterator invalidation can also be a semantic error w.r.t. some algorithmic assumptions of invariance, yet that’s sort of irrelevant because in general, lifetime and borrow checking will not prevent all classes of semantic errors.

Another example where Rust prevents localized reasoning is where the lifetime analysis is unable to infer invariants that are obvious to the programmer. Note even if Rust were to deduce such invariants, they would still need to be declared as annotations; otherwise, changing code would require recompiling the universe of all the callers of the method buy_car because the semantics of the lifetime analysis could have changed (whereas annotating forces the programmer to change the annotation if he really wants that invariant to change, which btw is the main reason lifetime parameter annotations must normally be explicit):

@shelby3 wrote:


@shelby3 wrote:

If you are coming from languages like Java or Objective-C you might be wondering why do you have to deal with lifetime in Rust. There are very good reasons.

Readers should note that “use after free" errors are not the only reason lifetimes are required. They’re also required so that the borrow checker can assure exclusive writable access, so that multithreaded preemption of critical sections doesn’t have to be guarded with locks. Locks create deadlock and livelock errors at runtime. And contention around locking reduces performance. See Sean Parent’s concurrency video on YouTube about the cost of locks and why they don’t scale.

However, I think the total order on borrowing for Rust was a design mistake. I’m working on an alternative (Lucid) which I think is more flexible and involves less tsuris.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-350958453
Original Date: Dec 12, 2017, 12:33 AM CST


@shelby3 wrote:

[…] such as iterator invalidation or invalidation of the presumed lifetime.

Another example where Rust prevents localized reasoning is where the lifetime analysis is unable to infer invariants that are obvious to the programmer.

The above cases, in addition to demonstrating that Rust can’t elegantly (or at all) model all possible optimized algorithms statically (no static typing ever can!), also exhibit that Rust's lifetimes and exclusive mutable references are interdependent, because Rust has to ensure that all the heap allocations (e.g. String) are statically tracked and deallocated at the correct stack-frame expiration. References to stack-allocated objects don’t need to remember to deallocate the objects assigned to them (because stack-allocated objects are deallocated in bulk when the stack frame expires), and thus only lifetimes need to be enforced. Whereas, when assigning a different heap-allocated object to a reference that already references a heap-allocated object, the former object must be deallocated at that instant, because the eventual stack-frame destructor of the said reference can only remember to deallocate the last object it references:

If you are coming from languages like Java or Objective-C you might be wondering why do you have to deal with lifetime in Rust. There are very good reasons. In Java the use after free problem is solved through the use of Garbage Collection. GC will not destroy the String as long as any variable points to it. Objective-C will employ reference counting to solve this problem. In our example there will be two references – one by the text member variable and the other by my_txt. The String will be destroyed only when the reference count goes to 0. In other words when both text and my_txt stops existing.

GC requires a heavy runtime and may incur awkward pauses. Reference counting has extra overhead and simply not reliable in all cases. Rust’s lifetime provides an elegant solution with zero runtime overhead.

Thus exclusive mutable references (combined with lifetimes) are necessary for statically tracking the deallocation of heap-allocated objects, yet for the static tracking of stack-allocated objects only lifetimes are required.
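
A brief Rust illustration of the reassignment case just described:

fn main() {
    // Statically tracked heap allocation: the compiler must know exactly when
    // each String's heap buffer can be freed.
    let mut text = String::from("first");  // heap allocation #1
    text = String::from("second");         // allocation #1 is dropped right here,
                                           // because `text` can only remember to
                                           // free the last object it references
    println!("{}", text);
}                                          // allocation #2 freed when the frame ends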

Rust presumably considers the extra burden/inflexibility/tsuris/non-optimality of maintaining a total order on exclusive mutable references to be a freebie, because Rust presumes that constraint is required anyway, since instantaneous preemption is assumed everywhere (i.e. multithreading everywhere). However, I’ve proposed we consider single-threaded models and thread pools.

It occurs to me that the generational portion of an automated GC is essentially as runtime-performant as Rust’s static heap-allocation lifetimes (or perhaps even more so, due to the bump-pointer allocation of the generational allocator), excepting the need for a BG-MS or BG-RC garbage collector to have a write barrier to detect when assigning to an older-generation (for MS) or RC (reference-counted) object. Rust’s statically deterministic deallocation frees the space of objects asap (i.e. sooner than the periodic generational collector), but not necessarily sooner to the usable memory pool, due to fragmentation issues; and if there are no destructors on those objects then the additional immediacy is irrelevant. Remember most of the generational objects die before the collection cycle, so they have no overhead when freed, less fragmentation, and lower overhead to allocate.

So it occurs to me that with a single-threaded and thread-pool model wherein we don’t always need exclusive mutability everywhere, the model that may be more runtime-performant as well as more efficient, elegant, less hassle, and give the programmer more control, would be to have the programmer statically declare which references will live a long time, and these would be statically known to be RC. Thus, the performance-reducing runtime write barrier would no longer be required. Other heap-allocated objects stay in the generational GC until all non-RC references to each of them disappear, or until the object is assigned to an RC reference (which promotes the object to the RC collector). I think programmers know quite well which objects will live a long time, even if they don’t know exactly how long. Also, RC references could be employed for any heap-allocated objects that have a destructor, e.g. file-handle objects that have to close the file. Remember there are algorithms that exist now for solving the circular-referencing leaks that used to plague RC. Also the BG-RC collector showed that very low pause times were possible, with advantages compared to BG-MS (and even concurrent collectors). Essentially I’m proposing BG-RC but with a static (instead of runtime) promotion from the younger generation to the older RC generation, i.e. I propose to raise the performance to compete with Rust while discarding the tsuris and algorithmic inflexibility of Rust.

So this means that non-RC references might be pointing to an object which has been promoted to RC. Such objects can’t actually be deleted by RC decrements until all non-RC references to the object are no longer reachable, i.e. promoted objects have to be flagged, and the flag is not removed until the tracing of the generational collector no longer finds any reference to the object.
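A hedged sketch of that promotion rule follows: objects live in the young, traced generation until assigned to a statically declared RC reference, and a promoted object keeps an “escaped” flag until the tracer no longer finds any non-RC reference to it. All field and function names here are illustrative assumptions, not part of any existing collector.

```rust
// Illustrative model of static promotion from the young, traced generation
// to the RC generation, with the "traced references may still exist" flag.
struct Obj {
    refcount: u32,               // meaningful only once promoted to RC
    promoted: bool,              // set on first assignment to a static RC reference
    traced_refs_may_exist: bool, // cleared by the young-gen tracer when it
                                 // no longer reaches this object
}

impl Obj {
    fn assign_to_rc_reference(&mut self) {
        if !self.promoted {
            self.promoted = true; // static promotion point
            self.traced_refs_may_exist = true;
        }
        self.refcount += 1;
    }

    fn drop_rc_reference(&mut self) {
        self.refcount -= 1;
        // Even at refcount == 0, the object cannot be freed while the
        // young-gen tracer may still reach it through non-RC references.
        if self.refcount == 0 && !self.traced_refs_may_exist {
            self.free();
        }
    }

    fn tracer_reports_unreachable(&mut self) {
        self.traced_refs_may_exist = false;
        if self.promoted && self.refcount == 0 {
            self.free();
        }
    }

    fn free(&mut self) {
        // Return the object's space to the RC allocator (sketch only).
    }
}

fn main() {
    let mut o = Obj { refcount: 0, promoted: false, traced_refs_may_exist: true };
    o.assign_to_rc_reference();
    o.drop_rc_reference();          // refcount hits 0, but the tracer may still see it
    o.tracer_reports_unreachable(); // now it can actually be freed
}
```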

I will want to write some more thoughts I have on this, such as other optimizations for low-level code in conjunction with this design, optimizing functional programming/closures, and more thoughts on how to deal with shared mutability and asynchronicity. Perhaps the combination of design choices I have proposed may be superior to both Go and Rust for a mainstream programming language.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-350962848
Original Date: Dec 12, 2017, 1:01 AM CST


I think it's interesting, but I feel you have underestimated the importance of memory space (rather than time). With manual memory management most time is spent on reliability and performance; with GC I spend just as much time on reducing memory usage and on performance. Most GCs do not seem to reclaim on a failed allocation as far as I can tell. Software should be robust, and to me there is only one really robust way to do things: make sure you have the memory available before you start an operation, and reserve the space necessary to suspend the computation before starting. A reasonable compromise might be to do a GC on a failed allocation, a de-fragment on a double fail, and suspend the thread on a triple fail. Reference counting on the stack might be worth it for robustness (maybe that's why Apple likes it).
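As a rough illustration of that escalation (GC on the first failure, de-fragment on the second, suspend on the third), here is a hedged sketch; the `Runtime` trait and the mock numbers are placeholders I invented, not a real collector API.

```rust
// Sketch of the proposed escalation on allocation failure.
enum AllocResult {
    Ptr(usize),
    Suspended, // thread parked until memory is returned by other work
}

trait Runtime {
    fn try_alloc(&mut self, size: usize) -> Option<usize>;
    fn collect_garbage(&mut self);
    fn defragment(&mut self);
    fn suspend_current_thread(&mut self);
}

fn robust_alloc<R: Runtime>(rt: &mut R, size: usize) -> AllocResult {
    if let Some(p) = rt.try_alloc(size) {
        return AllocResult::Ptr(p);
    }
    rt.collect_garbage();            // first failure: reclaim garbage
    if let Some(p) = rt.try_alloc(size) {
        return AllocResult::Ptr(p);
    }
    rt.defragment();                 // second failure: compact the heap
    if let Some(p) = rt.try_alloc(size) {
        return AllocResult::Ptr(p);
    }
    rt.suspend_current_thread();     // third failure: back off entirely
    AllocResult::Suspended
}

// Minimal mock so the sketch runs: a "heap" with a fixed budget that GC and
// de-fragmentation can partially refill. Purely illustrative numbers.
struct MockRuntime { free_bytes: usize }

impl Runtime for MockRuntime {
    fn try_alloc(&mut self, size: usize) -> Option<usize> {
        if size <= self.free_bytes { self.free_bytes -= size; Some(0xdead_beef) } else { None }
    }
    fn collect_garbage(&mut self) { self.free_bytes += 64; }
    fn defragment(&mut self) { self.free_bytes += 32; }
    fn suspend_current_thread(&mut self) { /* park the thread in a real runtime */ }
}

fn main() {
    let mut rt = MockRuntime { free_bytes: 16 };
    match robust_alloc(&mut rt, 80) {
        AllocResult::Ptr(_) => println!("allocated after GC"),
        AllocResult::Suspended => println!("suspended"),
    }
}
```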

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-350967494
Original Date: Dec 12, 2017, 1:27 AM CST


@keean wrote:

but I feel you have underestimated the importance of memory space (rather than time)

Virtual memory is practically cost-free in a 64-bit address space (but intra-page fragmentation and even hard disk space are not). I mentioned upthread that I’m not prioritizing for a 32-bit address space. Some trade-offs have to be made in design, and I think that’s an acceptable one, given that there will still be, for example, Rust for specialty and archaic use cases.

Additionally, I’m contemplating that it's not entirely meaningful for us to compare experience with JavaScript’s concurrent BG-MS collector to my proposal for programmer-dictated demarcation between an RC collector and an automated BG generational collector that doesn't promote to RC based on age. You haven’t even tried an integrated BG-RC collector (which mine proposes to improve on), because apparently even BG-RC has not yet been adopted by any language.

Most GC does not seem to reclaim on failed allocation as far as I can tell.

Could you please share an example? I would like to analyse it.

Software should be robust, and to me there is only one really robust way to do things - make sure you have the memory available before you start an operation

In the general case, this is an impossible requirement. You’re expecting the universe to be bounded. We can design some API to return an error if the memory is not available, but then the caller may fail. If the caller (which may be an external service not knowable at compile-time) has to poll everything it needs downstream before it starts, then it needs a crystal ball to know all of its future input. It also presumes the world can be paused in portions, so that computations of available memory for a future task aren't invalidated before that task completes.

You want to enumerate the universe (all of the state machine states) of possible inputs, i.e. you must be designing for Six Sigma quality, for a use case analogous to the Space Shuttle. Most programming can’t realistically be done to that level of precision, nor is it even desirable in most programming to waste such effort on it. Programs typically achieve acceptable reliability and performance by having sufficient extra memory to deal with unexpected spikes, preferably physical memory so that virtual memory page swapping to disk doesn’t cause errors due to unexpected timeouts.

Perhaps you need to explain your perspective to me in more detail; perhaps I’m not completely grokking your concern?

A reasonable compromise might be to do a GC on failed allocation, and a de-fragment on double fail, and suspend the thread on triple fail.

There will be no failed allocations in a 64-bit virtual memory space except in pathological cases.1 Just a slowdown as physical memory fills up, as it should be.

Also I have already mentioned in this thread ways to significantly improve upon the fragmentation issue. I think perhaps you are pushing the not-quite-state-of-the-art JavaScript GC beyond its level of competency? And in a 32-bit virtual address space?

Let’s analyse a specific example of yours so we can determine what is going on.

Reference counting on the stack might be worth it for robustness (maybe that's why Apple likes it).

Could you unpack for me what you’re thinking here? Why do you mention the stack?

1 Pathological cases probably require some way to opt out of the automated GC and do some manual memory management and/or dictate the use of RC. We should analyse some pathological examples to gain a more detailed understanding of the issues.

NodixBlockchain commented 3 years ago

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352243394
Original Date: Dec 17, 2017, 3:41 AM CST


I was contemplating whether there's any justifiable need for supporting stack allocation and lifetimes in addition to the proposal I made for a statically demarcated variant of BG-RC.

W.r.t. purely stack allocated objects (i.e. no heap allocation), lifetime checking is only required when references (aka pointers) to stack allocated objects would be allowed. Note that this would require another distinct static type for references to objects allocated on the stack versus those that will be deallocated by the runtime GC (of which there are already two types, to demarcate those which employ RC from those which don't). So I'd much prefer we didn't need static allocation on the stack. A coherent K.I.S.S. design is preferred. Good design should maximize the benefits with the least number of concepts and the least complexity.

Afaics, the main performance boost gained from static stack allocation, as compared to a copy compacting, bump pointer allocating GC (for short-lived objects; in my proposal the long-lived objects are statically typed as RC to avoid the write-barrier performance cost), is that the objects on the stack can be accessed via offsets from the single stack pointer register (e.g. SP on Intel) and they are typically clustered in memory address space, which aids cache coherency performance. The deallocation of objects on the stack by changing the value of the SP isn't a significant performance boost compared to not copy compacting untraced objects in the runtime GC. Runtime GC tracing is amortized over many function/procedure calls, many/most of which typically leave nothing to trace nor copy compact, i.e. the tracing and copy collection are avoided and thus there is no significant performance cost.1

But that performance benefit of stack allocation can be mostly eliminated, because those pass-by-value arguments of a function/procedure which would have been passed on the stack could instead be passed in a contiguous region of the GC's bump pointer allocated heap, and the function can then be provided a pointer to said region, so that all said objects can be accessed through that single pointer with offsets. For pass-by-reference arguments, the reference will be stored in the said contiguous region and the object will be stored somewhere in the heap, which is again analogous (also in performance) to passing the reference on the stack with stack allocation. IOW, bump pointer allocation with tracing, copy compacting deallocation is akin to amortized stack rewinding. We utilize one additional register of the CPU, since we still need to keep the SP separately pointing to the call stack. CPU registers aren't as scarce a resource these days.
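A hedged sketch of that idea: instead of pushing by-value arguments on the call stack, the caller bump-allocates one contiguous block in the young generation and hands the callee a single base pointer, so every argument is still a fixed offset from one register-resident pointer. The `Nursery` here is a toy stand-in I made up, not a real GC nursery.

```rust
// Toy bump allocator standing in for a GC nursery: a function's by-value
// arguments live in one contiguous, bump-allocated block and are addressed
// as offsets from a single base pointer, much like SP-relative addressing.
struct Nursery {
    buf: Vec<u8>,
    top: usize,
}

impl Nursery {
    fn new(capacity: usize) -> Self {
        Nursery { buf: vec![0; capacity], top: 0 }
    }

    // Bump-pointer allocation: just advance `top` (no free list, no search).
    fn alloc(&mut self, size: usize) -> usize {
        let offset = self.top;
        self.top += size;
        assert!(self.top <= self.buf.len(), "nursery full; a real GC would collect");
        offset
    }
}

fn main() {
    let mut nursery = Nursery::new(1024);

    // "Push" two by-value u64 arguments into one contiguous argument block.
    let base = nursery.alloc(16);
    nursery.buf[base..base + 8].copy_from_slice(&7u64.to_le_bytes());
    nursery.buf[base + 8..base + 16].copy_from_slice(&9u64.to_le_bytes());

    // The callee receives only `base` and reads each argument at a fixed offset.
    let a = u64::from_le_bytes(nursery.buf[base..base + 8].try_into().unwrap());
    let b = u64::from_le_bytes(nursery.buf[base + 8..base + 16].try_into().unwrap());
    println!("{}", a + b);
}
```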

I'm failing to find any significant justification for supporting the tsuris of static (i.e. compile-time) stack allocation and lifetimes. I think this proposal of mine will defeat both Rust and Go as the next mainstream, general purpose programming language.

1 The performance cost of mark-sweep GC tracing is on the long-lived objects that have to be traced over and over again, which makes asymptotic performance horrible (but my proposal entirely avoids this by requiring that long-lived objects be statically typed as such by the programmer).


I wrote a highly relevant comment in @NodixBlockchain's thread.

NodixBlockchain commented 3 years ago

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352245289
Original Date: Dec 17, 2017, 4:19 AM CST


The main advantages of stack allocation are: fast allocation (just decrement a pointer), fast deallocation (just increment the pointer), no indirection (we can hold pointers directly into the stack), no pausing (no mark/sweep phase), and zero fragmentation.
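A tiny illustration of why those first two points and the last one hold (a sketch only; real stacks grow downward, so "decrement to allocate", but the toy below grows upward for readability):

```rust
// Stack allocation as pure pointer arithmetic: O(1) alloc/dealloc, LIFO
// discipline, so freed space is always contiguous (zero fragmentation).
struct Stack {
    top: usize,
}

impl Stack {
    fn push_frame(&mut self, frame_size: usize) -> usize {
        let base = self.top;
        self.top += frame_size; // allocate: one addition, no search, no free list
        base
    }

    fn pop_frame(&mut self, base: usize) {
        self.top = base; // deallocate: one assignment, no tracing, no pause
    }
}

fn main() {
    let mut s = Stack { top: 0 };
    let f1 = s.push_frame(32);
    let f2 = s.push_frame(16);
    s.pop_frame(f2); // frames are released in LIFO order
    s.pop_frame(f1);
    assert_eq!(s.top, 0);
}
```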