I very very much want to see channels in V! They're one of the best ideas in Go; they're both simple and powerful, which is rare.
@ylluminate See related discussion: https://github.com/vlang/v/issues/1868
@elimisteve It's there. Almost all high level requirements are taken from Proper support for distributed computing, parallelism and concurrency published on github.
@ylluminate @cristian-ilies-vasile @elimisteve thank you for summarizing the high level goals/requirements - they're pretty much what I envision :wink:. I'll keep an eye on this to see how everything will evolve.
An amendment:
the green threads monitor/scheduler will be part of the language
should read:
the green threads monitor/scheduler will be part of the language, but will be extensible by the programmer to allow changing the default scheduling behavior
@dumblob I updated the document shared on google drive with your comment.
https://golang.org/doc/go1.14#runtime Goroutines are now asynchronously preemptible. As a result, loops without function calls no longer potentially deadlock the scheduler or significantly delay garbage collection.
@cristian-ilies-vasile that's definitely interesting (see the specification), but it's still far from fully preemptible (hence the name asynchronously preemptible: under the hood it still just inserts safe points, only starting from Go 1.14 they're inserted in many more places - carefully enough to not increase the overhead much; they actually tuned it so that the overhead is not measurable in the majority of cases, though in some cases it seems to be enormous, as seen from this benchmark and its equivalent in plain C, which doesn't suffer from any such bottlenecks).
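To make the safe-point limitation concrete, here is the classic minimal demo (not from this thread, just a well-known reproduction): a tight loop with no function calls gave the pre-1.14 cooperative scheduler no safe point to preempt at, so on a single P the main goroutine could starve forever.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // single P makes the starvation observable
	go func() {
		for { // tight loop, no function calls => no safe point before Go 1.14
		}
	}()
	time.Sleep(time.Millisecond) // hands the P to the spinning goroutine
	// Before Go 1.14 this line was never reached; with asynchronous
	// preemption the runtime can interrupt the loop and reschedule us.
	fmt.Println("asynchronously preempted")
}
```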
It's also good to note here, that the implementation of "asynchronously preemptible" in go is really tricky and is definitely not universally applicable (hehe) to other languages (e.g. because of the problem on Windows which they kind of "solved" for go semantics and internals).
Though I generally like the go design with "safe points" - I deliberately didn't go into details like this when describing the "interleaving between concurrency and parallelism" outlined in https://github.com/vlang/v/issues/1868#issue-489433130, because it's a lot of work with "little benefit" (as you can see, even Go is tackling preemptiveness - and to date still only partially - only now, after many years of development by hundreds of engineers, many of them full-time Go devs).
Not sure if this adds any value to this conversation, but ScyllaDB is built on top of this async framework: https://github.com/scylladb/seastar There is also this Redis clone that uses it: https://github.com/fastio/1store
Maybe instead of fibers etc. this kind of library solution could be used and added to vlib?
Regarding structured concurrency, I would suggest reading https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/ first, which explains reasons to use structured concurrency instead of go-style concurrency.
@pquentin thanks for the pointer - didn't know about the Trio concurrency library (and the "nursery concept"). It's a very good approach to concurrency programming indeed. In the context of V I have these observations:
I'm curious how long running "background" tasks would be implemented in V, as the authors of the nursery concept recommend using `yield` to make a generator, but there are no generators/`yield` in V.
Assuming my proposal gets implemented in V, nursery objects might be "simulated" using the pluggable scheduler from my proposal. The scheduler state can be tinkered with, i.e. read from and written to, at any time, pretty much as nursery objects have to be passed around - though it'll be less convenient to traverse the one global graph in the scheduler state than to have an explicit nursery object from each "split" where spawning happened. That said, my proposal allows for a cheap abstraction in the form of an additional API modeled after nurseries, but doesn't enforce it.
I think the only thing missing is to specify that the pluggable scheduler has to handle "exceptions" (i.e. optionals "raised" by `return error()` and panics "raised" by `panic()`, if `panic()` will stay in V). @cristian-ilies-vasile could you please add this to your overview?
Care would also have to be taken when implementing the automagical (de)spawning of reentrant go routines together with nurseries, as that actually means changing the nursery objects dynamically at arbitrary times (maybe making nursery objects just "live views" of the pluggable scheduler state?).
One thing to contemplate is enhancing `go routine`s to immediately return something. E.g. a handle which could then provide methods like `value<T>() T`, a blocking method returning exactly what the routine would return if it weren't executed with a prepended `go` (including optionals, of course); `running() bool`, which would check in a non-blocking manner whether the routine has already finished; `wait(s f64)`, to wait for completion with a given timeout; etc. Such a handle would need to support reentrancy though, so the methods above would need to distinguish which instance we're talking about :wink:.

This might ease an implementation of the nursery concept on top of the proposed primitives (with the chance of them getting included in the V standard library and maybe even as built-ins).
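A rough sketch of such a handle, written in Go only because V doesn't have these primitives yet - `Spawn`, `Value`, `Running` and `Wait` are invented names mirroring the methods proposed above (reentrancy handling omitted):

```go
package main

import (
	"fmt"
	"time"
)

// Handle is a hypothetical stand-in for the value a `go` statement could return.
type Handle[T any] struct {
	done   chan struct{}
	result T
}

// Spawn runs f in its own goroutine and returns a handle to it.
func Spawn[T any](f func() T) *Handle[T] {
	h := &Handle[T]{done: make(chan struct{})}
	go func() {
		h.result = f()
		close(h.done)
	}()
	return h
}

// Value blocks until the routine completes, then returns its result.
func (h *Handle[T]) Value() T {
	<-h.done
	return h.result
}

// Running reports, without blocking, whether the routine is still executing.
func (h *Handle[T]) Running() bool {
	select {
	case <-h.done:
		return false
	default:
		return true
	}
}

// Wait blocks until completion or until the timeout elapses; it reports
// whether the routine finished in time.
func (h *Handle[T]) Wait(timeout time.Duration) bool {
	select {
	case <-h.done:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	h := Spawn(func() int { time.Sleep(50 * time.Millisecond); return 42 })
	fmt.Println(h.Running()) // very likely true at this point
	fmt.Println(h.Value())   // 42
}
```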
@pquentin Interesting concept, but preventing goroutines from being spawned without permitting execution of the main thread/goroutine is extremely limiting and would prevent a huge percentage of the value gained from having them in the first place.
The argument made against goroutines is honestly a severe straw man, as the normal and common way to achieve a nursery-like pattern in Go is to use a WaitGroup, or to pass a "done channel" (`done := make(chan struct{})`) to each function or method spawned in a goroutine, enabling each to signal when it's done doing its work.
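For readers unfamiliar with the pattern, a minimal sketch (not from the thread) of the WaitGroup-plus-done-channel combination described above:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func worker(id int, done <-chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done() // signal completion to the "nursery"
	for {
		select {
		case <-done: // the parent asked us to stop
			fmt.Println("worker", id, "exiting")
			return
		default:
			time.Sleep(10 * time.Millisecond) // stand-in for real work
		}
	}
}

func main() {
	done := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go worker(i, done, &wg)
	}
	time.Sleep(50 * time.Millisecond)
	close(done) // broadcast the stop signal to every worker
	wg.Wait()   // block, nursery-style, until all workers have exited
}
```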
The fact that general solutions exist in the ecosystem is eventually admitted in the middle of the 4th footnote:
[Edit: I've also been pointed to the highly relevant golang.org/x/sync/errgroup and github.com/oklog/run in Golang.]
That said, the point about stack traces only going back to the point where the goroutine was launched, rather than going all the way back to `main`, is an interesting one I hadn't realized :+1:.
I disagree. The argument against unstructured Go is not a straw man, just like the arguments against "goto" a generation ago weren't straw men. Enforced structure allows new ways of reasoning about your code, thus enabling you to code the Happy Eyeballs algorithm in 40 lines of code instead of 400. This directly translates to fewer bugs.
Yes, cool, the ecosystem has solutions, but if it's way easier to not use them (e.g. because they require a heap of boilerplate in every function call, can't handle cancellation, and whatnot) they're ultimately not helpful. In Trio, "waiting for your child tasks" and "leaving the scope of a nursery" is exactly the same thing and requires zero lines of code (just the end of a block, in Python that's a de-indent), which again helps reduce the bug count and reduces the cognitive load on the programmer.
@dumblob The `yield` thing is just a convenient wrapper to package "run some code that returns a value, SOME_CODE, run some more code to clean up no matter how SOME_CODE exited" in a single function. The caller says `with wrapper() as value: SOME_CODE`, and Python guarantees that the cleanup code always runs when you leave the scope of that `with` block.
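The same guarantee can be approximated in Go as well (shown purely as an illustration; `withOpen` is an invented helper): the wrapper owns the cleanup, so no call site can forget it.

```go
package main

import (
	"fmt"
	"os"
)

// withOpen encapsulates cleanup in the wrapper, like Python's `with`:
// the caller cannot forget to close the file, because the close lives here.
func withOpen(path string, fn func(*os.File) error) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close() // runs no matter how fn exits
	return fn(f)
}

func main() {
	err := withOpen("/etc/hostname", func(f *os.File) error {
		buf := make([]byte, 64)
		n, _ := f.Read(buf)
		fmt.Printf("%s", buf[:n])
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```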
In current V (or Go) you'd use a `defer` block for cleanup, but that's potentially buggy – you must never forget the `defer` line, otherwise you have a resource leak, an unreleased lock, or whatever. Clean-up should be encapsulated in the object – the caller (i.e. the person writing the calling code) shouldn't have to think about whether, let alone how, to do it. Contrast Python's

```python
with open("/some/path") as file:
    write_to(file)
```

with

```v
file := os.open('/some/path')
defer { file.close() }
write_to(file)
```

IMHO that second line shouldn't exist. Not for files, not for locks, and not for nurseries.
In Go, starting a goroutine is as easy as it gets – but cleaning it up is not and needs to be done manually, especially when error handling is involved. Making coroutine creation slightly more difficult (by requiring a scoped nursery object to attach them to) is a no-brainer when the dividend you get from this is trivial cleanup.
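The `errgroup` package mentioned earlier is probably the closest thing Go's ecosystem has to such a scoped nursery - a sketch (assuming golang.org/x/sync/errgroup):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/errgroup"
)

func main() {
	g, ctx := errgroup.WithContext(context.Background())

	for i := 0; i < 3; i++ {
		i := i // capture (needed before Go 1.22)
		g.Go(func() error {
			select {
			case <-time.After(time.Duration(100*(i+1)) * time.Millisecond):
				fmt.Println("task", i, "done")
				return nil
			case <-ctx.Done(): // a failing sibling cancels us
				return ctx.Err()
			}
		})
	}

	// Leaving the "nursery": execution cannot pass this line while any
	// child task is still running.
	if err := g.Wait(); err != nil {
		fmt.Println("group failed:", err)
	}
}
```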
@elimisteve
preventing goroutines from being spawned without permitting execution of the main thread/goroutine
I don't understand that sentence.
In current V (or Go) you'd use a `defer` block for cleanup, but that's potentially buggy – you must never forget the `defer` line, otherwise you have a resource leak, an unreleased lock, or whatever.
Exactly - that's the reason why I'm curious how we would implement the long running "background" tasks using the nursery concept in V, as there are currently no means in V guaranteeing the existence of a proper `defer`. And having no guarantees totally defeats the purpose of the whole nursery concept :wink:.
Having no guarantees is detrimental to a whole lot of other constructs too. Locks for instance, or not leaving a file handle open when I'm done with using it.
So that's not an argument against nurseries, that's an argument for statically-bounded-visibility of objects – with a destructor that is guaranteed to be called exactly once. As V doesn't seem to have that concept, perhaps the first step should be to add it.
Having no guarantees is detrimental to a whole lot of other constructs too. Locks for instance, or not leaving a file handle open when I'm done with using it.
I don't think it's comparable. I argue that everything from `unsafe {}` in V (e.g. locks) can be easily avoided by other safe constructs V offers, and thus I won't discuss nor assume that in this discussion (that's the whole point of the existence of `unsafe {}` in V). What's left? Not much. Actually I think only files, and then an infinite group of semantically high-level cases which we can safely ignore in this discussion, because they're not solvable by any technology, but only by the programmer.
Regarding files, yes, that's undoubtedly a hole in the safety. On the other hand I argue that open files (be it virtual unix files or any other files) are by far not as detrimental as concurrency issues.
@medvednikov what about putting file handling also into `unsafe {}`? It's not as ridiculous as it sounds, as the file system tree hierarchy is slowly getting "replaced" by other DB-like interfaces - especially in the context of the Web (WebAssembly as well as WASI is agnostic to the concept of a filesystem hierarchy, and support for a filesystem hierarchy is just a plain optional module, pretty much like any other DB API module etc.).
In summary, implementing a destructor-like concept in V wouldn't IMHO solve anything (there have been several discussions in this issue tracker and elsewhere - feel free to join them as this thread is about something else).
Back to the topic. Because I like the nursery concept, I'm asking for ideas how to implement the "background" long running tasks with guarantees, but without introducing `yield` in V, and a bit more syntactically readable than https://godoc.org/github.com/oklog/run#example-Group-Add-Context (assuming a larger graph of such "actors"). Anyone?
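For reference, a condensed version of the oklog/run pattern linked above (assumes github.com/oklog/run): each actor is an execute/interrupt pair, and `Run()` returns once the first actor stops, interrupting all the others.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/oklog/run"
)

func main() {
	var g run.Group
	ctx, cancel := context.WithCancel(context.Background())

	// Long-running "background" actor.
	g.Add(func() error {
		t := time.NewTicker(100 * time.Millisecond)
		defer t.Stop()
		for {
			select {
			case <-t.C:
				fmt.Println("tick")
			case <-ctx.Done():
				return ctx.Err()
			}
		}
	}, func(error) {
		cancel() // interrupt: guaranteed to be called when any actor exits
	})

	// An actor that stops the whole group after half a second.
	g.Add(func() error {
		time.Sleep(500 * time.Millisecond)
		return nil
	}, func(error) {})

	fmt.Println(g.Run()) // returns the error of the first actor to stop
}
```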
@dumblob initial requirements:

the scheduler could start a new instance of a green thread in order to accommodate load bursts (elastic computing)

comments on https://discordapp.com/channels/592103645835821068/592842126304477184:
- the green thread monitor will spawn a 2nd consumer if the first cannot cope with the load.
- how exactly will the scheduler/monitor know what to spawn? If it needs to do that in a general way, does that mean that you would have to register a custom spawning function to each channel? Also what would happen if the system does not have enough resources for the new consumer?
- how exactly will the scheduler/monitor know what to spawn?
The scheduler is pluggable, so it's up to the programmer. If you're asking for defaults, then I'd follow the principle "only good defaults determine success". So I'd say we really want to provide basic scaling ("elasticity") by default, so we should provide some scheduler working moderately well for common scenarios (e.g. average desktop PCs, average laptops, average smartphones, average small/medium servers).
If it needs to do that in a general way, does that mean that you would have to register a custom spawning function to each channel?
I think the default behavior (as mentioned above) should be without any custom spawning function - the scheduler will first look at whether the go routine is "eligible" for elasticity (I envision this through the reentrancy flag) and then will look for every channel (ring buffer) used in the go routine which is also used in another go routine (thus "connecting" them while forming a dataflow programming pattern - imagine e.g. NoFlo or Node-RED), and then mux (multiplex) the channels (inputs & outputs) accordingly.
Also what would happen if the system does not have enough resources for the new consumer?
By default I wouldn't do anything (except in debug mode I'd log this information if it wasn't logged before in the last minute or so).
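To make the monitoring/muxing idea above concrete, here's a toy sketch in Go (everything in it is invented for illustration, it's not part of any proposal text): a monitor watches a buffered channel's fill level and spawns an extra consumer for a reentrant routine when the buffer stays near capacity.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

func consumer(id int, ch <-chan int) {
	for v := range ch {
		time.Sleep(time.Millisecond) // simulate slow processing
		_ = v
	}
	_ = id
}

func main() {
	ch := make(chan int, 64)
	var consumers atomic.Int32

	spawn := func() {
		id := int(consumers.Add(1))
		fmt.Println("spawning consumer", id)
		go consumer(id, ch)
	}
	spawn() // start with one consumer

	// The "monitor": if the buffer stays over 75% full, add a consumer.
	// A real scheduler would also scale down, rate-limit, cap resources, etc.
	go func() {
		t := time.NewTicker(10 * time.Millisecond)
		defer t.Stop()
		for range t.C {
			if len(ch) > cap(ch)*3/4 && consumers.Load() < 8 {
				spawn()
			}
		}
	}()

	for i := 0; i < 1000; i++ { // producer bursts
		ch <- i
	}
	close(ch)
	time.Sleep(2 * time.Second)
}
```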
@dumblob consider a situation where, due to a temporary peak in load, it cannot keep up, so a channel/buffer/queue is filled to the max. The scheduler decides to launch a new worker, which increases the load even further, and so on and so forth => things come to a grinding halt.
Adding a positive feedback loop should not be done without understanding its limits and the behavior of the system in the degraded cases. I personally think that it cannot be done in the general case. I agree that this kind of elastic scaling may be useful for some cases, where you know that the feedback will be limited by external factors, but I think the default should be to do nothing, except maybe log that the buffer was found to be full, and not try to add new consumers automatically.
consider a situation where, due to a temporary peak in load, it cannot keep up, so a channel/buffer/queue is filled to the max. The scheduler decides to launch a new worker, which increases the load even further, and so on and so forth => things come to a grinding halt.
This is just one of many situations. The default scheduler should be pretty conservative. There are also many other measures the default (i.e. a simple) scheduler can (shall) take into account - but in this case it's quite easy, as the scheduler knows everything about the whole graph (unlike in many other technologies, where one doesn't know the whole state) and the connections between nodes. Imagine you know the load of all buffers: the min/max flow between any two nodes (preferably a go routine instance acting as a pure source and a go routine instance acting as a pure sink) is then easily observable, and thus the whole system is perfectly tunable, as the underlying graph theory and approximation algorithms are out there.
My hint to use full buffers as an important piece of information for the scheduler was meant primarily as a trigger for reevaluating the situation, not as the only measure :wink:.
Adding a positive feedback loop should not be done without understanding its limits and the behavior of the system in the degraded cases. I personally think that it cannot be done in the general case. I agree that this kind of elastic scaling may be useful for some cases, where you know that the feedback will be limited by external factors, but I think the default should be to do nothing, except maybe log that the buffer was found to be full, and not try to add new consumers automatically.
That's the sole reason why reentrant go routines exist. Only reentrant go routines will take part in elasticity - any other instance of a go routine (i.e. a non-reentrant instance) will not get any additionally spawned/stopped instances from the scheduler. So it's again 100% in the hands of the programmer, and V in that case doesn't handle the general case you seemed to be afraid of :wink:.
To clarify, I'm not proposing having a cool conservative scheduler supporting elasticity already in V 1.0. The reentrant go routines can be implemented the very same way as non-reentrant ones (i.e. without elasticity) for half a decade and it'll be fine. What I'm proposing though is the semantics, which I find important for now and the upcoming decades and which thus must be part of V 1.0.
discord quote: how exactly will the scheduler/monitor know what to spawn? If it needs to do that in a general way, does that mean that you would have to register a custom spawning function to each channel? Also what would happen if the system does not have enough resources for the new consumer?

@dumblob I think that could be done in a way transparent to the coder. The green thread monitor could know which GT (green thread) is fed by which channel. At every preemptive allocation/reallocation the monitor could compute a few smart statistics and check whether the FIFO grows at a higher rate than the GT can process. The GT should have an option indicating whether the coder allows the GTM (Green Thread Monitor) to scale the load automatically or not.
@cristian-ilies-vasile that's basically how I envision it (though not that simplified :wink:), with the only difference that you've split my concept of a general scheduler into two parts - a scheduler and a separate monitor (I'd rather leave them technically together, as both work on the very same data and splitting them would undoubtedly lead to a performance decrease).
@dumblob No, the GTM (green thread monitor) contains the scheduler. In fact it is/will be the same piece of code with 2 definitions attached. I placed a sketch of a testing-concurrency document in the shared folder https://drive.google.com/drive/folders/1LwsfHBlEdOEIf2IT7SjijKoJic7mN6Zj
Understanding Real-World Concurrency Bugs in Go https://songlh.github.io/paper/go-study.pdf
No, the GTM (green thread monitor) contains the scheduler. In fact it is/will be the same piece of code with 2 definitions attached.
Ok, I'm fine with that :wink:. One minor thing though is the naming - I'm deliberately trying not to use any terminology resembling cooperative multitasking (e.g. "green thread"), because it's highly misleading (for the future as well as for the current state of things), as it implies many guarantees which the full preemption I advocate for in V cannot offer.
@cristian-ilies-vasile could you rename the Green Thread Monitor to something way more generic? Maybe V Thread Monitor (and V Thread Scheduler for the scheduler API) or so?
Understanding Real-World Concurrency Bugs in Go https://songlh.github.io/paper/go-study.pdf
Yeah, that's a good source of issues with the Go semantics (the behavior of Go channels, to be precise). I think V can learn from that (though many of their decisions have been made for performance reasons) and improve on it - the full preemptiveness I'm strongly suggesting for V should help with some of them too.
Btw. this paper is no longer valid for Go after the introduction of non-cooperative goroutine preemption in Go 1.14.
OK, here are new descriptions:
- V Light Thread Monitor
- V Light Thread Scheduler (for the scheduler API)
Seems that AI could help with subtle deadlock errors. https://medium.com/deepcode-ai/deepcodes-top-findings-9-deadlocks-836f240ad455
Like I said on discord, it could be fun to implement a kind of distributed dataset for multi-threaded operations. In a far future.
https://github.com/microsoft/verona/blob/master/docs/explore.md
Concurrent Ownership In Project Verona, we are introducing a new model of concurrent programming: concurrent owners, or cowns for short. A cown, pronounced like "cone", encapsulates some set of resources (e.g. regions of memory) and ensures they are accessed by a single thread of execution at a time.
@cristian-ilies-vasile yep, I've already seen Verona and this principle. IMHO it's built upon similar ideas as the nursery concept discussed above (but with a different syntax and focusing on data - in terms of individual variables - instead of the concurrency graph).
I find the concept neat, though I also have some doubts.
@dumblob How much overhead in terms of kilobytes/megabytes will a multithreading runtime add to the compiled binary code?
How much overhead in terms of kilobytes/megabytes will a multithreading runtime add to the compiled binary code?
Assuming dynamic libraries (DLLs on Windows and pthreads on *nix systems), V itself will be just a few kbytes bigger than without a multithreading runtime. For a static build, that'll be a different story, as e.g. pthreads are quite big. For a `-bare` build, it'll depend mostly on the capabilities of the underlying HW (MCU, uC, ...), but that's usually very small, so again at most a few kbytes.
Generally I don't feel this is of concern (there are way bigger modules/parts of V). Is there anything specific you had in mind, @cristian-ilies-vasile?
I was thinking that the overhead would be larger, a few megabytes, but if it's around 500 Kbytes it is OK from my side. "Is there anything specific you had in mind" - we have this discussion here, but I am not quite sure if this type of concurrency, with a dedicated runtime, will be accepted as part and parcel of the language (like in Go).
We have this discussion here but I am not quite sure if this type of concurrency, with a dedicated runtime, will be accepted as part and parcel of the language (like in Go).
Well, from my point of view, there is basically no runtime for this. Moreover the scheduler, which will be basically the only runtime, is pluggable, and the default one should fit under 1000 SLOC with flying colors, so it's really tiny. It's similar to Go - there the only big part of the runtime is the garbage collector, but V has none, so no need to worry :wink:.
The idea is to pursue your dream, isn't it? How Not to Land an Orbital Rocket Booster
@cristian-ilies-vasile I don't follow.
Mind, it's not rocket science (though thanks for the rocket icon under my post above :wink:) - historically there have been thousands of programmers writing more or less advanced schedulers (imagine all those kernels, microkernels, exokernels, etc., network device firmwares, HPC apps, language VMs etc.).
It's really ubiquitous (and many of these schedulers are pluggable - if you're running Linux, you can try it out immediately and choose among tens of schedulers for different subsystems on the fly back & forth).
State of Loom (Java) http://cr.openjdk.java.net/~rpressler/loom/loom/sol1_part1.html
@wanton7 cool, even the Java world slowly moves forward ;) Note though, Loom is a nearly perfect copy of what the modern Go threads do (see https://github.com/vlang/v/issues/3814#issuecomment-592643754 above), with the same pros & cons.
Pros are mainly high performance and quite good scaling. Cons are mainly that "it works well only in pure Go apps which do not use/call anything from the outside world". That's because both Go and Loom try to get away from time sharing at all costs (their schedulers don't do time-based preemption at all), which is doable if and only if you have 100% control over the source code or byte code of your application. Which would mean not using any non-pure-V library in the future, which is insane IMHO.
Thus in my proposal I'm sacrificing a tiny bit of performance just to allow seamless and totally trouble-free use of any non-pure-V library directly from V.
Note also we're talking about V language semantics - the implementation under the hood might change every day if needed, and thanks to the pluggable scheduler, the programmer is actually free to implement her own scheduler, possibly imitating Go/Loom behavior, if she is confident her app doesn't use any non-pure-V libs.
Fairness in Responsive Parallelism https://dl.acm.org/doi/pdf/10.1145/3341685
@dumblob There was a conversation today on the discord channel related to various V things, and we can start working on coroutines ahead of the scheduled V 0.3 version!
From Folklore to Fact: Comparing Implementations of Stacks and Continuations https://kavon.farvard.in/papers/pldi20-stacks.pdf
https://github.com/hnes/libaco
https://github.com/baruch/libwire
https://github.com/halayli/lthread
https://github.com/Tencent/libco
https://byuu.org/library/libco/
libco is a cooperative multithreading library written in C89. https://byuu.org/projects/libco
Protothreads are extremely lightweight stackless threads designed for severely memory constrained systems: http://dunkels.com/adam/pt/ https://en.wikipedia.org/wiki/Protothread
A C continuation library inspired by Adam Dunkels' ProtoThread (with malloc): https://github.com/matianfu/FUNK
Protothreads (coroutines) in C99: https://github.com/zserge/pt
c-block: extremely lightweight macros designed to eliminate callback hell, inspired by Duff's device and Protothreads; c-block provides a sequential flow of control similar to the await of C#: https://github.com/huxingyi/c-block
lthread is a multicore/multithread coroutine library written in C https://github.com/halayli/lthread/blob/master/docs/intro.rst
An Implementation of Coroutines for C https://github.com/spc476/C-Coroutines
libdill: Structured Concurrency for C http://libdill.org/documentation.html
Fibers: the Most Elegant Windows API https://nullprogram.com/blog/2019/03/28/
o debugging / visualization of the Light Threads graph, events, states and actions. I remember that the Mars Rover suffered serious glitches due to the so-called "priority inversion" bug [1], [2]
I think that for testing/validating phases we should be able to obtain these details in a human readable layout. For a few items the .dot language suffices (https://www.graphviz.org/), but for large graphs it is ineffective.
Other options are:
//1 use railroad diagrams: https://www.bottlecaps.de/rr/ui https://github.com/GuntherRademacher/rr
//2 use a graph visualization application able to deal with thousands of items: https://gephi.org/
[1] https://www.rapitasystems.com/blog/what-really-happened-to-the-software-on-the-mars-pathfinder-spacecraft [2] https://www.slideshare.net/jserv/priority-inversion-30367388
An interesting paper on how to deal with saturated locks. Generic Concurrency Restriction https://labs.oracle.com/pls/apex/f?p=LABS:0::APPLICATION_PROCESS%3DGETDOC_INLINE:::DOC_ID:1078
An interesting share by @cristian-ilies-vasile:
http://uu.diva-portal.org/smash/get/diva2:1363822/FULLTEXT01.pdf (backup)
In summary from their Comparison section (8.3 Result):
(flattened comparison table of Encore, Pony and Rust omitted - see section 8.3 of the linked thesis)
Concurrency Kit Concurrency primitives, safe memory reclamation mechanisms and non-blocking data structures for the research, design and implementation of high performance concurrent systems. http://www.concurrencykit.org/
Lockfree Algorithms Design and implementation of scalable synchronization algorithms http://www.1024cores.net/home/lock-free-algorithms
Liblfds, a portable, license-free, lock-free data structure library written in C. https://www.liblfds.org/
Speaking of C/C++ libs, I'd rather recommend newer stuff - IMHO the most performant and at the same time easiest to use is Weave. It's even undeniably faster than e.g. multithreading libraries hand-tuned by many Intel devs (who have worked on them for more than a decade). It also scales much better than any other multithreading and HPC lib I know of.
And the best of it? Weave has no platform-specific hacks (neither code nor assembly nor any restrictions), so it performs that well on all existing platforms supported by a C compiler.
Because Weave is written in Nim, you can easily get its C source or manually transpile it to V (Weave uses quite some of the metaprogramming Nim offers, but because we're talking about built-in preemptive "subthreads" here for V, it should translate to "compiler magic", because that's V's answer to metaprogramming).
Weave uses the newest research findings and has many empirically determined constants fitting current & near-future mobile, desktop and HPC systems. Therefore it's the most performant multithreading library and at the same time has probably the smallest code base among all these libs.
I think V could copy the SPSC and MPSC channels' (ring buffers') implementations as well as some constants and ideas from the work-stealing runtime Weave implements. Those are the core parts determining how the whole system will perform.
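To make "SPSC ring buffer" concrete, here's a minimal sketch in Go (invented for illustration - Weave's actual implementation is in Nim and adds cache-line padding, batching and backoff):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spscQueue is a toy single-producer/single-consumer ring buffer.
// Indices grow monotonically; the mask maps them onto the power-of-two
// sized buffer. Go's atomics make the buffer write visible before the
// index publication.
type spscQueue struct {
	buf  []int
	mask uint64
	head atomic.Uint64 // next slot to read (owned by the consumer)
	tail atomic.Uint64 // next slot to write (owned by the producer)
}

func newSPSCQueue(sizePow2 uint64) *spscQueue {
	return &spscQueue{buf: make([]int, sizePow2), mask: sizePow2 - 1}
}

// push is called only by the single producer; it reports false when full.
func (q *spscQueue) push(v int) bool {
	t := q.tail.Load()
	if t-q.head.Load() == uint64(len(q.buf)) {
		return false // full
	}
	q.buf[t&q.mask] = v
	q.tail.Store(t + 1) // publish the element
	return true
}

// pop is called only by the single consumer; it reports false when empty.
func (q *spscQueue) pop() (int, bool) {
	h := q.head.Load()
	if h == q.tail.Load() {
		return 0, false // empty
	}
	v := q.buf[h&q.mask]
	q.head.Store(h + 1) // free the slot
	return v, true
}

func main() {
	q := newSPSCQueue(8)
	q.push(42)
	v, _ := q.pop()
	fmt.Println(v) // 42
}
```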
V Channels: @krolaw has published the first version of Go-like channels and selects for V. https://github.com/krolaw/vchan
https://preshing.com/20120612/an-introduction-to-lock-free-programming/ Another introduction to lock-free programming, to get a feel for it.
@dumblob you seem up-to-date with libs in this space. It seems like atomic features are needed for lock-free things to work. I am just doing some proofs of concept for basic stuff like incrementing/decrementing numbers. I think I've got most of it covered using https://gist.github.com/nhatminhle/5181506, but I am trying to get a solution for TCC using atomic operations. Can you or someone else point me in the right direction? I'm thinking that if we build a good foundation of these functions, we can build on top of it later.
Transferred from a document posted here by @cristian-ilies-vasile:
V concurrency high level design
After careful consideration of the proposed concurrency models and suggestions expressed on the discord v-chat channel, the high level design is based on the Go language model (message passing via channels), which in turn is a variation of communicating sequential processes.
I did read papers on the actor model, but it seems that coders do not use all the primitives provided by the language and resort to threads and queues (see the "Why Do Scala Developers Mix the Actor Model" paper).
Almost all high level requirements are taken from Proper support for distributed computing, parallelism and concurrency published on github.
Because there are many words used more or less interchangeably - like coroutine, goroutine, fiber, green thread etc. - I will use the term green thread.
Key points
Open points