leonoel / missionary

A functional effect and streaming system for Clojure/Script
Eclipse Public License 2.0

undocumented topics #12

Open xificurC opened 4 years ago

xificurC commented 4 years ago

If you remove missionary from the example you get the same semantics.

leonoel commented 4 years ago

The point is that you can write asynchronous logic exactly the same way you would have written synchronous logic. A synchronous solution could use e.g. java.util.function.Supplier to represent a job, and the retry logic would be basically the same, substituting (m/sp ,,,) with (reify Supplier (get [_] ,,,)) and m/? with .get.
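As a rough illustration of the Supplier variant described above, here is a minimal blocking sketch in plain Java. The class name BlockingBackoff and the shape of the backoff helper are hypothetical and are not taken from missionary or its tutorial; the only point is that job.get() plays the role of m/? and that every wait happens on the calling thread.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not part of missionary: a job is a Supplier and
// running it means calling get() on the current thread.
final class BlockingBackoff {
    // Retries the job, sleeping for the next delay after each failure.
    // Once the delays are exhausted, the last failure is rethrown.
    static <T> T backoff(Supplier<T> job, List<Long> delaysMs) {
        int attempt = 0;
        while (true) {
            try {
                return job.get();                       // blocking counterpart of m/?
            } catch (RuntimeException e) {
                if (attempt == delaysMs.size()) throw e; // out of delays: give up
                sleep(delaysMs.get(attempt++));          // blocks the calling thread
            }
        }
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();          // keep the interrupt visible
            throw new IllegalStateException("interrupted during backoff", ie);
        }
    }
}
```

The control flow is an ordinary loop with try/catch, which is the syntactic similarity being discussed; the cost is that one thread stays occupied per in-flight retry.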

missionary helps when you have no choice but to go async. In that case, properly implementing the retry logic manually (including cancellation propagation) is usually tedious and error-prone, even with higher-level abstractions like promises or channels.

xificurC commented 4 years ago

My comment was probably too terse to explain what I meant, sorry about that. What I meant was that after reading the tutorial I saw no difference from a non-missionary solution. An example showing the difference between the missionary solution and e.g. the Supplier solution would make the use case clearer. If you're able to explain one, I can whip up a PR if that'd help you out.

Peter Nagy


leonoel commented 4 years ago

I'm not sure I understand which part was unclear to you. The difference between the missionary solution and the Supplier solution is that the former is non-blocking and the latter is blocking. The syntactic similarity is an explicit goal. If blocking is OK for you, then there's probably no point in using missionary in this case.
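To make the blocking/non-blocking distinction concrete, here is a hedged sketch of what a hand-written non-blocking retry might look like on plain JVM primitives. Everything in it (the AsyncBackoff class, the timer parameter, the choice of CompletableFuture) is an assumption made for illustration and says nothing about how missionary is implemented.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical callback-based retry. Assumes the job reports failure through
// the future it returns rather than by throwing synchronously.
final class AsyncBackoff {
    // Returns immediately; the outcome arrives later through the returned future.
    static <T> CompletableFuture<T> backoff(Supplier<CompletableFuture<T>> job,
                                            List<Long> delaysMs,
                                            ScheduledExecutorService timer) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(job, delaysMs, 0, timer, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> job,
                                    List<Long> delaysMs, int index,
                                    ScheduledExecutorService timer,
                                    CompletableFuture<T> result) {
        job.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (index == delaysMs.size()) {
                result.completeExceptionally(error);   // out of delays: give up
            } else {
                // Schedule the next attempt instead of sleeping on this thread.
                timer.schedule(() -> attempt(job, delaysMs, index + 1, timer, result),
                               delaysMs.get(index), TimeUnit.MILLISECONDS);
            }
        });
    }
}
```

No thread is parked while waiting for a delay, but the loop and try/catch have turned into recursion and callbacks, and this sketch still does nothing about cancellation: cancelling the returned future stops neither a pending timer nor an in-flight attempt.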

xificurC commented 4 years ago

How exactly is it non-blocking though? The last example is

(m/? (backoff request delays))
:attempt
:attempt
:attempt
#=> :success

which definitely blocks. backoff returns a task and m/? blocks the caller's thread to wait for its completion. Where is backoff run though? The caller's thread? On some threadpool? How can we run a thousand of them? How is that different from launching a thousand threads? How is it different from using a ForkJoinPool?

All I'm saying is that the docs fail to mention what the benefits of using missionary are. There's no comparison w.r.t. code "cleanness" or performance. Will missionary still be useful once VirtualThreads (loom, previously Fibers) come out?

leonoel commented 4 years ago

Thank you for this clarification. There's a lot of room for improvement on these topics in the current documentation, and I'm planning to write more of it before switching to a production release mode. I'm definitely interested in feedback of this kind to prioritize this work.

The retry with backoff tutorial assumes prior familiarity with the semantics of tasks and sp/?, which are explained in the hello task tutorial. It also assumes the sync/async tradeoff is understood and there's already a lot of material about it on the web. I don't think the retry with backoff tutorial should be moved in any of these directions, as it's really just an example showing how standard control flow works just as usual when you switch to async.

However, many of the questions you asked can be addressed in other documentation pages.

I acknowledge it's unclear which thread runs which code without looking at the implementation, and the documentation should definitely include an explanation of the execution model. In a nutshell, all user-provided code is assumed to be fast, so it's always run as soon as possible, without relying on an implicit shared threadpool. As a rule of thumb, the thread running the code is the thread that made its execution possible. The idea is to leave the developer as much freedom as possible to profile and make relevant performance tweaks. For instance, if profiling reveals an application-space bottleneck on an event dispatcher thread, the expensive code can be moved to an arbitrary executor using via to improve throughput on the dispatcher thread. cpu and blk executors are provided as a convenience for cpu-bound and blocking evaluations, respectively. missionary won't make such optimizations implicitly because the net benefits are too hard to infer in the general case. Apart from cpu and blk, the only thread managed by missionary is an event dispatcher dedicated to sleep tasks.
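The rule of thumb above has a loose analogue in java.util.concurrent.CompletableFuture, sketched here purely as an analogy. The class WhoRunsIt and the thread names are made up for the example, and nothing in it uses or describes missionary's via, cpu or blk: a callback registered before completion runs on the thread that completes the future, unless it is explicitly handed to another executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo, not missionary code.
final class WhoRunsIt {
    // Returns the names of the threads that ran {inline callback, offloaded callback}.
    static String[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> new Thread(r, "cpu-pool"));
        try {
            CompletableFuture<String> event = new CompletableFuture<>();
            // Registered before completion: runs on whichever thread completes `event`.
            CompletableFuture<String> inline =
                event.thenApply(v -> Thread.currentThread().getName());
            // Explicitly moved to another executor to keep the completing thread free.
            CompletableFuture<String> offloaded =
                event.thenApplyAsync(v -> Thread.currentThread().getName(), pool);
            // Stand-in for an event dispatcher thread delivering a value.
            Thread dispatcher = new Thread(() -> event.complete("tick"), "dispatcher");
            dispatcher.start();
            return new String[] { inline.join(), offloaded.join() };
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the first name is that of the dispatcher thread and the second that of the pool thread. Whether the offload pays for itself depends on how expensive the callback is, which is the kind of decision the paragraph above leaves to profiling.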

As missionary is a functional asynchronous programming toolkit, it should be compared to libraries implementing the same paradigm. RX is probably the most popular of them, and I think a guide showing how common RX patterns translate to missionary would make sense. Compared to RX, missionary is stricter about supervision, has out-of-the-box support for lazy sampling, and relies on continuation-based composition instead of monadic operators.

Compared to imperative tech such as core.async or various flavors of promise/future, the main benefit of functional style is that you get supervision for free. The documentation could definitely put more emphasis on what it is and why it's important. Project loom's fibers will be an attempt to add supervision features to an imperative concurrency model, so there will likely be some overlap in the target problem space and I will keep an eye on it.

xificurC commented 3 years ago

I'm trying to fill in the incomplete picture of this library in my mind. I decided to take your first post (hello task) and walk through it, trying to reimplement tasks with thunks and futures. Here is a walkthrough of that.

If you could comment on it, noting the differences from the actual implementation, or providing a counter-example that cannot be done this way, anything really, I'd be very glad!

leonoel commented 3 years ago

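I think your mental model about tasks is the right one. Some remarks about your implementation of join: it returns results in the order they're available, instead of respecting the order of input tasks, and it's suboptimal, the calling thread does active polling so it will be busy for the entire duration of the join.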

Look at this reply if you haven't already; there's significant overlap with your write-up.

Note: there's a minor difference between the future implementation and the one in missionary, which is related to graceful shutdown. Due to the way Future/cancel works, you can't interrupt a running future and wait for the job to be fully completed. To be fully compliant with task semantics, you would have to use plain threads, Thread/interrupt and Thread/join. I omitted this point from my explanation because it's too subtle for an introductory tutorial and not necessary to understand the idea.
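The graceful-shutdown difference in the note above can be sketched with standard JDK classes. The class GracefulShutdown and the simulated 100 ms cleanup are hypothetical: after Future.cancel(true) the future reports cancellation right away and offers no way to wait for the interrupted job to wind down, whereas Thread.interrupt followed by Thread.join returns only once the job has really finished.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo of the two cancellation styles.
final class GracefulShutdown {
    // A job that needs some time to clean up after being interrupted.
    static Runnable job(AtomicBoolean cleanedUp) {
        return () -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                try { Thread.sleep(100); } catch (InterruptedException ignored) { }
                cleanedUp.set(true);   // cleanup finished
            }
        };
    }

    // Returns true if get() reported cancellation without waiting for the job.
    static boolean cancelFuture() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> f = pool.submit(job(new AtomicBoolean()));
        f.cancel(true);                // interrupts the worker if the job is running
        pool.shutdown();
        try {
            f.get();                   // throws at once, cleanup may still be in progress
            return false;
        } catch (CancellationException e) {
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Returns true if the cleanup had completed by the time join() returned.
    static boolean interruptAndJoin() {
        AtomicBoolean cleanedUp = new AtomicBoolean();
        Thread t = new Thread(job(cleanedUp));
        t.start();
        t.interrupt();                 // send the signal
        try {
            t.join();                  // wait until the job has fully terminated
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cleanedUp.get();
    }
}
```

Both methods return true in this sketch; only the second one guarantees that the job's cleanup has run before the caller continues.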

xificurC commented 3 years ago

Thank you for your quick response.

it returns results in the order they're available, instead of respecting the order of input tasks.

Yeah, forgot about that

it's suboptimal, the calling thread does active polling so it will be busy for the entire duration of the join.

Your example uses a queue, which I tend to use as well, I just used polling because that was the first thing that came to my mind. This is an off-topic question, but isn't some sort of polling present in the guts of the JVM queues and futures as well? How does a .take() on a queue block? How does a .get() on a future block?

Due to the way Future/cancel works, you can't interrupt a running future and wait for the job to be fully completed

Care to elaborate? I always expected (but never checked) that .cancel(true) on a future calls .interrupt() on the underlying thread. Or are you making a different point?

xificurC commented 3 years ago

Ah, interrupt to send the signal, join to wait for the thread to finish, I get it now.

leonoel commented 3 years ago

There's a big difference between active polling (repeatedly checking some condition in a loop) and waiting for a value on a queue or a future. In the former case, the thread has always something to do so you're burning CPU cycles for nothing. In the latter case, the thread is blocked until some other thread wakes it up. When a thread is blocked, the OS scheduler can assign the CPU to another thread, so resource usage is optimized.
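A small sketch of the two waiting styles contrasted above, using only standard JDK classes. The class and method names are invented for the example: the polling variant keeps its thread runnable the whole time, while the blocking variant hands the thread back to the scheduler inside take() until a producer wakes it up.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo of active polling versus a blocking wait.
final class PollVsBlock {
    // Active polling: the caller spins, checking the slot over and over.
    // Returns how many checks were made before the value showed up.
    static long pollUntilReady(AtomicReference<String> slot) {
        long checks = 0;
        while (slot.get() == null) {
            checks++;                  // burns CPU while waiting
        }
        return checks;
    }

    // Blocking wait: the caller is suspended inside take() until a value is put.
    static String blockUntilReady(BlockingQueue<String> queue) {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Runs `publish` on another thread after a short delay.
    static Thread produceLater(Runnable publish) {
        Thread t = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            publish.run();
        });
        t.start();
        return t;
    }
}
```

Both calls return once the producer has published, but the number of checks made by pollUntilReady during those 50 ms gives a feel for the work wasted by the spinning thread.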

xificurC commented 3 years ago

Wouldn't a Thread.yield() or Thread.sleep(1) in the loop do the same though?

leonoel commented 3 years ago

It may improve efficiency, but it's still considered bad practice. The general recommendation is to leverage java.util.concurrent facilities, and to refrain from using low-level thread machinery unless there's a good reason to do so.

xificurC commented 3 years ago

I know, I know, I'm just wondering how the low-level thread machinery implements the blocking waits. I guess I'll check the actual impls one day.

xificurC commented 3 years ago

I was too curious so I walked down the rabbit hole starting from ArrayBlockingQueue and got this far. park/unpark is being called, but the JDK source for unpark calls some unresolvable method U.unpark(). After more digging I found some platform-dependent unpark functions in the C sources. No unpark in the Linux sources, but there is one in the POSIX ones which essentially works with a pthread mutex. I'd like to know what system calls are being made, maybe another day :)
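For reference, the park/unpark pair mentioned above is exposed in Java as java.util.concurrent.locks.LockSupport. The following is a deliberately tiny sketch of a blocking hand-off built directly on it; ParkingSlot is a hypothetical class that assumes a single consumer and ignores interruption, and the real queue implementations are considerably more involved.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Hypothetical one-shot mailbox: one consumer waits for one value.
final class ParkingSlot<T> {
    private final AtomicReference<T> value = new AtomicReference<>();
    private volatile Thread waiter;

    // Suspends the caller until put() has published a value.
    T take() {
        waiter = Thread.currentThread();   // announce who should be woken up
        T v;
        while ((v = value.get()) == null) {
            LockSupport.park(this);        // may return spuriously, hence the loop
        }
        return v;
    }

    // Publishes the value, then wakes the consumer if one has registered.
    void put(T v) {
        value.set(v);
        LockSupport.unpark(waiter);        // has no effect when waiter is still null
    }
}
```

The consumer registers itself before checking the value and the producer publishes before reading the waiter, so a wake-up cannot be lost; an unpark that arrives before the park simply makes that park return immediately.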

Sorry if this is getting too off-topic, stop me anytime!

leonoel commented 3 years ago

Another feature that is missing in your join implementation is the ability to react to thread interruption. When a join is interrupted, it must propagate the interruption signal to all child tasks.
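A hedged sketch of a future-based join that reacts to interruption, in the spirit of the walkthrough being discussed. The Joining class is hypothetical and is not missionary's implementation: it waits with blocking get() calls rather than polling, returns results in input order, and cancels every child when the joining thread is interrupted. Cancelling the siblings when one child fails is an extra choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical join over already-started futures.
final class Joining {
    // Waits for every child and returns the results in input order.
    // If the caller is interrupted, or a child fails, all children are cancelled.
    static <T> List<T> join(List<Future<T>> children) {
        List<T> results = new ArrayList<>();
        try {
            for (Future<T> child : children) {
                results.add(child.get());      // blocking wait, no polling
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            for (Future<T> child : children) {
                child.cancel(true);            // propagate the signal to every child
            }
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();   // keep the interrupt visible
            }
            throw new IllegalStateException("join did not complete", e);
        }
    }
}
```

Two limitations are worth keeping in mind: children are awaited one after another, so a failure in a later child is noticed only once the earlier ones are done, and because it relies on Future.cancel the sketch cannot wait for the cancelled children to actually terminate.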

xificurC commented 3 years ago

Ah, a nice feature! Thank you for your input and for this library.