Hanging process - Githubissues

bsless commented 3 years ago

I played around a bit with similar implementations and I think I hit a conceptual snag: the process can simply hang, parked on a put to a full channel or a take from an empty one. I'm not sure about the semantics yet, but I think the only correct solution is to narrow the scope a bit and always alt between an input or output channel and the control channel.

wdyt?

raspasov commented 3 years ago

I wrote in the README:

A ss.loop/go-loop always exits on the very next (recur ...) call. It does not "die" automagically in the middle of execution.

IMPORTANT: if the first loop is stuck waiting in its own code, say via (<! ...), there's no guarantee that it will be stopped before the second loop begins.

IMO doing more would be expanding the scope. Perhaps one can imagine a solution via a macro that rewrites all blocking (>! ...), (<! ...), etc. calls to go through alts!. But that definitely gets more complex AFAICT.

Having the loops be restartable given random code is tricky (which is what this library is trying to do). There's a multitude of ways a (go-loop ...) can be blocked/parked. I thought that only restarting on the next (recur ...) is a reasonable compromise.

In order for that compromise to work, you do need to ensure that your code does not block indefinitely. That means using timeouts, alts!, etc in the code that you supply to the go-loop. That will allow a specific go-loop to be stopped on the very next recur.
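As a rough illustration of that advice, here is a minimal sketch of a loop body that never parks for longer than a fixed timeout, so every iteration reaches recur. It uses plain core.async rather than ss.loop/go-loop, and the names bounded-consumer, handle and running? are hypothetical; running? is only a stand-in for whatever stop check happens at recur.

```clojure
(require '[clojure.core.async :as a])

;; `running?` is a hypothetical atom standing in for the stop check that
;; happens at recur; it is not part of any library API.
(defn bounded-consumer
  [in ms handle running?]
  (a/go-loop []
    ;; Wait for a value from `in`, but give up after `ms` milliseconds.
    (let [[v port] (a/alts! [in (a/timeout ms)])
          closed?  (and (= port in) (nil? v))]
      ;; Only a real value from `in` is handled; a timeout yields nil.
      (when (and (= port in) (some? v))
        (handle v))
      ;; Every iteration gets here within roughly `ms` milliseconds.
      (if (or closed? (not @running?))
        :done
        (recur)))))
```

Because the take is bounded by the timeout, a stop request is noticed within roughly one timeout period even when nothing ever arrives on in.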

Also important:

... there's no guarantee that it will be stopped before the second loop begins.

That means that a 2nd go-loop with the same :id starts right away. It doesn't wait for the 1st one to exit. The 1st one will stop on the very next recur, as long as the loop isn't blocked indefinitely.

Thoughts?

bsless commented 3 years ago

Everything you wrote there makes sense. A term-rewriting macro is actually not a bad idea given that all puts and takes already have to be explicit in go blocks. I think trying to solve the most generic case is difficult, and maybe addressing the common 90% of cases is good enough:

a loop consuming from a channel and performing some side effect, maintaining internal state
a loop producing to a channel and performing some side effect, maintaining internal state
a loop which does both

I started experimenting and this is a rough sketch I've come up with so far:

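(require '[clojure.core.async :as a])
(import 'java.util.concurrent.atomic.AtomicBoolean)

;; Commands arrive on `control` as functions of [f state] -> [f state].
;; `mail?` flags that commands may be waiting on `control`.
(defn server-process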
  [in f init ^AtomicBoolean mail? control]
  (a/go-loop [f f
              state init]
    (if (.get mail?)
      (let [[f state]
            (loop [f f
                   state state]
              (let [c (a/poll! control)]
                (if (nil? c)
                  [f state]
                  (let [[f state] (c f state)]
                    (recur f state)))))]
        (.set mail? false)
        (if f
          (recur f state)
          nil))
      (let [v (a/<! in)]
        (if (nil? v)
          nil
          (recur f (f state v)))))))

(defn migrate
  ([state-fn]
   (fn [f state]
     [f (state-fn state)]))
  ([fn-fn state-fn]
   (fn [f state]
     [(fn-fn f) (state-fn state)])))

(defn restart
  ([init]
   (fn [f _state]
     [f init]))
  ([f init]
   (fn [_ _]
     [f init])))

(defn effect
  [f]
  (fn [g state]
    (f state)
    [g state]))

(defn stop [] (fn [_ _] [nil nil]))

The commands are functions that are passed to the process over a channel. This still does not solve the hanging process, but that can be addressed either with alt, which does not impact correctness, or with a rewriting macro. Using alt with :priority can also eliminate the need to check for mail and enter the inner loop.

It feels like I'm reinventing OTP, though. I think we don't want to create a leaky abstraction: if I have to think about the wrapping listener, make sure I don't block indefinitely, modify my code to use alts with timeouts, etc., it's leaky. Whatever solution we arrive at shouldn't be brittle.
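A minimal sketch of the alt-based variant, assuming the same command convention as above (a command is a function of [f state] that returns [f state], and a nil f stops the process). The name server-process* is hypothetical, and this is one possible shape rather than a finished design.

```clojure
(require '[clojure.core.async :as a])

(defn server-process*
  [in f init control]
  (a/go-loop [f     f
              state init]
    ;; :priority true makes alts! try `control` before `in`, so pending
    ;; commands are handled first and no mail? flag is needed.
    (let [[v port] (a/alts! [control in] :priority true)]
      (cond
        ;; Either channel closing ends the process.
        (nil? v) nil

        ;; A command: apply it, and stop if it returns a nil function.
        (= port control)
        (let [[next-f next-state] (v f state)]
          (if next-f
            (recur next-f next-state)
            nil))

        ;; A regular input value: fold it into the state.
        :else (recur f (f state v))))))
```

The process parks on both channels at once, so a command sent on control is seen even while in is empty. A loop that also puts to an output channel would need the same treatment for its puts.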

raspasov commented 3 years ago

I did some mental exploration in the code below via a sample (not actual macro code). I thought I might have something but things do get leaky fast.

Take one simple example where you wait via a (<! ...) form. In this case, you might be able to use a macro to replace that with an (alts! ...) form and be done with it.

But once you start considering a more complex example, things are not so straightforward. Assume some code that takes via (<! ...) and does something with it. Random user code might have some assumptions about the value that comes out of (<! ...). For example, they might assume that the value is never nil, because the channel never gets closed. Now, all of a sudden, if we re-write that (<! ...) to be (alts! [stop-ch ...] ...), we might cause the user code to throw an exception or not work in some subtle way.
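To make the ambiguity concrete: alts! returns both the value and the port it came from, so a stop, a closed channel, and a real value can be told apart, but only if the surrounding code is written to look at the port. A minimal sketch, where take-once and the tagged return vectors are hypothetical names chosen for illustration:

```clojure
(require '[clojure.core.async :as a])

(defn take-once
  "Returns a channel yielding [:value v], [:closed nil] or [:stopped nil]."
  [stop-ch ch]
  (a/go
    (let [[v port] (a/alts! [stop-ch ch] :priority true)]
      (cond
        (= port stop-ch) [:stopped nil] ; interrupted from outside
        (nil? v)         [:closed nil]  ; ch was closed
        :else            [:value v])))) ; a normal take
```

User code written against a bare (<! ...) only ever sees v, which is why a mechanical rewrite cannot hand it this extra information without changing the code's shape.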

One even crazier idea which I haven't explored much is to wrap every single form with an if check and somehow (keyword, somehow, I'm not sure that's possible on the JVM or JavaScript in a clean way) stop all further execution.

I have never used a language that supports first-class continuations, but this last idea feels like re-inventing them inside a macro (possibly not a great idea :) ) https://en.wikipedia.org/wiki/Continuation

All of this further supports the original idea of only stopping the loop on (recur ...). You can think of (recur ...) as a "safe checkpoint" where you're able to interrupt execution in a clean way without too much complexity.


;Original code
;--------------------------
(go-loop
  ;Sample user code start
  ;------------------------
  [x 1]

  ;Macro needs to "re-write" this user code form, start
  (a/<! (a/chan))
  ;Macro needs to "re-write" this user code form, end

  ;This original code will never reach here
  (println "x:::")
  (println x)
  (recur (inc x))

  ;Sample user code end
  ;------------------------
  )

;Macro output code
;--------------------------
;stop-ch must be a channel; the filter transducer only lets :stop through
(let [stop-ch (a/chan 1 (filter (fn [x#] (= x# :stop))))]
  (go-loop [x 1]

    ;Macro "re-write" output, start
    (let [[ret which-chan] (a/alts! [stop-ch (a/chan)] :priority true)]
      (if (= which-chan stop-ch)
        ;Stopping the loop
        ;return nil out of the re-written part?
        ;But what if the user code form actually does something with the result and never expects nil?
        ;Things get complicated and pretty leaky...
        ;Automagic re-writing of random code is hard.

        ;return
        nil

        ;else, return the user value
        ret
        ))
    ;Macro "re-write" output, end

    (println "x:::")
    (println x)
    (recur (inc x))))

raspasov commented 3 years ago

Perhaps, if we make it very clear what the macro is going to do, that it is going to replace blocking (<! ...), (>! ...) with (alts! ...), then it is not the worst idea. It will automagically make any potentially infinitely blocked (<! ...) or (>! ...) code stoppable from the outside without too much ceremony.

It will be the user code's responsibility to deal with nil values coming out of the channels in the (<! ...) case.

In the (>! ...) case, where we interrupt that (>! ...), it should probably return false, the same value a put on a closed channel returns:


(a/go
  (let [ch (a/chan 1)
        ret (a/>! ch :hello)]
    (println ret)))
;=> true

(a/go
  (let [ch (a/chan 1)
        _  (a/close! ch)
        ret (a/>! ch :hello)]
    (println ret)))
;=> false
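A minimal sketch of what the rewritten (>! ...) could expand to, assuming a stop-ch is in scope. The helper name put-or-stop is hypothetical; alts! accepts [channel value] vectors for puts and returns true for an accepted put or false for a put on a closed channel.

```clojure
(require '[clojure.core.async :as a])

(defn put-or-stop
  "Returns a channel yielding true if v was accepted by ch, and false if
  ch was closed or stop-ch delivered something first."
  [stop-ch ch v]
  (a/go
    (let [[ret port] (a/alts! [stop-ch [ch v]] :priority true)]
      (if (= port stop-ch)
        false   ; interrupted: mimic a put on a closed channel
        ret)))) ; true if accepted, false if ch is closed
```

With this shape, user code that already checks the return value of (>! ...) handles an interrupted put the same way as a closed channel.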

saberstack / loop

Hanging process #1