janet-lang / janet

A dynamic language and bytecode vm
https://janet-lang.org
MIT License

Zombie processes freeze spawned process streams and `os/proc-wait` forever. #1382

Closed: amano-kenji closed this issue 10 months ago

amano-kenji commented 10 months ago

I call `(ev/read stream :all)` many times concurrently, in at least 6 threads.

If I don't give `ev/read` an explicit timeout, some threads freeze forever.

If I give it a timeout, I get a result, but only after the timeout has elapsed.

Since I'm using threads anyway, I can use janet-sh by andrewchambers, which works flawlessly because it is synchronous. However, I was trying to avoid relying on third-party libraries, and andrewchambers doesn't seem to have time to work on his Janet libraries.

This looks like a race condition caused by threads.

bakpakin commented 10 months ago

Are you calling `(ev/read stream :all)` on the same stream multiple times? Again, without code it's really hard to debug these kinds of issues or get important details.

amano-kenji commented 10 months ago

To reproduce the issue exactly, you need pipewire and wireplumber.

I can give you the code, but the environment won't be the same without them.

I call `(ev/read stream :all)` on a different stream in each thread.

Are you familiar with the basic concepts of pipewire and wireplumber as a user?
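For comparison, here is a minimal sketch (not the reporter's code, which was not posted) of reading several spawned processes concurrently without OS threads. Each `ev/read` suspends only its own fiber, so `ev/gather` can run the reads side by side on one event loop, each with its own stream. `capture-stdout` is a hypothetical helper, and `echo` stands in for commands like `pw-dump`.

```janet
(defn capture-stdout
  "Hypothetical helper: run `args` (looked up on PATH), return its whole
  stdout as a buffer, then reap the child with os/proc-wait."
  [args]
  (def proc (os/spawn args :p {:out :pipe}))
  # Read until EOF; this suspends the current fiber, not the whole thread.
  (def out (ev/read (proc :out) :all))
  # The child has closed stdout, so collect its exit status now.
  (os/proc-wait proc)
  out)

# Run two commands concurrently on one event loop, each with its own
# stream; ev/gather returns their results in the order of the bodies.
(def outputs
  (ev/gather
    (capture-stdout ["echo" "one"])
    (capture-stdout ["echo" "two"])))

(each out outputs (prin out))
```

There is no timeout in this sketch, so a child that never closes its stdout would still block its own fiber.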

amano-kenji commented 10 months ago

I think `ev/take` or `ev/give` also freezes forever.

Can you set up wireplumber and pipewire? Can you also set up i3 or sway?

amano-kenji commented 10 months ago

Okay, this is actually related to the behavior of `pw-dump` while pipewire and wireplumber are shutting down.

My suspicion is that `pw-dump` makes `ev/read` on the standard output stream from `os/spawn` freeze while pipewire and wireplumber are starting up or shutting down.

`pw-dump` even seems to freeze janet-sh while pipewire and wireplumber are starting up or shutting down.

Nothing else seems to be freezing; `ev/give` and `ev/take` are fine.

At least `ev/read` accepts a timeout. janet-sh has no timeout, so it will freeze forever.

amano-kenji commented 10 months ago

This function also seems to freeze when pipewire and wireplumber are shutting down:

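```janet
(defn running?
  "Returns true if pipewire is running"
  []
  (try
    # stdout and stderr are piped but never read.
    (let [proc (os/spawn ["pw-cli" "list-remotes"] :p {:out :pipe :err :pipe})]
      # Blocks until pw-cli exits; there is no timeout.
      (= (os/proc-wait proc) 0))
    ([_] false)))
```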

I think this is the very function that freezes my program. If `pw-cli list-remotes` becomes a zombie process, `os/proc-wait` will wait forever.
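Some background on the term, offered as context rather than a diagnosis: on POSIX systems a zombie is a child that has already exited but has not yet been reaped, and waiting on it returns immediately with its exit status. If `os/proc-wait` never returns, the child is most likely still running, for example blocked while pipewire is shutting down, or blocked writing to a `:pipe` that nobody reads (a Linux pipe holds 64 KiB by default; with output as small as `pw-cli list-remotes` this is unlikely, but it is a general hazard). The minimal sketch below uses the standard `true` command as a stand-in and shows a zombie being reaped without blocking; `reap-exited-child` is a hypothetical name.

```janet
(defn reap-exited-child
  "Hypothetical demo: spawn a short-lived child, let it exit before
  waiting on it, then reap it."
  []
  (def proc (os/spawn ["true"] :p))
  # Once `true` has exited but has not been waited on, the kernel keeps it
  # as a zombie (state Z in `ps`) that holds only its exit status.
  (ev/sleep 0.2)
  # Waiting on a zombie does not block: it collects the exit status and
  # removes the zombie from the process table.
  (os/proc-wait proc))

(print "exit code: " (reap-exited-child))
```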

amano-kenji commented 10 months ago

Replacing the `running?` function above with the one below makes it return false if `pw-cli list-remotes` takes longer than one second.

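```janet
(defn running?
  "Returns true if pipewire is running"
  []
  (try
    (let [proc (os/spawn ["pw-cli" "list-remotes"] :p {:out :pipe :err :pipe})]
      # Cancel the wait if pw-cli has not exited after 1 second; the
      # resulting error is caught below. pw-cli itself is not killed.
      (ev/with-deadline 1
        (= (os/proc-wait proc) 0)))
    ([_] false)))
```
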
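The deadline version returns `false` on time, but nothing in it kills the timed-out `pw-cli`, so that process keeps running after `running?` returns. Below is a minimal sketch of a variant, under stated assumptions: the helper name `exits-zero-within?` is hypothetical; it drains stdout with the `ev/read` timeout instead of putting a deadline on `os/proc-wait`; on timeout it sends SIGKILL and waits with `(os/proc-kill proc true)`; and it leaves stderr attached to the parent so an unread pipe can never fill up.

```janet
(defn exits-zero-within?
  "Hypothetical helper: true if `args` exits with status 0 within `timeout`
  seconds. On timeout the child is killed and reaped, and false is returned."
  [args timeout]
  # Treat a program that cannot be spawned as not running.
  (def proc (try (os/spawn args :p {:out :pipe}) ([_] nil)))
  (if proc
    (try
      (do
        # Drain stdout until EOF (normally when the child exits), so a full
        # pipe cannot block it; ev/read raises an error on timeout.
        (ev/read (proc :out) :all nil timeout)
        (= 0 (os/proc-wait proc)))
      ([_]
        # Timed out: SIGKILL the child and wait on it so it is reaped.
        (os/proc-kill proc true)
        false))
    false))

(defn running?
  "Returns true if pipewire is running"
  []
  (exits-zero-within? ["pw-cli" "list-remotes"] 1))
```

The timeout covers only the read: a child that closes its stdout but keeps running would still block `os/proc-wait`.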
amano-kenji commented 10 months ago

`pw-dump` can also be forced to return within one second like this:

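```janet
(import spork/json)  # assumed source of json/decode

(defn dump
  "Dumps information on a node. node is either node.name or node.id"
  [node]
  (try
    (let [proc (os/spawn ["pw-dump" node] :p {:out :pipe :err :pipe})]
      # Read all of stdout, raising an error after 1 second.
      (get (json/decode (ev/read (proc :out) :all nil 1))
           0))
    ([_] nil)))
```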

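Notice `(ev/read (proc :out) :all nil 1)`: the last argument is a timeout in seconds, so the read raises an error after one second, and `try` turns that error into `nil`.
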
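The `dump` above never calls `os/proc-wait`, so even when `pw-dump` finishes in time it is not explicitly reaped, and on timeout it keeps running. Below is a minimal sketch of a reading helper that always reaps the child and kills it first on timeout. `read-all-with-timeout` is a hypothetical name, and `echo` and `sleep` in the tests stand in for `pw-dump`.

```janet
(defn read-all-with-timeout
  "Hypothetical helper: run `args` and return all of its stdout as a buffer,
  or nil if it does not reach EOF within `timeout` seconds. The child is
  always waited on; on timeout it is killed first."
  [args timeout]
  (def proc (os/spawn args :p {:out :pipe}))
  (try
    (let [out (ev/read (proc :out) :all nil timeout)]
      # EOF reached: the child has closed stdout, so reap it.
      (os/proc-wait proc)
      out)
    ([_]
      # Timed out: send SIGKILL and wait so the child does not linger.
      (os/proc-kill proc true)
      nil)))
```

Assuming `json/decode` comes from spork, `dump` could decode the result of `(read-all-with-timeout ["pw-dump" node] 1)` whenever it is non-nil.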
amano-kenji commented 10 months ago

I feel much better after finding and destroying zombies.

Is this the right way to deal with zombie processes, or should Janet improve its handling of them?

amano-kenji commented 10 months ago

I'm closing this issue to open a new one.