janet-lang / spork

Various Janet utility modules - the official "Contrib" library.
MIT License
122 stars 35 forks source link

Tighter resource control for `sh/exec-slurp(-all)`. Add `stream/lines(-channel)`. #174

Closed amano-kenji closed 8 months ago

amano-kenji commented 8 months ago

the processes spawned by sh/exec-slurp(-all) are properly closed for resource control.

This closes https://github.com/janet-lang/spork/issues/173

stream/lines returns a fiber that yields lines from a stream. stream/lines-channel returns a channel that gives lines from a stream.

amano-kenji commented 8 months ago

I'm working on lower-level sh/line functions that can be stopped.

amano-kenji commented 8 months ago

I think it's ready.

sogaiu commented 8 months ago

@tionis May be you would be interested in taking a look at the proposed changes in this PR?

amano-kenji commented 8 months ago

It seems perfectly functional now, but I wonder whether the first message from a channel returned by lines should be the function that cancels the fiber that feeds the channel and kills the command process. Right now, the function is a part of a struct returned by lines.

A struct as the return value feels a bit clunky, but you can't make mistakes with a struct.

tionis commented 8 months ago

@tionis May be you would be interested in taking a look at the proposed changes in this PR?

Looks fine to me. Why has the docstring so many blank lines, though?

amano-kenji commented 8 months ago

They are paragraphs, but I think I can combine them into fewer paragraphs.

amano-kenji commented 8 months ago

I think I should give frozen processes a chance to shut down gracefully before terminating them forcefully. I'm going to forcefully push soon.

tionis commented 8 months ago

They are paragraphs, but I think I can combine them into fewer paragraphs.

That might look cleaner, be more readable in the repl.

I think I should give frozen processes a chance to shut down gracefully before terminating them forcefully. I'm going to forcefully push soon.

This seems reasonable, the timeout should probably be configurable either via args (should be possible by walking the argument list as until now only strings where valid arguments) or dynamic variable from the environment (more idiomatic and performant but uses a global var)

amano-kenji commented 8 months ago

I think it is ready for review. It is cleaner now. I don't actually have to allow users to configure timeout because the frozen processes are killed asynchronously in a janet task. The timeout is 30 seconds which should be enough.

I tested the code without errors. This is my masterpiece.

amano-kenji commented 8 months ago

My little worry is that another process can take over the PID that was killed with SIGTERM, and kill-proc-async ends up killing it with SIGKILL.

Perhaps, I should just kill the processes with SIGTERM and leave the unresponsive zombie processes?

sogaiu commented 8 months ago

A bit of a hack perhaps, but on Linux, perhaps determining the start time of the process adapting this idea [1] and then acting or not acting accordingly might help?

I guess you'd need to know the PID though...oh, it looks like one might be able to determine that if the process was started with os/spawn. The docstring for os/spawn says:

The returned value proc has the fields :in, :out, :err, :return-code, and the additional field :pid on unix-like platforms.


[1] So parsing /proc/$pid/stat for appropriate info...

amano-kenji commented 8 months ago

Is /proc/$pid/stat specific to linux?

amano-kenji commented 8 months ago

FreeBSD also has /proc/$pid/stat. In both FreeBSD and Linux, starttime has the same position in /proc/$pid/stat, but /proc/$pid/stat is not the same between BSD and linux. OpenBSD doesn't have /proc.

I wish janet proc had starttime.

amano-kenji commented 8 months ago

I think the only portable way to handle this is to kill proc only once with SIGTERM or SIGKILL. SIGKILL doesn't leave a trail, but it doesn't allow proc to shut down gracefully. SIGTERM allows it to shut down gracefully, but it could leave a trail in some cases.

amano-kenji commented 8 months ago

If os/proc-wait throws an error inslurp-exec, how can I differentiate ev deadline error from non-zero exit code error?

I can parse the error text, but that doesn't seem like a stable way to do so.

Edit: I guess I will just have to remove :x flag in os/spawn and handle non-zero exit code myself.

Edit: I guess I will just use SIGTERM... If there are any zombie processes left, users should probably kill them manually.

amano-kenji commented 8 months ago

Because I cannot call os/proc-wait twice or even three times, my options are limited. I guess I'm going to go ahead with SIGTERM and remove :x flag in os/spawn for better handling of non-zero exit code.

amano-kenji commented 8 months ago

After many iterations, I realized killing processes in sh/exec-slurp-(all) isn't a great idea. There is always a tiny chance that os/proc-kill may end up killing another process that takes over the PID of the process started by sh/exec-slurp and sh/exec-slurp-all. It is unlikely to happen, but it can happen in theory.

After the pipewire freezing issue was fixed, I realized the freezing issue has to be addressed in upstream projects, not in spork/sh. Addressing the symptom of an underlying issue is a bad idea. The issues have to be fixed at the root causes.

Thus, I made the pull request simpler. Now, sh/exec-slurp and sh/exec-slurp-all don't try to kill processes upon errors. They just try to close process pipes.

sh/lines also just simply send SIGTERM to the underlying process when :cancel function is called.

I think it's good enough. I am going to take a rest now.

tionis commented 8 months ago

This does seem cleaner and with the removed risk of unexpected killing of processes :+1:

amano-kenji commented 8 months ago

I just realized that os/proc-kill will just throw an error after os/proc-wait finishes.

Before os/proc-wait finishes, the process PID is not actually cleaned up from the operating system.

Thus, it's impossible for os/proc-kill to kill a random process that takes over the PID of the process.

Once os/proc-wait starts, even if ev/with-deadline cancels it, it is going to finish in the background. I tested it.

So, it's actually safe to call os/proc-kill anywhere.... as long as it's wrapped in a try block that discards any error.

I'm going to have to rethink this today.

amano-kenji commented 8 months ago

I decided not to make sh/slurp-exec and sh/slurp-exec-all responsible for killing the spawned program.

I just made :cancel function for sh/lines handle errors more gracefully. I don't want to make further modifications unless they are necessary.

Ready for review and merge.

amano-kenji commented 8 months ago

I believe It's a correct behavior for sh/slurp-exec and sh/slurp-exec-all to not kill their processes upon errors.

If I cancel os/execute with ev/with-deadline, os/execute doesn't kill its process.

I can think of sh/slurp-exec and sh/slurp-exec-all as more convenient versions of os/execute. Users who want more control should use os/spawn directly.

amano-kenji commented 8 months ago

I think I finally streamlined this pull request...

sogaiu commented 8 months ago

Very minor point, but is this do needed?

bakpakin commented 8 months ago

LGTM, thanks for all the work on this, @amano-kenji.

@sogaiu I'm not to concerned with minor style points on little things, esp. since this should generate identical bytecode.

amano-kenji commented 8 months ago

Very minor point, but is this do needed?

That is a minor mistake...... I didn't spot that....

sogaiu commented 8 months ago

Seem to be no harm done :+1:

amano-kenji commented 8 months ago

Thanks, bakpakin.