File descriptors get passed down to ‘system’ child processes

Rinzwind commented 1 year ago

Child processes created through #system: or #resultOfCommand: keep open copies of the file descriptors of the Pharo VM process. In the following example, this poses a problem: the second send of #bindTo:port: signals a SocketError (“Address already in use”) because the ‘sleep’ process still has the socket open. The third send does not signal an error because by then the ‘sleep’ process has exited.

socket1 := Socket newTCP.
socket1 bindTo: NetNameResolver localHostAddress port: 1701.
LibC system: 'sleep 2 &'.
socket1 close.
socket2 := Socket newTCP.
error := [ socket2 bindTo: NetNameResolver localHostAddress port: 1701 ] on: Error do: #value.
(Delay forSeconds: 3) wait.
socket2 bindTo: NetNameResolver localHostAddress port: 1701.
socket2 close.

The ‘sleep’ process’s open file descriptors can be listed through ‘lsof’ as done in the following example. The file descriptors include the one for the socket, as well as ones for the Pharo sources and changes files (from #sourcesFileStream and #changesFileStream):

socket1 := Socket newTCP.
socket1 bindTo: NetNameResolver localHostAddress port: 1701.
pid := LibC resultOfCommand: 'sleep 2 >/dev/null & echo $!'.
socket1 close.
lsofOut := LibC resultOfCommand: 'lsof -P -p ' , pid.

There is a flag (FD_CLOEXEC) that can be set on a file descriptor (through fcntl) such that it will not remain open in the ‘sleep’ process. In Zinc issue #110, I gave a snippet that sets the flag on the Zinc server socket. Perhaps that should be generalized to a method for setting the flag on any file descriptor. Alternatively, there should be a variant of #system: that takes an explicit file descriptor mapping for the child process, closing any file descriptor in the child process that is not mapped, something like the following:

outputStream := '/tmp/output' asFileReference writeStream.
otherStream := '/tmp/other' asFileReference readStream.
LibC system: command fileDescriptors: {
    "FD 0, a copy of file descriptor 0 of the Pharo VM process:"
    Stdio stdin.
    "FD 1 and 2, a copy of outputStream's file descriptor:"
    outputStream.
    outputStream.
    "FD 3, not opened:"
    nil.
    "FD 4, a copy of otherStream’s file descriptor:"
    otherStream.
    "FD 5 and above, not opened:"
    }

akgrant43 commented 1 year ago

In http://gtoolkit.com we've added Socket>>setCloseOnExec: which does just what you suggest above.

Perhaps this should be added to Pharo :-).

gtoolkit.com
Home
Glamorous Toolkit is the moldable development environment.

Rinzwind commented 1 year ago

Ah yes, would be good to add that to Pharo: I see the implementation is in ‘GToolkit-Utility-File’, as an extension on Socket which delegates to a corresponding method on SqSocket.

As for the alternative I had in mind of explicitly mapping the file descriptors, I’ve noted there’s a method #closeAllButStandardFileStreams in OSSubprocess though it’s not actually implemented. OSSubprocess makes use of posix_spawn; the functions for manipulating the file_actions unfortunately don’t seem to provide a way to close a range of file descriptors. The rationale for close in POSIX.1-2017 also says “The standard developers rejected a proposal to add closefrom() to the standard […] it is not possible to standardize an interface that closes arbitrary file descriptors above a certain value […].”

daniels220 commented 10 months ago

I would strongly support integrating this in base Pharo, and in fact making it the default for new FDs (file and socket both), requiring a setCloseOnExec: false[^2] to get the old behavior. I suggest this because there is a potential race condition when one Process opens a file/socket, then sets FD_CLOEXEC, while another forks a child process. Presumably if the one doing the fork wants the FD inherited, there would be no race condition with them clearing FD_CLOEXEC before forking[^1]. So rather than add an extra argument to a huge number of methods—consider all the implementors of [binary]writeStream[Encoded:][do:]!—it seems better to apply a safe default, then have a single point of entry to change it back.

I see in the code that GT is using a custom primitive for fcntl(), noting that FFI doesn't support varargs functions on ARM64 (in fact, if my research is correct, it's specifically ARM64/macOS that is different). For anyone wishing to use this code in their own application without loading, presumably, a binary plugin, it appears that on x86_64, a varargs function is called just like a fixed-args function declared with that exact pattern of args—that is, calling fcntl(my_fd,F_SETFD,existing_flags|FD_CLOEXEC) works exactly as if fcntl() was declared as int fcntl(int fd, int cmd, int arg), even though in fact the declaration is int fcntl(int fd, int cmd, ... /* arg */ ). So we can declare the FFI method this way, bypassing the lack of explicit varargs support:

fcntl: fileDescriptor _: cmd _: arg
    "Perform the specified fcntl command."

    ^ self ffiCall: #( int fcntl #( int fileDescriptor , int cmd , int arg ) )

And the code will then work on most platforms without the plugin. This is taking advantage of undefined behavior, so the usual caveats about nasal demons apply, but it's a workable stopgap solution.

Ultimately, given the architecture of the Pharo file/socket plugins, I imagine the solution would be a primitive as part of those plugins that performs an fcntl() on the underlying FD without requiring the image code to traverse the private structs involved in a file handle—which would be good, as that is an obvious hack in its own right.

I could possibly even write those primitives—I've never written C, but I can read it well enough, and most of it would be boilerplate that can be copied from existing primitives—but I'd definitely want someone to look them over, and I can't actually compile the VM myself to test them (long story).

[^1]: Potentially there could be a race if multiple Processes want to fork a child OS-process, each of which inherits only some particular FD, but that would be the case in C code as well as the flag is effectively "global", it lives on the FD rather than being part of the execve() call, and the solution is the same—the "clear FD_CLOEXEC, fork child, re-set FD_CLOEXEC" sequence needs to be a mutual critical section, guarded by a Semaphore or similar. [^2]: Though I might name it just #closeOnExec[:] or #isCloseOnExec[:] rather than a get/set pair.

Ducasse commented 10 months ago

Thanks Daniel. I tagged it so that Pablo can have a look. Now if people would play the game and contribute back to Pharo by sending PRs this would help us. But we will do it by ourselves. I imagine that we will discuss this is in January after holidays.

Ducasse commented 10 months ago

Thanks @Rinzwind for the extension link :).

daniels220 commented 10 months ago

Like I said, I'd be open to collaborating on this—certainly I could fill in the image side of things, referring to a primitive that doesn't yet exist but has an obvious implementation, if someone with VM experience can do that half. I could probably struggle through the VM side, too (I see I can download artifacts from a CI build, so the fact that I can't build my own VM wouldn't have to stop me entirely...), but if someone with more VM experience has time, I'd rather work on other things in the image...

Ducasse commented 10 months ago

I know. My remark was more general. Our VM guys will have a look.

noha commented 10 months ago

My recommendation would be to stay close to POSIX on these things. It is normal that all fds are available in a child process. This is needed because a lot of things rely on processes share stdin/stdout/stderr and for other things. Putting a half baked idea in pharo might not be a good idea.

@daniels220 If it is about the race condition with FD_CLOEXEC there exist O_CLOEXEC and SOCK_CLOEXEC on creation time to exactly remove that problem. So if there is an FFI problem with setting these flags we should fix this

So I'm against changing a default behaviour to somehting new that just opens a lot of other problems. Parent processes can explicitly set FD_CLOEXEC before forking or child processes can clear them after the exec(). There is no such thing as a better default. Especially not if we depart from POSIX

Ducasse commented 10 months ago

Thanks Norbert!

daniels220 commented 10 months ago

Norbert, you have a point...I know I can perhaps be overeager in my opinions when I don't have experience of the full context (in this case, I haven't written C or similar, or dealt directly with POSIX APIs). Certainly it's fair to do some more research...but when I did a quick Google of "is it best practice to always use O_CLOEXEC", the top few results were mostly "yes," either "...unless you specifically know you need to share that FD," or "and if you do need to share one, clear FD_CLOEXEC immediately before fork and re-set it afterward, to avoid accidentally sharing it with a different child process". So this seems like a case where POSIX is held back by the need for backwards-compatibility, creating a footgun for new C programmers for the rest of time, while for Pharo, I think the number of programs that actually rely on sharing FDs (other than stdio) with child processes is...probably either zero or one, honestly. I would note also that you can't actually retrieve the FD of an open file without private-struct hijinks, so how would you even tell the child process what to use? I imagine it can examine the FD table itself, but that seems weird and error-prone.

Another way of looking at it: Python, Ruby, Java, and even Rust all use O_CLOEXEC on all files, with no way I saw at a glance to not do this. They expose fcntl(), allowing you to clear the flag later, but don't bother wrapping it in any high-level API. So that's pretty strong evidence too that the programming community at large considers it a "should have been the default all along" sort of thing.

One definite point of clarification—I forgot to mention stdio—those I would agree should remain as they are even if the default for files and sockets were to change. The existing OSSubprocess library already has the ability to control if/how they are inherited in a thread-safe way, in any case.

Regarding optional use of O_CLOEXEC and SOCK_CLOEXEC—the problem isn't just adding support to the FilePlugin and SocketPlugin primitives, it's with how much surface area in the in-image API would be affected. If I execute 'foo.txt' asFileReference writeStream, at one point the stack looks like this:

FileHandle(FileSystemHandle)>>setReference:writable:
FileHandle class(FileSystemHandle class)>>on:writable:
FileHandle class(FileSystemHandle class)>>open:writable:
FileSystem>>open:writable:
FileSystem>>binaryWriteStreamOn:
FileReference>>binaryWriteStream
FileReference(AbstractFileReference)>>writeStreamEncoded:
FileReference>>writeStream
UndefinedObject>>DoIt

The subsequent #open in FileSystem>>open:writable: is a no-op with a TODO flag, and the actual opening happens with the stack looking like:

File class>>open:writable: (primitive)
[] in File>>basicOpenForWrite:
File class>>retryWithGC:until:forFileNamed:
File>>basicOpenForWrite:
File>>openForWrite:
File>>openForWrite
File>>writeStream
FileHandle>>binaryWriteStream
FileSystem>>binaryWriteStreamOn:
...as above...

So if we were to...okay, let's keep things simple and say we just add an argument specifically for this and make the primitive open:writable:closeOnExec:. (This feels dirty to me, but there don't seem to be many other similar flags, so maybe it's not so unreasonable?) Now we have 13 other methods that need a variant with a parameter in order to pass the argument down the chain. And there are at least 2 more methods on the "write" side, and something like 8 more on the read side, plus oddballs like appendStream, that aren't part of this call chain but would need the same treatment. And what does the high-level API even look like? #writeStreamWithCloseOnExec? Not exactly pretty...

Perhaps a more builder-styled API would be better—add a property to FileReference so that it looks like ('foo.txt' asFileReference) useCloseOnExec; writeStream. Then we could continue the pattern and add a similar property to FileReference and File. This is probably less added code overall, but it's semantically questionable—is it really a property of the file reference that "oh, hey, if we open this file we should do so with O_CLOEXEC"? What happens when you combine relative paths—does the new FileReference inherit the flag from the one it was constructed from? Etc. etc. So it's a tradeoff and not clearly better.

And none of this does anything for applications that make use of library code that opens its own files or sockets, without O_CLOEXEC, and elsewhere in the application happens to fork child processes. So add on top of that needing to thread the argument through and/or add option properties to all the various XML, JSON, CSV, etc. parsing libraries that have their own wrappers to open a file and transparently translate the markup language.

Ducasse commented 10 months ago

Thanks for the discussion! This is important to have multiple views. I like the fact that other languages are doing this.

daniels220 commented 7 months ago

Clearly nothing is going to happen with this for Pharo 12...could someone move it to the Pharo 13 milestone and maybe we can revisit in a couple months?

jecisc commented 7 months ago

Done

pharo-project / pharo

File descriptors get passed down to ‘system’ child processes #12576