lihongjie0209 / myblog

Linux: Process substitution #116

Open lihongjie0209 opened 3 years ago

lihongjie0209 commented 3 years ago

In computing, process substitution is a form of inter-process communication that allows the input or output of a command to appear as a file. The command is substituted in-line, where a file name would normally occur, by the command shell. This allows programs that normally only accept files to directly read from or write to another program.

lihongjie0209 commented 3 years ago

The following examples use Korn shell syntax, which Bash and Zsh also accept.

The Unix diff command normally accepts the names of two files to compare, or one file name and standard input. Process substitution allows one to compare the output of two programs directly:

$ diff <(sort file1) <(sort file2)

The <(command) expression tells the command interpreter to run command and make its output appear as a file. The command can be any arbitrarily complex shell command.
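
As a minimal sketch of the comparison above, assuming Bash and two throwaway files: the scratch directory and the file contents below are made up for the demonstration. Both files hold the same lines in a different order, so diff should report no difference once each side is sorted.

```bash
#!/usr/bin/env bash
# Illustrative setup: two files with the same lines in a different order.
workdir=$(mktemp -d)
printf 'b\na\nc\n' > "$workdir/file1"
printf 'c\nb\na\n' > "$workdir/file2"

# Each <(...) is replaced by a path that diff opens like an ordinary file.
if diff <(sort "$workdir/file1") <(sort "$workdir/file2") > /dev/null; then
    result="same"
else
    result="different"
fi
echo "$result"

rm -r "$workdir"
```

No temporary file holding sorted output is ever created; only the two input files exist on disk.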

Without process substitution, the alternatives are:

  1. Save the output of the command(s) to a temporary file, then read the temporary file(s).

    $ sort file2 > /tmp/file2.sorted
    $ sort file1 | diff - /tmp/file2.sorted
    $ rm /tmp/file2.sorted
  2. Create a named pipe (also known as a FIFO), start one command writing to the named pipe in the background, then run the other command with the named pipe as input.

    $ mkfifo /tmp/sort2.fifo
    $ sort file2 > /tmp/sort2.fifo &
    $ sort file1 | diff - /tmp/sort2.fifo
    $ rm /tmp/sort2.fifo

Both alternatives are more cumbersome.

Process substitution can also be used to capture output that would normally go to a file, and redirect it to the input of a process. The Bash syntax for writing to a process is >(command). Here is an example using the tee, wc and gzip commands that counts the lines in a file with wc -l and compresses it with gzip in one pass:

$ tee >(wc -l >&2) < bigfile | gzip > bigfile.gz
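
A runnable variant of this command, assuming Bash plus the seq and gzip utilities, might look like the sketch below. The scratch directory and the 1000-line input are invented for the demonstration. The command inside >(...) runs asynchronously, so the sketch captures its report through a command substitution, which reads until every writer has closed the pipe and therefore waits for wc to finish.

```bash
#!/usr/bin/env bash
# Illustrative input: a 1000-line file in a scratch directory.
workdir=$(mktemp -d)
seq 1 1000 > "$workdir/bigfile"

# tee copies its input to the >(...) path and to stdout (the pipe to gzip).
# wc reports on stderr (>&2), and 2>&1 routes that report into $count.
count=$( { tee >(wc -l >&2) < "$workdir/bigfile" | gzip > "$workdir/bigfile.gz"; } 2>&1 )
count=${count//[[:space:]]/}      # some wc implementations pad the number

# Check that the compressed copy still holds every line.
restored=$(gzip -dc "$workdir/bigfile.gz" | wc -l)
restored=${restored//[[:space:]]/}

echo "counted: $count, restored: $restored"
rm -r "$workdir"
```

The input file is read only once, yet both the line count and the compressed copy are produced.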

lihongjie0209 commented 3 years ago

Under the hood, process substitution has two implementations. On systems that support /dev/fd (most Unix-like systems), the shell calls the pipe() system call, which returns file descriptors for the two ends of a new anonymous pipe. It then builds the string /dev/fd/$fd, where $fd is the descriptor for the end the primary command will use, and substitutes that string on the command line. On systems without /dev/fd support, the shell calls mkfifo with a new temporary filename to create a named pipe and substitutes this filename on the command line. To illustrate the steps involved, consider the following simple process substitution on a system with /dev/fd support:

$ diff file1 <(sort file2)

The steps the shell performs are:

  1. Create a new anonymous pipe. This pipe will be accessible with something like /dev/fd/63; you can see it with a command like echo <(true).
  2. Execute the substituted command in the background (sort file2 in this case), piping its output to the anonymous pipe.
  3. Execute the primary command, replacing the substituted command with the path of the anonymous pipe. In this case, the full command might expand to something like diff file1 /dev/fd/63.
  4. When execution is finished, close the anonymous pipe.
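
The steps above can be observed from a script. The sketch below assumes Bash on a system with /dev/fd; the exact descriptor number is chosen by the shell and is not guaranteed. The second half imitates step 3 by hand: an ordinary pipeline hands the reader a pipe on descriptor 0, and /dev/fd/0 names that descriptor as a path. The scratch files are made up for the demonstration.

```bash
#!/usr/bin/env bash
# echo never opens its argument, so it simply prints the substituted path.
path=$(echo <(true))
echo "$path"                      # something like /dev/fd/63 on Linux

# Illustrative files: file2 is file1 in reverse order.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/file1"
printf 'b\na\n' > "$workdir/file2"

# By hand: sort writes into a pipe, and diff is given that pipe as a path.
outcome="different"
sort "$workdir/file2" | diff "$workdir/file1" /dev/fd/0 > /dev/null && outcome="identical"
echo "$outcome"

rm -r "$workdir"
```

The hand-written form only works for one substituted input at a time, since there is a single standard input; the shell's own mechanism allocates a separate descriptor for each substitution.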

For named pipes, the execution differs solely in the creation and deletion of the pipe; they are created with mkfifo (which is given a new temporary file name) and removed with unlink. All other aspects remain the same.
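
The named-pipe variant can be imitated by hand in the same way. This is a sketch of the idea rather than what any particular shell does internally: the scratch directory, the name sort2.fifo and the file contents are all hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative files: file2 is file1 in reverse order.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/file1"
printf 'b\na\n' > "$workdir/file2"

fifo="$workdir/sort2.fifo"            # fresh temporary name (hypothetical)
mkfifo "$fifo"                        # create the named pipe
sort "$workdir/file2" > "$fifo" &     # substituted command runs in the background
writer=$!

# The primary command receives the pipe's path where a file name is expected.
outcome="different"
diff "$workdir/file1" "$fifo" > /dev/null && outcome="identical"

wait "$writer"                        # make sure the writer has finished
rm "$fifo"                            # unlink the named pipe
rm -r "$workdir"
echo "$outcome"
```

Creating the pipe inside a directory from mktemp -d avoids clashing with an existing name in /tmp.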