The problem is that the OS limits the size of a pipe buffer and jsh doesn't start a built_in producer in parallel with its consumer in the pipeline (since jsh is single-threaded).
context: jsh pipeline implementation
To clarify, let's take an easy non-built_in example: `cat some_file | grep some_string`. This is more or less what happens:
- jsh parses the command line and constructs the pipeline of `comd`'s in its data structures
- jsh then opens the necessary pipe buffers using libc functions (OS system calls)
- jsh then starts a loop where it forks all the commands in the pipeline one after the other, redirecting stdin and stdout to the correct pipe file descriptors just before calling `execv` to replace the forked jsh child with the correct UNIX command (e.g. `cat` or `grep`)
- after this loop, jsh waits for the completion of all children
The point is that jsh doesn't block after one process in the pipeline is started (forked); instead it directly creates the next one. This means the consumer process (e.g. `grep`) can start consuming data from the pipe while the producer (e.g. `cat`) fills it. The OS limits the size of a pipe buffer, refusing any more data when the buffer is full. The rationale is that the consumer should consume some data, freeing up space, before the producer can continue producing.
problem: built_ins in pipelines
Now the problem: jsh deals with built_ins in a pipeline differently than with normal "forkable" commands. For example, consider `history | grep some_string`.
jsh starts the actual pipe processes in the fork loop above; when it detects that one of them is a built_in, it first redirects stdin and stdout to the pipes and then calls the `parse_built_in()` function, blocking the loop until completion...
Of course this is the problem: a built_in will write all of its data to the pipe before the consumer process is started (since jsh executes in a single thread). When the history file is very large, the pipe buffer fills up... (e.g. 60,000+ bytes of history; 5,000+ lines)
possible solutions
Some solution sketches:
- a straightforward and, I think, very bad solution: execute a built_in in a separate thread --> no-go
- hackhackhack: replace the `history` built_in with an external process that reads `~/.jsh_history`, e.g. a simple `cat` --> this fixes the problem by hacking around it; no real, general solution
- the only good thing to do: find a way to execute a built_in and continue the loop, e.g. by first executing the entire loop and remembering to execute the built_ins after all processes are forked? I think this should work since the forked consumers wait for EOF on stdin anyway?
Comments? Feedback? Ideas?
If anyone feels like it, claim! This is a nice bugfix for getting to know the jsh internals :-)
This is a nasty bug that took me some time ;-)