Rubish is a shell in Ruby. It is object-oriented, and it uses only Ruby's own syntax (no metasyntax of its own). Rubish is pronounced Roobish, not Rubbish (unlike Bash).
Fire up irb, and start Rubish:
$ irb -rrubish
irb> Rubish.repl
rbh> date
Tue Mar 17 17:06:04 PDT 2009
rbh> uname :svrm
Linux 2.6.24-16-generic #1 SMP Thu Apr 10 13:23:42 UTC 2008 i686
A few tips up front. Sometimes Rubish can mess up your terminal. If that happens, try hitting C-c and C-d to get back to Bash, then run
$ reset
to reset your terminal. Also, Rubish doesn't have its own shell history, but it uses the readline library, so you can use readline's history mechanism: C-r <string> matches a previously entered line containing string.
Rubish's Executable class provides a common API for IO redirection and output processing. The subclasses are:
Command
A Unix command.
Pipe
A pipeline of Unix commands.
Awk
Awk-like processing of an executable's output.
Sed
Sed-like processing of an executable's output.
Batch
An arbitrary block of Ruby code, like a subshell.
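For a taste of each (all of these are covered in detail below),
ls                        # a Command
p { ls ; tr "a-z A-Z" }   # a Pipe
ls.awk { ... }            # Awk over a command's output
ls.sed { p }              # Sed over a command's output
batch { ls.exec }         # a Batch: a block of code run as an executable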
The Rubish REPL takes a line and instance_evals it with the shell object (a Rubish::Context). If the method is undefined, the call is translated into an Executable (a Rubish::Command) object by method_missing.
rbh> ls
awk.rb command_builder.rb command.rb executable.rb LICENSE pipe.rb README.textile rubish.rb sed.rb session.rb streamer.rb
# ls evaluates to a Rubish::Command object, which
# is a subclass of Rubish::Executable
rbh> ls.inspect
"#<Rubish::Command::ShellCommand:0xb7ac297c @args=\"\", @status=nil, @cmd=\"ls \">"
# you can store a command in an instance variable
rbh> @cmd = ls; nil
nil
# if the line evaluates to a command, the shell
# calls the exec method on it.
rbh> @cmd # same as @cmd.exec
awk.rb command_builder.rb command.rb executable.rb LICENSE pipe.rb README.textile rubish.rb sed.rb session.rb streamer.rb
rbh> @cmd
awk.rb command_builder.rb command.rb executable.rb LICENSE pipe.rb README.textile rubish.rb sed.rb session.rb streamer.rb
You can invoke a command with arguments of String, Symbol, or Array (of String, Symbol, or Array, recursively). A String argument is taken as is. A Symbol is translated to a flag (:flag => -flag). Arguments in an Array are treated likewise. Finally, all the arguments are flattened and joined together. The following are equivalent:
rbh> ls :l, "awk.rb", "sed.rb"
rbh> ls "-l awk.rb sed.rb"
rbh> ls :l, %w(awk.rb sed.rb)
Commands can be chained into a pipe with the p method:
rbh> p { ls ; tr "a-z A-Z" }
AWK.RB
COMMAND_BUILDER.RB
COMMAND.RB
EXECUTABLE.RB
LICENSE
PIPE.RB
README.TEXTILE
RUBISH.RB
SED.RB
SESSION.RB
STREAMER.RB
Pipes are first-class values:
rbh> @pipe = p { ls ; tr "a-z A-Z" }; nil
# again, we return nil so @pipe doesn't get executed.
rbh> @pipe
# execute @pipe once
rbh> @pipe
# execute @pipe again
IO redirections are done by methods defined in Rubish::Executable.
Rubish::Executable#i(io=nil)
Set the $stdin of the executable when it is executed. If called without an argument, return the executable's IO object.
Rubish::Executable#o(io=nil)
Ditto for $stdout.
Rubish::Executable#err(io=nil)
Ditto for $stderr.
rbh> ls.o("ls-result")
rbh> cat.i("ls-result")
awk.rb
command_builder.rb
command.rb
executable.rb
LICENSE
ls-result
pipe.rb
README.textile
rubish.rb
sed.rb
session.rb
streamer.rb
Rubish can take four kinds of objects for IO: a String (used as a file path), an Integer (used as a file descriptor), an IO object, or a Ruby block. When a block is used for IO, the block receives a pipe connecting it to the command, for reading or writing.
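For instance, an IO object or a file descriptor can be passed the same way (a sketch):
rbh> ls.o($stderr)   # an IO object: send the listing to stderr
rbh> ls.o(2)         # a file descriptor: same thing, via fd 2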
# pump numbers into cat
rbh> cat.i { |p| p.puts((1..5).to_a) }
1
2
3
4
5
# upcase all filenames
rbh> ls.o { |p| p.each_line {|l| puts l.upcase} }
AWK.RB
COMMAND_BUILDER.RB
COMMAND.RB
EXECUTABLE.RB
LICENSE
LS-RESULT
PIPE.RB
README.TEXTILE
RUBISH.RB
SED.RB
SESSION.RB
STREAMER.RB
# kinda funny, pump numbers into cat, then pull
# them out again.
rbh> cat.i { |p| p.puts((1..10).to_a) }.o {|p| p.each_line {|l| puts l.to_i+100 }}
101
102
103
104
105
106
107
108
109
110
The input and output blocks are executed in their own threads, so be careful.
Rubish is designed to make it easy to interface Unix commands with Ruby.
Rubish::Executable#each(&block)
Yield each line of output to the block.
Rubish::Executable#map(&block)
Like #each, but collect the values returned by the block. If no block is given, collect each line of the output.
Rubish::Executable#head(n=1,&block)
Process the first n lines of output with a block.
Rubish::Executable#tail(n=1,&block)
Process the last n lines of output with a block.
Rubish::Executable#first
Return the first line of output.
Rubish::Executable#last
Return the last line of output.
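For example, with the directory above (a sketch; output omitted):
rbh> ls.first                    # the first line of the listing
rbh> ls.last                     # the last line
rbh> ls.head(3) { |x| puts x }   # only the first three lines reach the block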
Since this is Ruby, there are no crazy metasyntactic issues when you want to process the output lines.
# print filename and its extension side by side.
rbh> ls.each { |f| puts "#{f}\t#{File.extname(f)}" }
address.rb .rb
awk.output .output
awk.rb .rb
command_builder.rb .rb
command.rb .rb
executable.rb .rb
foo
foobar
foo.bar .bar
foo.rb .rb
LICENSE
my.rb .rb
pipe.rb .rb
#README.textile# .textile#
README.textile .textile
rubish.rb .rb
ruby-termios-0.9.5 .5
ruby-termios-0.9.5.tar.gz .gz
#sed.rb# .rb#
sed.rb .rb
session.rb .rb
streamer.rb .rb
todo
util
You can execute a command within the each block.
rbh> ls.each { |f| wc(f).exec }
64 131 1013 awk.rb
116 202 1914 command_builder.rb
56 113 1034 command.rb
196 563 4388 executable.rb
24 217 1469 LICENSE
12 12 132 ls-result
78 245 1917 pipe.rb
142 544 3388 README.textile
107 278 2340 rubish.rb
46 54 546 sed.rb
95 206 1870 session.rb
264 708 5906 streamer.rb
One nifty thing to do is to collect the outputs of nested commands.
rbh> ls.map {|f| stat(f).map }
[[" File: `awk.rb'",
" Size: 1013 \tBlocks: 8 IO Block: 4096 regular file",
"Device: 801h/2049d\tInode: 984369 Links: 1",
"Access: (0644/-rw-r--r--) Uid: ( 1000/ howard) Gid: ( 1000/ howard)",
"Access: 2009-03-17 21:02:25.000000000 -0700",
"Modify: 2009-03-17 21:02:13.000000000 -0700",
"Change: 2009-03-17 21:02:13.000000000 -0700"],
[" File: `command_builder.rb'",
" Size: 1914 \tBlocks: 8 IO Block: 4096 regular file",
"Device: 801h/2049d\tInode: 984371 Links: 1",
"Access: (0644/-rw-r--r--) Uid: ( 1000/ howard) Gid: ( 1000/ howard)",
"Access: 2009-03-17 21:02:25.000000000 -0700",
"Modify: 2009-03-17 21:02:13.000000000 -0700",
"Change: 2009-03-17 21:02:13.000000000 -0700"],
...
]
All of the above applies to pipes as well. For example, here's how to get the number of files in a directory as a Ruby Integer:
rbh> p { ls; wc}
23 23 248
rbh> p { ls; wc}.map
[" 23 23 248\n"]
rbh> p { ls; wc}.map.first.split
["23", "23", "248"]
rbh> p { ls; wc}.map.first.split.first.to_i
23
A big problem with Bash is when you have to process output with weird characters. Ideally, you might want to say,
wc `ls`
But that breaks. You have to say,
find . -maxdepth 1 -print0 | xargs -0 wc
And then again, that only works if you are working with files, and if the command (e.g. wc) accepts multiple arguments. In Rubish, you can use the Executable#q method to tell a command to quote its arguments. Like so,
wc(ls.map).q
Rubish has sedish and awkish things that are not quite like sed and awk, but not entirely unlike sed and awk.
Rubish::Sed doesn't implicitly print (unlike real sed). There's actually no option to turn on implicit printing.
Rubish::Sed#line
The current line sed is processing.
Rubish::Sed#p(*args)
Print the current line if no argument is given.
Rubish::Sed#s(regexp,str)
String#sub! on the current line.
Rubish::Sed#gs(regexp,str)
String#gsub! on the current line.
Rubish::Sed#q
Quit sed processing.
rbh> ls.sed { gs /b/, "bee"; p if line =~ /.rbee$/ }
awk.rbee
command_beeuilder.rbee
command.rbee
executabeele.rbee
pipe.rbee
rubeeish.rbee
sed.rbee
session.rbee
streamer.rbee
# output to a file
rbh> ls.sed { p }.o "sed.result"
Rubish::Sed doesn't have the concepts of swapping, appending, modifying, or any other interaction between pattern space and hold space. Good riddance. The block is instance_eval'd by the Sed object, so you can keep track of state using instance variables.
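For example, numbering the output lines with an instance variable (a sketch):
rbh> ls.sed { @n = (@n || 0) + 1; puts "#{@n}\t#{line}" }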
Awk is a lot like sed, but you can also associate actions to run before or after the awk processing.
Rubish::Awk#begin(&block)
The block is instance_eval'd by the Awk object before processing.
Rubish::Awk#act(&block)
The block is instance_eval'd by the Awk object for each line.
Rubish::Awk#end(&block)
The block is instance_eval'd at the end of processing. Its value is returned as the result.
rbh> ls.awk { puts do_something(line)}
# you can have begin and end blocks for awk.
rbh> ls.awk.begin { ...init }.act { ...body}.end { ...final}
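A concrete sketch: count the .rb files, keeping the counter in an instance variable and returning it from the end block.
rbh> ls.awk.begin { @rb = 0 }.act { @rb += 1 if line =~ /\.rb$/ }.end { @rb }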
You can associate multiple blocks with either awk or sed. Each block is an "action" that's processed in left-to-right order.
rbh> cmd.sed.act { ... }.act { ... }
rbh> cmd.awk.act { ... }.act { ... }
Rubish supports awk/sed-style pattern matching.
.sed(/a/) # triggers for lines that match
.sed(/a/,/b/) # triggers for lines between the two matches (inclusive)
.sed(1) # matches line one
.sed(3,:eof) # matches line 3 to the end of the stream
# ditto with awk
.awk(/a/)
> cat.i {|p| p.puts((1..10).to_a)}.sed(2,4) { p }
2
3
4
Rubish::Sed and Rubish::Awk actually share the Rubish::Streamer mixin, which implements most of their mechanisms. It has two interesting features: line buffering and aggregation.
Let's see line buffering first.
Rubish::Streamer#peek(n=1)
Return the next n lines (as an Array of Strings), and put these lines in the stream buffer.
Rubish::Streamer#skip(n=1)
Skip the next n lines.
Rubish::Streamer#stop(n=1)
Skip the remaining actions in the streamer, and process the next line.
Rubish::Streamer#quit(n=1)
Quit the streaming process.
By the way, isn't it nice that these methods all have four chars?
# print files in groups of 3, separated by blank lines.
rbh> ls.sed { p; puts peek(2); puts ""; skip(3) }
The second feature is aggregation. In general, the aggregating methods take a name, a value, and an optional key. The aggregated result is accumulated in an instance variable named by the given name. Each aggregator type basically does a foldl over an initial value. The optional key is used to partition an aggregation.
Rubish::Streamer#count(name,key=nil)
Count the number of times it's called.
Rubish::Streamer#max(name,val,key=nil)
Rubish::Streamer#min(name,val,key=nil)
Rubish::Streamer#sum(name,val,key=nil)
Rubish::Streamer#collect(name,val,key=nil)
Collect the vals into an array.
Rubish::Streamer#hold(name,size,val,key=nil)
Collect the vals into a fixed-size FIFO queue.
Rubish::Streamer#pick(name,val,key=nil,&block)
Pass the old and new values to the block; the value returned by the block is saved under name.
Each aggregator's name is used to create a bucket, and a reader method of the same name can be used to access that bucket. A bucket is a hash of partitioned accumulations keyed by key. The special key nil aggregates over the entire domain (like MySQL's rollup).
# find the length of the longest file name, and
# collect the file names.
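# (a holds the fields of the current line, awk-style; a[0] is the file name)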
ls.awk { f=a[0]; max(:fl,f.length,File.extname(f)); collect(:fn,f)}.end { pp buckets; [fl,fl(""),fn] }
{:fl=>{""=>10, nil=>18, ".textile"=>14, ".rb"=>18},
:fn=>
{nil=>
["awk.rb",
"command_builder.rb",
"command.rb",
"executable.rb",
"LICENSE",
"ls-result",
"pipe.rb",
"README.textile",
"rubish.rb",
"sed.rb",
"sed-result",
"session.rb",
"streamer.rb"]}}
[18,
10,
["awk.rb",
"command_builder.rb",
"command.rb",
"executable.rb",
"LICENSE",
"ls-result",
"pipe.rb",
"README.textile",
"rubish.rb",
"sed.rb",
"sed-result",
"session.rb",
"streamer.rb"]]
The first printout (the hash) is from pp buckets. You can see the aggregation partitioned by file extension (in the case of fl). Note that fl(nil) holds the max length over all the files (the entire domain).
Executable and all its subclasses can execute in the background.
Executable#exec!
Execute, immediately returning a Job.
Executable#each!(&block)
Iterate the output in the background.
Executable#map!(acc,&block)
Accumulate output into a thread-safe data structure with <<.
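For example (a sketch; it assumes that map! without a block collects raw output lines, as map does):
rbh> ls.each! { |f| puts f.upcase }   # iterate the output in a background thread
rbh> @lines = []; nil
rbh> ls.map!(@lines)                  # output lines are accumulated into @lines with <<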
A Job has the following methods:
Job#wait
Wait for the job to finish; this blocks the current thread. Raises if the computation ends abnormally.
Job#stop
Signal the job to terminate, then wait for it.
In the case of executing a Unix command (or pipe), Job#wait waits for the child process to finish, and Job#stop sends SIGTERM to the process, then waits.
# slowcat takes 3 seconds to complete
> @j = slowcat(3).exec!
# returns immediately
> @j
#<Job>
> @j.wait # blocks for three seconds
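Stopping a background command works similarly:
> @j = slowcat(3).exec!
> @j.stop   # sends SIGTERM to the child process, then waits for it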
Jobs are registered in a JobControl object.
JobControl#wait(*jobs)
Wait for the given jobs to complete, then unregister them.
JobControl#waitall
Wait for all jobs to complete.
JobControl#jobs
All the registered (and active) jobs.
> job_control
#<Rubish::JobControl>
> wait(@job)
# == job_control.wait(@job)
> waitall # == job_control.waitall
Rubish gives you fine control over the execution context of Executables.
First, contextual IOs. Commands executed inside a with block inherit the IO redirections set on that block:
with {
  cmd1.exec
  cmd2.exec
  with { cmd3.exec }.o("output-3")
}.o("output-1-and-2")
At the shell, a Workspace object contains all the visible method bindings you can use (as well as methods from Kernel). Everything else is translated into a Rubish::Command instance by Workspace#method_missing. To extend a workspace, just mix in modules.
However, it's usually not a good idea to include a module into the Workspace class itself, since the extension would be visible in all Workspace instances, risking incompatibilities among different extensions. It's better to extend workspace singletons. The philosophy is: a workspace is your own. You have the freedom to mess it up however you like for your personal convenience, but the messing-up should be localized.
Workspace#derive(*modules,&block)
Clone the current workspace, then extend the clone with the given modules (and, if a block is given, with a module built from it; see below).
Ruby doesn't have lexical scoping for methods, but you can fake it by creating modules and deriving workspaces on the fly.
with(derive { def foo; ...; end }) {
  ... # outer foo
  with(derive { def foo; ...; end }) {
    ... # inner foo
  }
  ... # outer foo
}
The definition block for derive is used by Module.new(&block) to create a dynamic module that's mixed into the derived Workspace.
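A concrete sketch (the method name hello and its message are made up for illustration):
rbh> with(derive { def hello; puts "hello from a derived workspace"; end }) { hello }
hello from a derived workspace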
A batch executable is a block of code executed within a context in a thread. This gives you coarse-grained structured concurrency. It's like a subshell, but within the same process, and it offers finer control over IO and namespace (i.e. visible bindings).
Schematically, a batch job is like,
@job = Thread.new { context.eval { work }}
@job.wait
An example,
@b = batch {
  exec! cmd1, cmd2
  batch {
    exec! cmd3, cmd4
    batch { exec! cmd5 }
  }.exec
}
@b.exec! # => a job
Batches are nestable, and each batch has its own job control. A batch finishes when all of its jobs have terminated, including the jobs of all nested job controls.
Using batches, concurrent jobs can be organized structurally into a tree.
A batch is just a wrapper over a context, so you can specify the execution context of a batch:
# extend the batch context
batch(derive(mod1,mod2)) { ... }
And a batch is an Executable! So all the Executable methods are applicable:
batch { ... }.map { |l| ... }
batch { ... }.tail
It's fun to think about.
Happy Hacking!
Created by Howard Yeh.
Gem made available by Gabriel Horner.