Cuneiform is a large-scale data analysis functional programming language. It is open because it easily integrates foreign tools and libraries, e.g., Python libraries or command line tools. It is general because it has the expressive power of a functional programming language while using the independence of sub-expressions to automatically parallelize programs. Cuneiform uses distributed Erlang to scalably run in cluster and cloud environments.
Having rebar3 available on your system, compile the project by entering
rebar3 escriptize
Compiling the Cuneiform client using rebar3 escriptize
creates an Erlang script file _build/default/bin/cfl
which allows starting Cuneiform via the command line.
To display a help text enter
_build/default/bin/cfl --help
This will show the command line synopsis, which looks like the following:
Usage: cfl [-v] [-h] [-n <n_wrk>] [-w <wrk_dir>] [-r <repo_dir>]
[-d <data_dir>]
-v, --version Show cf_worker version.
-h, --help Show command line options.
-n, --n_wrk Number of worker processes to start. 0 means auto-detect
available processors.
-w, --wrk_dir Working directory in which workers store temporary files.
-r, --repo_dir Repository directory for intermediate and output data.
-d, --data_dir Data directory where input data is located.
This script is self-contained and can be moved around to, e.g., ~/bin/cfl
. From here on we assume that the cfl
script is accessible in your system path and that we can start it by just entering cfl
instead of _build/default/bin/cfl
.
You can start a shell and program Cuneiform interactively by starting it without any command line parameters like so:
cfl
This will open a shell giving the following initial output, along with a number of status messages:
@@WB Cuneiform
@@E_____
_g@@@@@WWWWWWL Type help for usage info
g@@#*`3@B quit to exit shell
@@P 3@B
@N____ 3@B http://www.cuneiform-lang.org
"W@@@WF3@B Jorgen Brandt
1>
Note that starting Cuneiform like that will create a local instance of a Cuneiform that entails the scheduler service, a client service, and as many worker services as CPUs were detected on the host machine. To set up a distributed Cuneiform system, these services need to be started separately on multiple hosts as needed.
Alternatively, Cuneiform can be started by giving it a source file which will only output the final result of the computation. If your Cuneiform script is stored in my_script.cfl
start it by entering
cfl my_script.cfl
A collection of self-contained Cuneiform examples is available under joergen7/cuneiform-examples.
You can assign a value to a variable and retrieve a variable's content like so:
let x : Str =
"foo";
x;
In the first line we assign the value "foo"
to a variable named x
declaring its type to be Str
. In the last line we query the variable x
.
We can branch execution based on conditions using conditional statements. Conditionals are expressions.
let x : Str =
if true
then
"bla"
else
"blub"
end;
x;
The above command the conditional binding the string "bla"
to the variable x
. Then, we query the variable.
We can construct list literals by enumerating their elements in square brackets and declaring the type of the list elements.
let xs : [Bool] =
[true, false, true, true : Bool];
xs;
Here, we define the list xs
whose elements are of type Bool
giving four Boolean values of which only the second is false
.
A record is a collection of fields that can be accessed via their labels. Literal records can be constructed like so:
let r : <a : Str, b : Bool> =
<a = "blub", b = false>;
( r|a );
We define a record r
with two fields a
and b
, of types Str
and Bool
respectively. The field associated with a
gets the value "blub"
while the field associated with b
gets the value false
. In the last line we access the a
field of the record r
.
Alternatively, we can access record fields via pattern matching:
let <a = z : Str> = r;
z;
In the first line we associate the variable z
with the field a
of record r
. In the second line we query the content of z
.
Defining native functions in Cuneiform is done by giving the function name, its signature, and a body expression in curly braces:
def identity( x : Str ) -> Str {
x
}
identity( x = "bar" );
In the first line we define the function identity
which consumes an argument x
of type Str
and produces a return value of type Str
. In the second line, the body expression is just the argument x
. In the last line we call the function binding the argument x
to the value "bar"
.
Defining foreign functions is done by giving the function name, its signature, the foreign language name, and the function body in mickey-mouse-eared curly braces.
def greet( person : Str ) -> <out : Str> in Bash *{
out="Hello $person"
}*
greet( person = "Peter" );
The first line defines a foreign function greet
taking one argument person
of type Str
and returning a tuple with a single field out
of type Str
. The foreign function body is given in Bash code. In the last line we call the foreign function, binding the argument person
to the string value "Peter"
.
To perform an operation on each element of a list, one can iterate using for:
let xs : [Bool] =
[true, false, true, true : Bool];
for x : Bool <- xs do
not x
: Bool
end;
Here, we define a list of four Booleans and negate each element.
We can aggregate over lists using fold:
def add( a : Str, b : Str ) -> <c : Str> in Python *{
c = int( a )+int( b )
}*
let xs : [Str] = [1, 2, 3 : Str];
let sum : Str =
fold acc : Str = 0, x : Str <- xs do
( add( a = acc, b = x )|c )
end;
sum;
Here, we first define the function add
which lets us add two numbers in Python and then the string list xs
containing the numbers from one to three. We aggregate the sum of the numbers in xs
and store it the result in the variable sum
. Lastly, we query the sum
variable.