joergen7 / cuneiform

Cuneiform distributed programming language
https://cuneiform-lang.org/
Apache License 2.0
232 stars 16 forks source link
bioinformatics distributed-computing erlang functional-programming machine-learning otp parallelization workflow workflow-engine

Cuneiform: Data analysis open and general

hex.pm Build Status

Cuneiform is a large-scale data analysis functional programming language. It is open because it easily integrates foreign tools and libraries, e.g., Python libraries or command line tools. It is general because it has the expressive power of a functional programming language while using the independence of sub-expressions to automatically parallelize programs. Cuneiform uses distributed Erlang to scalably run in cluster and cloud environments.

cuneiform-lang.org

Usage

Compiling

Having rebar3 available on your system, compile the project by entering

rebar3 escriptize

Displaying Cuneiform Help

Compiling the Cuneiform client using rebar3 escriptize creates an Erlang script file _build/default/bin/cfl which allows starting Cuneiform via the command line.

To display a help text enter

_build/default/bin/cfl --help

This will show the command line synopsis, which looks like the following:

Usage: cfl [-v] [-h] [-n <n_wrk>] [-w <wrk_dir>] [-r <repo_dir>]
           [-d <data_dir>]

  -v, --version   Show cf_worker version.
  -h, --help      Show command line options.
  -n, --n_wrk     Number of worker processes to start. 0 means auto-detect 
                  available processors.
  -w, --wrk_dir   Working directory in which workers store temporary files.
  -r, --repo_dir  Repository directory for intermediate and output data.
  -d, --data_dir  Data directory where input data is located.

This script is self-contained and can be moved around to, e.g., ~/bin/cfl. From here on we assume that the cfl script is accessible in your system path and that we can start it by just entering cfl instead of _build/default/bin/cfl.

Starting an Interactive Shell

You can start a shell and program Cuneiform interactively by starting it without any command line parameters like so:

cfl

This will open a shell giving the following initial output, along with a number of status messages:

           @@WB      Cuneiform
          @@E_____
     _g@@@@@WWWWWWL  Type help for usage info
   g@@#*`3@B              quit to exit shell
  @@P    3@B
  @N____ 3@B         http://www.cuneiform-lang.org
  "W@@@WF3@B         Jorgen Brandt

1>

Note that starting Cuneiform like that will create a local instance of a Cuneiform that entails the scheduler service, a client service, and as many worker services as CPUs were detected on the host machine. To set up a distributed Cuneiform system, these services need to be started separately on multiple hosts as needed.

Running a Cuneiform Script

Alternatively, Cuneiform can be started by giving it a source file which will only output the final result of the computation. If your Cuneiform script is stored in my_script.cfl start it by entering

cfl my_script.cfl

Examples

A collection of self-contained Cuneiform examples is available under joergen7/cuneiform-examples.

Variable assignment

You can assign a value to a variable and retrieve a variable's content like so:

let x : Str =
  "foo";

x;

In the first line we assign the value "foo" to a variable named x declaring its type to be Str. In the last line we query the variable x.

Booleans and Conditions

We can branch execution based on conditions using conditional statements. Conditionals are expressions.

let x : Str =
  if true
  then
    "bla"
  else
    "blub"
  end;

x;

The above command the conditional binding the string "bla" to the variable x. Then, we query the variable.

Lists

We can construct list literals by enumerating their elements in square brackets and declaring the type of the list elements.

let xs : [Bool] =
  [true, false, true, true : Bool];

xs;

Here, we define the list xs whose elements are of type Bool giving four Boolean values of which only the second is false.

Records and Pattern Matching

A record is a collection of fields that can be accessed via their labels. Literal records can be constructed like so:

let r : <a : Str, b : Bool> =
  <a = "blub", b = false>;

( r|a );

We define a record r with two fields a and b, of types Str and Bool respectively. The field associated with a gets the value "blub" while the field associated with b gets the value false. In the last line we access the a field of the record r.

Alternatively, we can access record fields via pattern matching:

let <a = z : Str> = r;
z;

In the first line we associate the variable z with the field a of record r. In the second line we query the content of z.

Native Function Definition

Defining native functions in Cuneiform is done by giving the function name, its signature, and a body expression in curly braces:

def identity( x : Str ) -> Str {
  x
}

identity( x = "bar" );

In the first line we define the function identity which consumes an argument x of type Str and produces a return value of type Str. In the second line, the body expression is just the argument x. In the last line we call the function binding the argument x to the value "bar".

Foreign Function Definition

Defining foreign functions is done by giving the function name, its signature, the foreign language name, and the function body in mickey-mouse-eared curly braces.

def greet( person : Str ) -> <out : Str> in Bash *{
  out="Hello $person"
}*

greet( person = "Peter" );

The first line defines a foreign function greet taking one argument person of type Str and returning a tuple with a single field out of type Str. The foreign function body is given in Bash code. In the last line we call the foreign function, binding the argument person to the string value "Peter".

Iterating over Lists using For

To perform an operation on each element of a list, one can iterate using for:

let xs : [Bool] =
  [true, false, true, true : Bool];

for x : Bool <- xs do
  not x
  : Bool
end;

Here, we define a list of four Booleans and negate each element.

Aggregating Lists using Fold

We can aggregate over lists using fold:

def add( a : Str, b : Str ) -> <c : Str> in Python *{
  c = int( a )+int( b )
}*

let xs : [Str] = [1, 2, 3 : Str];

let sum : Str =
  fold acc : Str = 0, x : Str <- xs do
    ( add( a = acc, b = x )|c )
  end;

sum;

Here, we first define the function add which lets us add two numbers in Python and then the string list xs containing the numbers from one to three. We aggregate the sum of the numbers in xs and store it the result in the variable sum. Lastly, we query the sum variable.

System Requirements

Resources

License

Apache 2.0