ta0kira / zeolite

Apache License 2.0

Zeolite Programming Language

Zeolite is a statically-typed, general-purpose programming language. The type system revolves around defining objects and their usage patterns.

Zeolite prioritizes making it easy to write maintainable and understandable code. This is done by rethinking standard language idioms and limiting flexibility in some places while increasing it in others. In particular, emphasis is placed on the user's experience when troubleshooting code that is incorrect.

The design of the type system and the language itself is influenced by positive and negative experiences with Java, C++, Haskell, Python, Ruby, and Go, with collaborative development, and with various systems of code-quality enforcement.

Due to the way GitHub renders embedded HTML, the colors might not show up in the syntax-highlighted code in this document. If you use the Chrome browser, you can view the intended formatting using the Markdown Viewer extension to view the raw version of this document.

Project Status

Zeolite is still evolving in all areas (syntax, build system, etc.), and it still lacks a lot of standard library functionality. That said, it was designed with practical applications in mind. It does not prioritize having impressive toy examples (e.g., merge-sort or "Hello World" in one line); the real value is seen in programs with higher relational complexity.

Language Overview

This section discusses some of the features that make Zeolite unique. It does not go into detail about all of the language's features; see Writing Programs and the full examples for more specific language information.

Programming Paradigms

Zeolite currently uses both procedural and object-oriented programming paradigms. It shares many features with Java, but it also has additional features and restrictions meant to simplify code maintenance.

Parameter Variance

The initial motivation for Zeolite was a type system that allows implicit conversions between different parameterizations of parameterized types. A parameterized type is a type with type "place-holders", e.g., templates in C++ and generics in Java.

Java and C++ do not allow you to safely convert between different parameterizations. For example, you cannot safely convert a List<String> into a List<Object> in Java. This is primarily because List uses its type parameter for both input and output.

Zeolite, on the other hand, uses declaration-site variance for each parameter. (C# also does this to a lesser extent.) This allows the language to support very powerful recursive type conversions for parameterized types. Zeolite also allows use-site variance declarations, like Java uses.

Building variance into the core of the type system also allows Zeolite to have a special meta-type that interfaces can use to require that implementations return a value of their own type rather than the type of the interface. This is particularly useful for defining interfaces for iterators and builders, whose methods often perform an update and return a value of the same type.

Parameters as Variables

Zeolite treats type parameters both as type place-holders (like in C++ and Java) and as type variables that you can call functions on. This further allows Zeolite to have interfaces that declare functions that operate on types in addition to interfaces that declare functions that operate on values. (This would be like having abstract static methods in Java.)

This helps solve a few separate problems:

Integrated Test Runner

The major advantage of statically-typed programming languages is their compile-time detection of code that should not be allowed. On the other hand, there is a major testability gap when it comes to ensuring that your statically-typed code disallows what you expect it to.

Zeolite has a special source-file extension for unit tests, and a built-in compiler mode to run them.

Nearly all of the integration testing of the Zeolite language itself is done using this feature, but it is also supported for general use with Zeolite projects.

Integrated Build System

The Zeolite compiler supports a module system that can incrementally compile projects without the user needing to create build scripts or Makefiles.

This means that the programmer can focus on code rather than on build rules, and module authors can avoid writing verbose build instructions for the users of their modules.

Data Encapsulation

The overall design of Zeolite revolves around data encapsulation:

Although all of these limitations preclude a lot of design decisions allowed in languages like Java, C++, and Python, they also drastically reduce the possible complexity of inter-object interactions. Additionally, they generally do not require ugly work-arounds; see the full examples.

Quick Start

Installation

Requirements:

If you use a modern Linux distribution, most of the above can be installed using the package manager that comes with your distribution. On macOS, you can install Xcode for a C++ compiler and run brew install cabal-install to get cabal.

Once you meet all of those requirements, follow the installation instructions for the zeolite-lang package on Hackage. Please take a look at the issues page if you run into problems.

The entire process will probably look like this, once you have cabal and a C++ compiler installed:

$ cabal update
# Also add --overwrite-policy=always if you're upgrading to a specific version.
$ cabal install zeolite-lang
$ zeolite-setup -j8
# Follow interactive prompts...

You might also need to add $HOME/.cabal/bin or $HOME/.local/bin to your $PATH.

For syntax highlighting in Visual Studio Code, see "VS Code Support" in the Zeolite releases and download the .vsix file. If you happen to use the Kate text editor, you can use the syntax highlighting in zeolite.xml.

Hello World

It's the any% of programming.

// hello-world.0rx

concrete HelloWorld {
  @type run () -> ()
}

define HelloWorld {
  run () {
    \ BasicOutput.stderr().writeNow("Hello World\n")
  }
}

# Compile.
zeolite -I lib/util --fast HelloWorld hello-world.0rx

# Execute.
./HelloWorld

Also see some full examples for more complete feature usage.

Writing Programs

This section breaks down the separate parts of a Zeolite program. See the full examples for a more integrated language overview.

Basic Ideas

Zeolite programs use object-oriented and procedural programming paradigms. Type categories are used to define object types, much like classes in Java and C++. They are not called "classes", just to avoid confusion about semantic differences with Java and C++.

All type-category names start with an uppercase letter and contain only letters and digits.

All procedures and data live inside concrete type categories. Every program must have at least one concrete category with the procedure to be executed when the program is run.

concrete categories are split into a declaration and a definition. Code for both should be in files ending with .0rx. (The .0rp file type contains only declarations, and will be discussed later.)

// myprogram/myprogram.0rx

// This declares the type.
concrete MyProgram {
  // The entry point must be a () -> () function. This means that it takes no
  // arguments and returns no arguments. (@type will be discussed later.)
  @type run () -> ()
}

// This defines the type.
define MyProgram {
  run () {
    // ...
  }
}

IMPORTANT: All programs or modules must be in their own directory so that zeolite is able to cache information about the build. Unlike some other compilers, you do not specify all command-line options every time you recompile a binary or module.

# Create a new .zeolite-module config. (Only once!)
zeolite -m MyProgram myprogram

# Recompile the module and binary. (After any config or code updates.)
# All sources in myprogram will be compiled. -m MyProgram selects the entry
# point. The default output name for the binary here is myprogram/MyProgram.
zeolite -r myprogram

# Execute.
myprogram/MyProgram

# An alternative, if you only have one .0rx and want to quickly iterate.
zeolite --fast MyProgram myprogram/myprogram.0rx

Declaring Functions

A function declaration specifies the scope of the function and its argument and return types. (And optionally type parameters and parameter filters, to be discussed later.) The declaration simply indicates the existence of a function, without specifying its behavior.

All function names start with a lowercase letter and contain only letters and digits.

concrete MyCategory {
  // @value indicates that this function requires a value of type MyCategory.
  // This function takes 2x Int and returns 2x Int.
  @value minMax (Int, Int) -> (Int, Int)

  // @type indicates that this function operates on MyCategory itself. This is
  // like a static function in C++.
  // This function takes no arguments and returns MyCategory.
  @type create () -> (MyCategory)

  // @category indicates that this function operates on MyCategory itself. This
  // is like a static function in Java. (The semantics of @category are similar
  // to those of @type unless there are type parameters.)
  @category copy (MyCategory) -> (MyCategory)
}

In many cases, the choice between using a @category function or a @type function is arbitrary, but there are pros and cons of each:

Defining Functions

Functions are defined in the category definition. They do not need to repeat the function declaration; however, they can do so in order to refine the argument and return types for internal use.

All function names start with a lowercase letter and contain only letters and digits.

The category definition can also declare additional functions that are not visible externally.

concrete MyCategory {
  @type minMax (Int, Int) -> (Int, Int)
}

define MyCategory {
  // minMax is defined here.
  minMax (x, y) {
    if (superfluousCheck(x, y)) {
      return x, y
    } else {
      return y, x
    }
  }

  // superfluousCheck is only available inside of MyCategory.
  @type superfluousCheck (Int, Int) -> (Bool)
  superfluousCheck (x, y) {
    return x < y
  }
}

All arguments must either have a unique name or be ignored with _.

@value functions have access to a special constant self, which refers to the object against which the function was called.
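As a minimal sketch of both rules, the category below ignores an unused argument with _ and returns self from a @value function. Selector and its functions are hypothetical names invented for this illustration; they are not part of any Zeolite library.

```zeolite
// Hypothetical category, for illustration only.
concrete Selector {
  @type new () -> (Selector)
  // Returns the first argument and ignores the second.
  @value first (Int, Int) -> (Int)
  // Returns the object that the function was called on.
  @value same () -> (Selector)
}

define Selector {
  new () {
    return Selector{ }
  }

  // The second argument is never used, so it is ignored with _.
  first (x, _) {
    return x
  }

  same () {
    // self refers to the Selector that same() was called on.
    return self
  }
}
```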

Using Variables

Variables are assigned with <- to indicate the direction of assignment. Every variable must be initialized; there are no null values in Zeolite. (However, see optional later on.)

All variable names start with a lowercase letter and contain only letters and digits.

When a location is needed for assignment (e.g., handling a function return, taking a function argument), you can use _ in place of a variable name to ignore the value.

// Initialize with a literal.
Int value <- 0

// Initialize with a function result.
Int value <- getValue()

Unlike other languages, Zeolite does not allow variable masking. For example, if there is already a variable named x available, you cannot create a new x variable even in a smaller scope.
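The following fragment sketches what this rule means in practice. The variable names are arbitrary, and the disallowed line is left commented out.

```zeolite
Int x <- 0

if (x == 0) {
  // Int x <- 1  // Not allowed: x already exists in an enclosing scope.
  Int y <- 1     // Fine: y does not mask any existing variable.
  x <- y
}
```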

All variables are shared and their values are not scoped like they are in C++. You should not count on knowing the lifetime of any given value.

As of compiler version 0.24.0.0, you can also swap the values of two variables that have the same type, as long as both are writable. This is more efficient than "manually" swapping using a temp variable.

Int foo <- 123
Int bar <- 456
foo <-> bar

Calling Functions

Return values from function calls must always be explicitly handled by assigning them to a variable, passing them to another function or ignoring them. (This is required even if the function does not return anything, primarily to simplify parsing.)

// Utilize the return.
Int value <- getValue()

// Explicitly ignore a single value.
_ <- getValue()

// Ignore all aspects of the return.
// (Prior to compiler version 0.3.0.0, ~ was used instead of \.)
\ printHelp()

Functions cannot be overloaded like in Java and C++. Every function must have a unique name. Functions inherited from different places can be explicitly merged, however. This can be useful if you want interfaces to have overlapping functionality without having an explicit parent for the overlap.

@value interface ForwardIterator<|#x> {
  next () -> (optional #self)
  get () -> (#x)
}

@value interface ReverseIterator<|#x> {
  prev () -> (optional #self)
  get () -> (#x)
}

concrete Iterator<|#x> {
  refines ForwardIterator<#x>
  refines ReverseIterator<#x>

  // An explicit override is required in order to merge get from both parents.
  get () -> (#x)
}

Functions As Operators

Zeolite allows some functions to be used as operators. This allows users to avoid excessive parentheses when using named mathematical functions.

Functions with two arguments can use infix notation. The operator precedence is always between comparisons (e.g., ==) and logical (e.g., &&).

Functions with one argument can use prefix notation. These are evaluated strictly before all infix operators.

concrete Math {
  @type plus (Int, Int) -> (Int)
  @type neg (Int) -> (Int)
}

// ...

// Math.plus is evaluated first.
Int x <- 1 `Math.plus` 2 * 5
// Math.neg is evaluated first.
Int y <- `Math.neg` x `Math.plus` 2

Data Members and Value Creation

Unlike Java and C++, there is no "default construction" in Zeolite. In addition, Zeolite also lacks the concept of "copy construction" that C++ has. This means that new values can only be created using a factory function. In combination with required variable initialization, this ensures that the programmer never needs to worry about unexpected missing or uninitialized values.

Data members are never externally visible; they only exist in the category definition. Any access outside of the category must be done using explicitly-defined functions.

concrete MyCategory {
  @type create () -> (MyCategory)
}

define MyCategory {
  // A data member unique to each MyCategory value.
  @value Int value

  create () {
    // Initialization is done with direct assignment.
    return MyCategory{ 0 }
  }
}

// ...

// Create a new value in some other procedure.
MyCategory myValue <- MyCategory.create()

There is no syntax for accessing a data member of another object, even an object of the same type. This effectively makes all data members internal rather than just private like in Java and C++. As long as parameter variance is respected, you can provide access to an individual member with getters and setters.
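A getter and setter pair might look like the sketch below. Counter, get, and set are hypothetical names chosen for this example; the point is only that other code reaches the member through explicitly declared functions.

```zeolite
// Hypothetical category, for illustration only.
concrete Counter {
  @type new () -> (Counter)
  @value get () -> (Int)
  @value set (Int) -> ()
}

define Counter {
  @value Int count

  new () {
    return Counter{ 0 }
  }

  // Getter: lets other code read count.
  get () {
    return count
  }

  // Setter: lets other code overwrite count.
  set (newCount) {
    count <- newCount
  }
}

// ...

// In some other procedure.
Counter counter <- Counter.new()
\ counter.set(counter.get()+1)
```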

As of compiler version 0.14.0.0, you can use #self in place of the full type when you are creating a value of the same type from a @type or @value function.

concrete MyCategory {
  @type create () -> (MyCategory)
}

define MyCategory {
  @value Int value

  create () {
    return #self{ 0 }
  }
}

A category can also have @category members, but not @type members. (The latter is so that the runtime implementation can clean up unused @types without introducing ambiguities regarding member lifespan.)

concrete MyCategory {
  @type global () -> (MyCategory)
}

define MyCategory {
  @value Int value

  // @category members use inline initialization.
  @category MyCategory singleton <- MyCategory{ 0 }

  global () {
    // @category members are accessible from all functions in the category.
    return singleton
  }
}

Conditionals

Zeolite uses the if/elif/else conditional construct. The elif and else clauses are always optional.

if (x) {
  // something
} elif (y) {
  // something
} else {
  // something
}

Scoping and Cleanup

Variables can be scoped to specific blocks of code. Additionally, you can provide a cleanup procedure to be executed upon exit from the block of code. This is useful if you want to free resources without needing to explicitly do so for every return statement.

// Simple scoping during evaluation.
scoped {
  Int x <- getValue()
} in if (x < 0) {
  // ...
} elif (x > 0) {
  // ...
} else {
  // ...
}

// Simple scoping during assignment.
scoped {
  Int x <- getValue1()
  Int y <- getValue2()
} in Int z <- x+y

// Scoping with cleanup.
scoped {
  // ...
} cleanup {
  // ...
} in {
  // ...
}

// Cleanup without scoping.
cleanup {
  i <- i+1  // Post-increment behavior.
} in return i

The cleanup block is executed at every return, break, and continue in the respective in block, and right after the in block. For this reason, you cannot use return, break, or continue within a cleanup block. Additionally, you cannot overwrite named returns. You can use fail, however, since that just ends program execution.

When cleanup is executed at a return statement in the in block, the returns from the return statement are "locked in", then cleanup is executed, then those locked-in return values are returned. (This is what allows the post-increment example above to work.)

Loops

Zeolite supports two loop types:

  1. while loops, which are the traditional repetition of a procedure while a predicate holds.

    // With break and continue.
    while (true) {
      if (true) {
        break
      } else {
        continue
      }
    }

    // With an update after each iteration.
    while (true) {
      // ...
    } update {
      // ...
    }

  2. traverse loops (as of compiler version 0.16.0.0), which automatically iterate over the #x values in an optional Order<#x>. This is similar to for (int i : container) { ... } in C++ and for i in container: ... in Python.

    traverse (orderedStrings -> String s) {
      // executed once per String s in orderedStrings
      // you can also use break and continue
    }

    // With an update after each iteration.
    traverse (orderedStrings -> String s) {
      // ...
    } update {
      // ...
    }

    Since the Order is optional, empty can be used to iterate zero times.

    IMPORTANT: Most containers are not iterable by traverse as-is; you will need to call a @value function to get the Order. Some categories refine DefaultOrder<#x> (such as String, and Vector in lib/container), which allows you to use their defaultOrder(). Other categories provide multiple ways to Order the container, such as SearchTree in lib/container.

    traverse ("hello".defaultOrder() -> Char c) {
      // executed once per Char c in "hello"
    }
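As a small sketch of iterating zero times, the loop below traverses an optional Order that holds empty. The variable name noStrings is made up for illustration, and the example assumes that the builtin Order<#x> can be named directly as a variable type.

```zeolite
// No Order is available, so the variable holds empty.
optional Order<String> noStrings <- empty

traverse (noStrings -> String s) {
  // Never executed, because there is nothing to iterate over.
}
```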

for loops (e.g., for (int i = 0; i < foo; ++i) { ... } in C++) are not supported, since such syntax is too restrictive to scale, and they can be replaced with traverse or scoped+while in nearly all situations.

// Combine while with scoped to create a for loop.
scoped {
  Int i <- 0
  Int limit <- 10
} in while (i < limit) {
  // ...
} update {
  i <- i+1
}

Multiple Returns

A procedure definition has two options for returning multiple values:

  1. Return all values. (Prior to compiler version 0.3.0.0, multiple returns were enclosed in {}, e.g., return { x, y }.)

    define MyCategory {
      minMax (x, y) {
        if (x < y) {
          return x, y
        } else {
          return y, x
        }
      }
    }

  2. Name the return values and assign them individually. This can be useful (and less error-prone) if the values are determined at different times. The compiler uses static analysis to ensure that all named returns are guaranteed to be set via all possible control paths.

    define MyCategory {
      // Returns are named on the first line.
      minMax (x, y) (min, max) {
        // Returns are optionally initialized up front.
        min <- y
        max <- x
        if (x < y) {
          // Returns are overwritten.
          min <- x
          max <- y
        }
        // Implicit return makes sure that all returns are assigned. Optionally,
        // you can use return _.
      }
    }

To return early when using named returns or when the function has no returns, use return _. You will get an error if a named return might not be set.
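An early return with named returns could look like the following sketch, which reuses the minMax function from the examples above. Both named returns are assigned before return _ is reached, so no named return can be left unset.

```zeolite
define MyCategory {
  minMax (x, y) (min, max) {
    min <- x
    max <- y
    if (x < y) {
      // Both named returns are already set, so an early return is allowed.
      return _
    }
    // Only reached when x is not less than y.
    min <- y
    max <- x
  }
}
```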

The caller of a function with multiple returns also has a few options:

  1. Assign the returns to a set of variables. You can ignore a position by using _ in that position. (Prior to compiler version 0.3.0.0, multiple assignments were enclosed in {}, e.g., { Int min, _ } <- minMax(4,3).)

    Int min, _ <- minMax(4, 3)

  2. Pass them directly to a function that requires the same number of compatible arguments. (Note that you cannot concatenate the returns of multiple functions.)

    Int delta <- diff(minMax(4, 3))

  3. If you need to immediately perform an operation on just one of the returned values while ignoring the others, you can select just that return inline. (As of compiler version 0.21.0.0.)

    // Select return 0 from minMax.
    return minMax(4, 3){0}

    Note that the position must be an integer literal so that the compiler can validate both the position and the return type.

Optional and Weak Values

Zeolite requires that all variables be initialized; however, it provides the optional storage modifier to allow a specific variable to be empty. This is not the same as null in Java because an optional value must be converted with require before it can be used.

// empty is a special value for use with optional.
optional Int value <- empty

// Non-optional values automatically convert to optional.
value <- 1

// present returns true iff the value is not empty.
if (present(value)) {
  // Use require to convert the value to something usable.
  \ foo(require(value))
}

As of compiler version 0.24.0.0, you can use <-| to conditionally overwrite an optional variable if it's currently empty.

optional Int value <- empty
value <-| 123    // Assigned, because value was empty.
value <-| 456    // Not assigned, because value wasn't empty.
value <-| foo()  // foo() isn't called unless value is empty.

Note that if the right side isn't optional then you can use the result as non-optional.

optional Int value <- empty
Int value2 <- (value <-| 123)

As of compiler version 0.24.0.0, you can use &. to call a function on an optional value only if that value is non-empty.

optional Int value <- 123
// All returned values will be optional.
optional Formatted formatted <- value&.formatted()
// foo() won't be called unless the readAt call is going to be made.
optional Char char <- formatted&.readAt(foo())

As of compiler version 0.24.1.0, you can use x <|| y to use y if x is empty. Note that x must have an optional type, and the resulting type of the entire expression is the type union of the types of x and y.
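A minimal sketch of <|| is shown below. The variable names are arbitrary. The results are stored in optional variables so that the example does not depend on whether the combined type can be used as non-optional.

```zeolite
optional Int value <- empty

// value is empty, so the right side (123) is used.
optional Int result1 <- value <|| 123

value <- 456

// value is no longer empty, so value itself is used.
optional Int result2 <- value <|| 123
```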

weak values allow your program to access a value if it is available, without holding up that value's cleanup if nothing else needs it. This can be used to let threads clean themselves up (example below) or to handle cycles in references between objects.

concrete MyRoutine {
  @type createAndRun () -> (MyRoutine)
  @value waitCompletion () -> ()
}

define MyRoutine {
  refines Routine

  // (See lib/thread.)
  @value weak Thread thread

  createAndRun () {
    // Create a new MyRoutine and then start the thread.
    return MyRoutine{ empty }.start()
  }

  run () {
    // routine
  }

  waitCompletion () {
    scoped {
      // Use strong to turn weak into optional. If the return is non-empty, the
      // value is guaranteed to remain valid while using thread2.
      optional Thread thread2 <- strong(thread)
    } in if (present(thread2)) {
      \ require(thread2).join()
    }
  }

  @value start () -> (#self)
  start () {
    // ProcessThread holds a reference to itself only while the Routine is
    // running. Making thread weak means that the ProcessThread can clean itself
    // up once the Routine terminates.
    thread <- ProcessThread.from(self).start()
    return self
  }
}

Deferred Variable Initialization

In some situations, a variable's value depends on conditional logic, and there is no low-cost default value. In such situations, you can use the defer keyword to allow a variable to be temporarily uninitialized. (As of compiler version 0.20.0.0.)

LargeObject object <- defer

if (debug) {
  object <- LargeObject.newDebug()
} else {
  object <- LargeObject.new()
}

\ object.execute()

In this example, object is declared without an initializer, and is then initialized in both the if and else clauses.

Immediate Program Termination

There are two ways to terminate the program immediately.

  1. The fail builtin can be used to immediately terminate the program with a stack trace. This is not considered a function call since it cannot return; therefore, do not precede it with \.

    define MyProgram {
      run () {
        fail("MyProgram does nothing")
      }
    }

    The value passed to fail must implement the Formatted builtin @value interface.

    The output to stderr will look something like this:

    ./MyProgram: Failed condition: MyProgram does nothing
      From MyProgram.run at line 7 column 5 of myprogram.0rx
      From main
    Terminated

  2. The exit builtin can be used to immediately terminate the program with a traditional Int exit code. (0 conventionally means program success.) This is not considered a function call since it cannot return; therefore, do not precede it with \.

    define MyProgram {
      run () {
        exit(0)
      }
    }

    The value passed to exit must be an Int.

Call Delegation

As of compiler version 0.24.0.0, you can delegate function calls using the delegate keyword. This has the effect of forwarding all of the arguments passed to the enclosing function call to the handler specified. (The call is actually rewritten using a substitution during compilation.)

new (value1,value2) {
  // Same as Value{ value1, value2 }.
  \ delegate -> Value

  // Same as foo(value1,value2).
  \ delegate -> `foo`

  // Same as something(123).bar(value1,value2).
  \ delegate -> `something(123).bar`
}

IMPORTANT: If the enclosing function specifies argument labels then those will be used in the forwarded call.

@type new (String name:, Int) -> (Value)
new (value1,value2) {
  // Same as foo(name: value1, value2).
  return delegate -> `foo`
}

IMPORTANT: Delegation will fail to compile if:

  1. One or more function arguments is ignored with _, e.g., call(_) { ... }.
  2. One or more function arguments is hidden with $Hidden[]$, e.g., $Hidden[someArg]$.

This is primarily a sanity check, since both of the above imply that the argument in question should not be used.

Function Argument Labels

As of compiler version 0.24.0.0, function declarations in Zeolite can optionally have labels for any individual argument. Note that this is a label and not an argument name.

All labels start with a lowercase letter and contain only letters and digits, and end with :.
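The sketch below puts a labeled declaration and a labeled call side by side. It reuses the new signature from the delegation example above; the data members text and count are hypothetical names added only to make the category complete.

```zeolite
concrete Value {
  // name: labels the first argument. The second argument has no label.
  @type new (String name:, Int) -> (Value)
}

define Value {
  @value String text
  @value Int    count

  // The argument names here are chosen independently of the label.
  new (value1, value2) {
    return Value{ value1, value2 }
  }
}

// ...

// The label is written at the call site, before the labeled argument.
Value value <- Value.new(name: "example", 10)
```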

Using Parameters

All concrete categories and all interfaces can have type parameters. Each parameter can have a variance rule assigned to it. This allows the compiler to do type conversions between different parameterizations.

Parameter names must start with # and a lowercase letter, and can only contain letters and digits.

Parameters are never repeated in the category or function definitions. (Doing so would just create more opportunity for unnecessary compile-time errors.)

// #x is covariant (indicated by being to the right of |), which means that it
// can only be used for output purposes.
@value interface Reader<|#x> {
  read () -> (#x)
}

// #x is contravariant (indicated by being to the left of |), which means that
// it can only be used for input purposes.
@value interface Writer<#x|> {
  write (#x) -> ()
}

// #x is for input and #y is for output.
@value interface Function<#x|#y> {
  call (#x) -> (#y)
}

// By default, parameters are invariant, i.e., cannot be converted. You can also
// explicitly specify invariance with <|#x|>. This allows all three variance
// types to be present.
concrete List<#x> {
  @value append (#x) -> ()
  @value head () -> (#x)
}

// Use , to separate multiple parameters that have the same variance.
concrete KeyValue<#k, #v> {
  @type new (#k, #v) -> (#self)
  @value key   () -> (#k)
  @value value () -> (#v)
}
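To illustrate the conversions that variance enables, the fragment below uses the Reader and Writer interfaces declared above. Printer, getStrings, and getWriter are hypothetical names; the example assumes only that String converts to the builtin Formatted interface.

```zeolite
// Hypothetical function that accepts any Reader of Formatted values.
concrete Printer {
  @type printNext (Reader<Formatted>) -> ()
}

// ...

// getStrings() is a hypothetical function that returns a Reader<String>.
Reader<String> strings <- getStrings()

// Allowed because #x is covariant in Reader and String converts to Formatted.
\ Printer.printNext(strings)
Reader<Formatted> formatted <- strings

// Contravariance goes the other way. getWriter() is a hypothetical function
// that returns a Writer<Formatted>.
Writer<Formatted> anyWriter <- getWriter()
Writer<String> stringWriter <- anyWriter
```

No conversion of either kind is available for List<#x> above, since its parameter is invariant.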

Using Interfaces

Zeolite has @value interfaces that are similar to Java interfaces, which declare functions that implementations must define. In addition, Zeolite also has @type interfaces that declare @type functions that must be defined. (This would be like having abstract static functions in Java.)

// @value indicates that the interface declares @value functions.
@value interface Printable {
  // @value is not allowed in the declaration.
  print () -> ()
}

// @type indicates that the interface declares @type functions.
@type interface Diffable<#x> {
  // @type is not allowed in the declaration.
  diff (#x, #x) -> (#x)
}

Type              Can Inherit
concrete          @value interface, @type interface
@value interface  @value interface
@type interface   --

Immutable Types

You can modify interface and concrete with immutable at the very top of the declaration. (As of compiler version 0.20.0.0.) This creates two requirements for @value members:

  1. They are marked as read-only, and cannot be overwritten with <-.
  2. They must have a type that is also immutable.

(@category members are not affected.)

Note that this applies to the entire implementation, not just to the implementations of functions required by the immutable interface. immutable is therefore intended for objects that cannot be modified, rather than as a way to define a read-only view (e.g., const in C++) of an object.

@value interface Foo {
  immutable

  call () -> ()
}

concrete Bar {
  refines Foo

  @type new () -> (Bar)
  @value mutate () -> ()
}

define Bar {
  @value Int value

  new () { return Bar{ 0 } }

  call () {
    // call cannot overwrite value
  }

  mutate () {
    // mutate also cannot overwrite value, even though mutate isn't in Foo.
  }
}

For members that use a parameter as a type, you can use immutable as a filter if the other filters do not otherwise imply it. Note that this will prevent substituting in a non-immutable type when calling @type functions.

concrete Type<#x> {
  immutable

  #x immutable
}

define Type {
  // #x is allowed as a member type because of the immutable filter.
  @value #x value
}

The #self Parameter

Every category has an implicit covariant parameter #self. (As of compiler version 0.14.0.0.) It always means the type of the current category, even when inherited. (#self is covariant because it needs to be convertible to a parent of the current category.)

For example:

@value interface Iterator<|#x> {
  next () -> (#self)
  get () -> (#x)
}

concrete CharIterator {
  refines Iterator<Char>
  // next must return CharIterator because #self = CharIterator here.
}

The primary purpose of this is to support combining multiple interfaces with iterator or builder semantics into composite types without getting backed into a corner when calling functions from a single interface.

@value interface ForwardIterator<|#x> {
  next () -> (#self)
  get () -> (#x)
}

@value interface ReverseIterator<|#x> {
  prev () -> (#self)
  get () -> (#x)
}

concrete CharIterator {
  refines ForwardIterator<Char>
  refines ReverseIterator<Char>
  get () -> (Char)  // (Remember that merging needs to be done explicitly.)
}

concrete Parser {
  // trimWhitespace can call next and still return the original type. In
  // contrast, if next returned ForwardIterator<#x> then trimWhitespace would
  // need to return ForwardIterator<Char> to the caller instead of #i.
  @type trimWhitespace<#i>
    #i requires ForwardIterator<Char>
  (#i) -> (#i)
}

#self can also be used to generalize a factory pattern:

@type interface ParseFactory {
  fromString (String) -> (#self)
}

concrete FileParser {
  @type parseFromFile<#x>
    #x defines ParseFactory
  (String) -> (#x)
}

define FileParser {
  parseFromFile (filename) {
    String content <- FileHelper.readAll(filename)
    // Notice that ParseFactory doesn't need a type parameter to indicate what
    // type is going to be parsed in fromString; it's sufficient to know that #x
    // implements ParseFactory and that fromString returns #self.
    return #x.fromString(content)
  }
}

concrete Value {
  defines ParseFactory
}

define Value {
  fromString (string) {
    if (string == "Value") {
      return Value{ }
    } else {
      fail("could not parse input")
    }
  }
}

#self is nothing magical; this could all be done by explicitly adding a covariant #self parameter to every type, with the appropriate requires and defines filters.

Type Inference

Starting with compiler version 0.7.0.0, Zeolite supports optional inference of specific function parameters by using ?. This must be at the top level (no nesting), and it cannot be used outside of the parameters of the function.

The type-inference system is intentionally "just clever enough" to do things that the programmer can easily guess. More sophisticated inference is feasible in theory (like Haskell uses); however, type errors with such systems can draw a significant amount of attention away from the task at hand. (For example, a common issue with Haskell is not knowing which line of code contains the actual mistake causing a type error.)

concrete Value<#x> {
  @category create1<#x> (#x) -> (Value<#x>)
  @type     create2     (#x) -> (Value<#x>)
}

// ...

// This is fine.
Value<Int> value1 <- Value:create1<?>(10)

// These uses of ? are not allowed:
// Value<Int> value2 <- Value<?>.create2(10)
// Value<?>   value2 <- Value<Int>.create2(10)

Only the function arguments and the parameter filters are used to infer the type substitution; return types are ignored. If inference fails, you will see a compiler error and will need to explicitly write out the type.

As of compiler version 0.21.0.0, if you want to infer all params, you can skip <...> entirely. If you only want to infer some of the params, you must specify all params, using ? for those that should be inferred.
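As a rough sketch of that shorthand, reusing the Value category declared above (the local names value3 and value4 are only illustrative), the two calls below should be equivalent whenever inference succeeds:

```zeolite
// Infer #x by explicitly writing ?.
Value<Int> value3 <- Value:create1<?>(10)

// Infer all params by leaving out <...> (compiler version 0.21.0.0 or later).
Value<Int> value4 <- Value:create1(10)
```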

Type inference will only succeed if:

  1. There is a valid pattern match between the expected argument types and the types of the passed arguments.

  2. There is exactly one type that matches best:

    • For params only used in covariant positions, the lower bound of the type is unambiguous.
    • For params only used in contravariant positions, the upper bound of the type is unambiguous.
    • For all other situations, the upper and lower bounds are unambiguous and equal to each other.

Type inference in the context of parameterized types is specifically disallowed in order to limit the amount of code the reader needs to search to figure out what types are being used. Forcing explicit specification of types for local variables is more work for the programmer, but it makes the code easier to reason about later on.

Other Features

This section discusses language features that are less frequently used.

Meta Types

Zeolite provides two meta types that allow unnamed combinations of other types.

Intersection and union types also come up in type inference.

// (Just for creating an output parameter.)
concrete Writer<#x|> {
  @type new () -> (#self)
}

concrete Helper {
  // #x is only used for input to the function.
  @type inferInput<#x> (#x, #x) -> (String)

  // #x is only used for output from the function.
  // (This is due to contravariance of #x in Writer.)
  @type inferOutput<#x> (Writer<#x>, Writer<#x>) -> (String)
}

define Helper {
  inferInput (_, _) { return typename<#x>().formatted() }
  inferOutput (_, _) { return typename<#x>().formatted() }
}

define Writer {
  new () { return #self{ } }
}

// ...

// Returns "[Int | String]".
\ Helper.inferInput(123, "message")

// Returns "[Int & String]".
\ Helper.inferOutput(Writer<Int>.new(), Writer<String>.new())

In this context, unions/intersections are the most restrictive valid types that will work for the substitution. (They are respectively the coproduct/product of the provided types under implicit type conversion.)

Explicit Type Conversion

In some situations, you might want to perform an explicit type conversion on a @value. The syntax for such conversions is value?Type, where value is any @value and Type is any type, including params and meta types.
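A minimal sketch of the syntax, relying only on the fact (used in the reduce examples in this document) that String converts to Formatted; the variable names are illustrative:

```zeolite
String message <- "message"

// Explicitly convert the String value to Formatted.
Formatted converted <- message?Formatted
```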

Runtime Type Reduction

The reduce builtin function enables very limited runtime reasoning about type conversion.

Here are a few motivating use-cases:

@value interface AnyObject {
  getAs<#y> () -> (optional #y)
}

concrete Object<#x> {
  @category create<#x> (#x) -> (AnyObject)
}

define Object {
  refines AnyObject

  @value #x value

  create (value) { return Object<#x>{ value } }
  getAs  ()      { return reduce<#x, #y>(value) }
}

AnyObject value <- Object:create<?>("message")

// This will be empty because String does not convert to Int.
optional Int value1 <- value.getAs<Int>()

// This will be "message" as Formatted because String converts to Formatted.
optional Formatted value2 <- value.getAs<Formatted>()

reduce cannot be used to "downcast" a value (e.g., converting a Formatted to a Float) since the argument has the same type as the first parameter.

For example, reduce<#x, #y>(value) checks #x → #y (i.e., whether #x converts to #y), and since value must be optional #x, value can only be converted upward. In other words, it only allows conversions that would otherwise be allowed, returning empty for all other conversions.

The AnyObject example above works because Object stores the original type passed to create as #x, which it then has available for the reduce call. The type variables #x and #y are the primary inputs to reduce; there is absolutely no examination of the "real" type of value at runtime.

// Here we explicitly set #x = Formatted when calling create.
AnyObject value <- Object:create<Formatted>("message")

// This will be empty even though the actual value is a String because getAs
// uses #x = Formatted in the reduce call.
optional String value1 <- value.getAs<String>()

Value Instance Comparison

As of compiler version 0.24.0.0, you can get a value that identifies a specific @value instance using the identify builtin. This can be useful for creating identifiers that don't otherwise have a unique member.

String value <- "value"
Identifier<String> valueId <- identify(value)

Limited Function Visibility

As of compiler version 0.24.0.0, you can restrict where @value and @type functions can be called from with the visibility keyword.

concrete Value {
  // This applies to everything below.
  visibility Factory

  // This can only be called from Factory.
  @type new () -> (Value)

  // This resets the visibility to the default.
  visibility _

  // This can be called from anywhere.
  @value call () -> ()
}

Builtins

Reserved Words

#self @category @type @value _ all allows any break cleanup concrete continue defer define defines delegate elif else empty exit fail false identify if immutable in interface optional present reduce refines require requires return scoped self strong testcase traverse true typename unittest update visibility weak while

Builtin Types

See builtin.0rp and testing.0rp for more details about builtin types. (For your locally-installed version, which might differ, see $(zeolite --get-path)/base/builtin.0rp.)

Builtin concrete types:

Builtin @value interfaces:

Builtin @type interfaces:

Builtin meta-types:

Builtin Constants

Builtin Functions

Procedural Operators

| Operators | Semantics | Example | Input Types | Result Type | Notes |
| --- | --- | --- | --- | --- | --- |
| +,-,*,/ | arithmetic | x + y | Int,Float | original type | |
| % | arithmetic | x % y | Int | Int | |
| - | arithmetic | x - y | Char | Int | |
| ^,\|,&,<<,>>,~ | bit operations | x ^ y | Int | Int | |
| + | concatenation | x + y | String | String | |
| ^,!,\|\|,&& | logical | x && y | Bool | Bool | |
| <,>,<=,==,>=,!= | comparison | x < y | built-in unboxed, String | Bool | not available for Pointer |
| . | function call | x.foo() | value | function return type(s) | |
| . | function call | T.foo() | type instance | function return type(s) | |
| : | function call | T:foo() | category | function return type(s) | |
| &. | conditional function call | x&.foo() | optional value | function return type(s) converted to optional | skips evaluation of args if call is skipped |
| ? | type conversion | x?T | left: value; right: type instance | right type with optionality of left | can also be used with optional values |
| <- | assignment | x <- y | left: variable; right: expression | right type | |
| <-\| | conditional assignment | x <-\| y | left: optional variable; right: non-weak expression | left type with optionality of right | skips evaluation of right if left is present |
| <\|\| | fallback value | x <\|\| y | left: optional expression; right: non-weak expression | union of left and right types with optionality of right | skips evaluation of right if left is present |
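The three operators for optional values can be sketched as follows. This is an illustration based only on the operator descriptions above, not output from a compiled program; the variable names are hypothetical, and calling formatted() on a String assumes the String-to-Formatted conversion used elsewhere in this document.

```zeolite
optional String name <- empty

// name is empty, so the right side is evaluated and assigned.
name <-| "default"

// name is now present, so the right side is skipped.
String text <- name <|| "fallback"

// formatted() is only called if name is present; the result is optional.
optional String label <- name&.formatted()
```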

Layout and Dependencies

Using Public Source Files

You can create public .0rp source files to declare concrete categories and interfaces that are available for use in other sources. This is the only way to share code between different source files. A .0rp file cannot contain defines for concrete categories.

During compilation, all .0rp files in the project directory are loaded up front. Their declarations are then used as the set of public symbols available when each .0rx is separately compiled.

Standard Library

The standard library is currently temporary and lacks a lot of functionality. See the public .0rp sources in lib. Documentation will eventually follow.

Modules

You can depend on another module using -i lib/util for a public dependency and -I lib/util for a private dependency when calling zeolite. (A private dependency is not visible to modules that depend on your module.)

Dependency paths are first checked relative to the module depending on them. If the dependency is not found there, the compiler then checks the global location specified by zeolite --get-path.

Public .0rp source files are loaded from all dependencies during compilation, and so their symbols are available to all source files in the module. There is currently no language syntax for explicitly importing or including modules or other symbols.

If you are interested in backing a concrete category with C++, you will need to write a custom .zeolite-module file. Better documentation will eventually follow, but for now:

  1. Create a .0rp with declarations of all of the concrete categories you intend to define in C++ code.
  2. Run zeolite in --templates mode to generate .cpp templates for all concrete categories that lack a definition in your module.
  3. Run zeolite in -c mode to get a basic .zeolite-module. After this, always use recompile mode (-r) to use your .zeolite-module.
  4. Take a look at .zeolite-module in lib/file to get an idea of how to tell the compiler where your category definitions are.
  5. Add your code to the generated .cpp files. lib/file is also a reasonable example for this.
  6. If you need to depend on external libraries, fill in the include_paths and link_flags sections of .zeolite-module.

IMPORTANT: @value functions for immutable categories will be marked as const in C++ extensions. immutable also requires that @value members have immutable types, but there is no reasonable way to enforce this in C++. You will need to separately ensure that the implementation only stores other immutable types, just for consistency with categories implemented in Zeolite.

Unit Testing

Unit testing is a built-in capability of Zeolite. Unit tests use .0rt source files, which are like .0rx source files with testcase metadata. The test files go in the same directory as the rest of your source files. (Elsewhere in this project these tests are referred to as "integration tests" because this testing mode is used to ensure that the zeolite compiler operates properly end-to-end.)

Writing Tests

IMPORTANT: Prior to compiler version 0.10.0.0, the testcase syntax was slightly different, and unittest was not available.

// myprogram/tests.0rt

// Each testcase starts with a header specifying a name for the group of tests.
// This provides common setup code for a group of unit tests.

testcase "passing tests" {
  // All unittest are expected to execute without any issues.
  success
}

// Everything after the testcase (up until the next testcase) is like a .0rx.

// At least one unittest must be defined when success is expected. Each unittest
// must have a distinct name within the testcase. Each unittest is run in a
// separate process, making it safe to alter global state.

unittest myTest1 {
  // The test content goes here. It has access to anything within the testcase
  // besides other unittest.
}

unittest myTest2 {
  \ empty
}

// A new testcase header indicates the end of the previous test.

testcase "missing function" {
  // The test is expected to have a compilation error. Note that this cannot be
  // used to check for parser failures!
  //
  // Any testcase can specify require and exclude regex patterns for checking
  // test output. Each pattern can optionally be qualified with one of compiler,
  // stderr, or stdout, to specify the source of the output.
  error
  require compiler "run"  // The compiler error should include "run".
  exclude compiler "foo"  // The compiler error should not include "foo".
}

// You can include unittest when an error is expected; however, they will not be
// run even if compilation succeeds.

define MyType {
  // Error! MyType does not have a definition for run.
}

concrete MyType {
  @type run () -> ()
}

testcase "intentional failure" {
  // The test is expected to fail.
  failure
  require stderr "message"  // stderr should include "message".
}

// Exactly one unittest must be defined when a failure is expected.

unittest myTest {
  // Use the fail built-in to cause a test failure.
  fail("message")
}

testcase "compilation tests" {
  // Use compiles to check only Zeolite compilation, with no C++ compilation or
  // execution of tests.
  compiles
}

unittest myTest {
  // unittest is optional in this mode, but can still be used if the test does
  // not require any new types.
}

Unit tests have access to all public and $ModuleOnly$ symbols in the module. You can run all tests for module myprogram using zeolite -t myprogram.

Specific things to keep in mind with testcase:

Code Coverage

As of compiler version 0.16.0.0, you can get a log of all lines of Zeolite code (from .0rx or .0rt sources) with the --log-traces [filename] option when running tests with zeolite -t.

As of compiler version 0.20.0.0, zeolite -r will cache information about the possible .0rx lines that can show up in --log-traces mode.

Compiler Pragmas and Macros

(As of compiler version 0.5.0.0.)

Pragmas allow compiler-specific directives within source files that do not otherwise need to be a part of the language syntax. Macros have the same format, and are used to insert code after parsing but before compilation.

The syntax for both is $SomePragma$ (no options) or $AnotherPragma[OPTIONS]$ (uses pragma-specific options). The syntax for OPTIONS depends on the pragma being used. Pragmas are specific to the context they are used in.

Source File Pragmas

These must be at the top of the source file, before declaring or defining categories or testcases.

Procedure Pragmas

These must occur at the very top of a function definition.

define Pragmas

These must be at the top of a category define immediately following {.

unittest Pragmas

These must be at the top of a unittest immediately following {.

Local Variable Rules

These pragmas alter how variables are dealt with locally:

Zeolite uses pragmas instead of something like final in Java for a few reasons:

Expression Macros

These can be used in place of language expressions.

Known Language Limitations

Reference Counting

Zeolite currently uses reference counting rather than a garbage-collection system that might otherwise search for unreferenced objects in the background. While this simplifies the implementation, it is possible to have a reference cycle that prevents cleanup of the involved objects, thereby causing a memory leak.

This can be mitigated by using weak references in categories where a cycle is probable or guaranteed. For example, LinkedNode in lib/container is a doubly-linked list, which would create a reference cycle if both forward and reverse references were non-weak.
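The following is a hedged sketch of that idea, not the actual LinkedNode implementation. The Node category and its function names are hypothetical, and the sketch assumes that a member can be declared weak and read back with the strong builtin (both are reserved words listed above).

```zeolite
concrete Node {
  @type new () -> (Node)
  @value getPrev () -> (optional Node)
}

define Node {
  // The forward reference keeps the next node alive.
  @value optional Node next
  // The reverse reference is weak, so it does not keep the previous node
  // alive and cannot complete a reference cycle.
  @value weak Node prev

  new () {
    return Node{ empty, empty }
  }

  getPrev () {
    // Assumed behavior: strong yields empty if the value was already cleaned up.
    return strong(prev)
  }
}
```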