mikeizbicki / subhask

Type safe interface for working in subcategories of Hask
BSD 3-Clause "New" or "Revised" License
418 stars 43 forks source link

SubHask

SubHask is a radical rewrite of the Haskell Prelude. The goal is to make numerical computing in Haskell fun and fast. The main idea is to use a type safe interface for programming in arbitrary subcategories of Hask. For example, the category Vect of linear functions is a subcategory of Hask, and SubHask exploits this fact to give a nice interface for linear algebra. To achieve this goal, almost every class hierarchy is redefined to be more general.

SubHask is a work in progress. This README is intended to be a "quick start guide" to get you familiar with the current status and major differences from standard Haskell.

Table of contents:

Installing

SubHask depends on:

  1. GHC >= 7.10. You can download the latest version of GHC here.

  2. llvm >= 3.5, llvm < 3.6. To install on Linux or Mac, run the following commands:

    $ wget http://llvm.org/releases/3.5.2/llvm-3.5.2.src.tar.xz
    $ tar -xf llvm-3.5.2.src.tar.xz
    $ cd llvm-3.5.2.src
    $ mkdir build
    $ cd build
    $ cmake ..
    $ make -j5
    $ sudo make install
  3. Any version of BLAS and LAPACK.

How to install these packages varies for different operating systems. For Debian/Ubuntu systems, you can install them using:

```
$ sudo apt-get install libblas-dev liblapack-dev
```

For macOS, they are preinstalled as part of the Accelerate Framework.

The easiest way to build and try Subhask is to clone the repo

git clone https://github.com/mikeizbicki/subhask
cd subhask

and use stack to build. eg

stack test --bench

will build the project, run the tests, and run the benchmark.

Examples

See the examples folder for the literate haskell files.

New Class Hierarchies

Category Hierarchy

The modified category hierarchy closely follows the presentation in the Rosetta Stone paper.

The image below shows the category hierarchy:

Important points:

  1. Intuitively, Concrete categories are functions that have been annotated with special properties. More formally, a Concrete category is one that is a subtype of (->). Subtyping is not a builtin feature of the Haskell language, but we simulate subtyping using the class <:. See the documentation in SubHask.SubType for more details.

  2. SubHask contains implementations of both categories and what I call "category transformers." A category transformer creates a type corresponding to a subcategory in the original category. For example, we can use the category transformer MonT :: (* -> * -> *) -> * -> * -> * to construct the category MonT (->) :: * -> * -> *, which corresponds to the category of monotonic functions. See the SubHask.Category.Trans.Monotonic module for details.

    The categories can be found in the SubHask.Category.* modules, and transformers can be found inSubHask.Category.Trans.* modules. The design of these transformers roughly follows that of the mtl library to allow for composition of transformers.

  3. I have removed the Arrow hierarchy in favor of a more principled approach. Some of Arrow's functionality has also been removed since I've never found a use for it, but it will probably be added at a future point as SubHask matures.

Functor hierarchy

In the standard Prelude, the Functor type class corresponds to "endofunctors on the category Hask". SubHask generalizes this definition to endofunctors on any category:

class Category cat => Functor cat f where
    fmap :: cat a b -> cat (f a) (f b)

The image below shows the functor hierarchy:

The dashed lines above mean that the Functor, Applicative, and Monad instances can depend on a category.

Important points:

  1. This modified functor hierarchy gives us a lot of power. For example, we can finally make Set an instance of Monad! Actually, Set is an instance of Monad in two separate categories: the category of functions with an Ord constraint (i.e. OrdHask) and the category of monotonic functions (i.e. MonT (->) mentioned above). Semantically, both have the same meaning, but the monotonic fmap runs faster.

  2. We've introduced a new class Then that does not depend on the Category. This class is a hack to make monads play nice with do notation; it's only member function is the (>>) operator. There's probably something deep going on here that I'm just not aware of.

  3. Notice that the Applicative class is not a super class of Monad. While it's true that every Monad in Hask is also an Applicative, this does not appear to be true for arbitrary categories. At least it's definitely not true given the current definition of the Category class I've defined. I'm not sure if that's a limitation of my design or something more fundamental.

  4. The functor hierarchy is much smaller than the functor hierarchy available with base. I haven't included Prelude classes like Alternative, and I haven't included all of the classes Edward Kmett is famous for (see e.g. category-extras). All of these class can in principle be extended to the more generic setting of SubHask, I just haven't gotten around to it yet.

    Lens is the most famous package that uses the extended funtor hierarchy. As-is, the current version of lens is fully compatible with SubHask; however, the container hierarchy below obviates the need for most of the fancy lenses. Eventually, I'd like to implement lenses in arbitrary categories. For example, you could use a monotonic lens to guantee updates to a data structure are monotonic. I haven't done very much work on this yet though.

    Another interesting category theoretic Kmett library is hask. Everything in that library can be translated to SubHask, but that's not something I've done yet.

Comparison Hierarchy

SubHask's comparison hierarchy is significantly more complicated than Prelude's. It is directly inspired by order theory and non-classical logic.

The hierarchy is shown in the following image:

Important points:

  1. A type in SubHask can be compared using non-classical logics. Consider the type of equality comparison:

    (==) :: Eq_ a => a -> a -> Logic a

    The return value is given by the type family Logic a, which specifies the logical system used on the type a.

    For most types, Logic a will be Bool, and everything will behave as you would expect. But this more general type lets us define equality on types for which classical equality is either uncomputable, undefined, or not what we actually want.

    Consider the case of functions. Classical equality over functions is uncomputable. But in SubHask, we define:

    type instance Logic (a -> b) = a -> Logic b
    
    class Eq_ b => Eq_ (a -> b) where
        (f==g) a = f a == g a

    This non-classical logic simplifies many situations. For example, we can use the (&&) and (||) operators on functions:

    ghci> filter ( (>='c') && (<'f') || (=='q') ) ['a'..'z']
    "cdeq"

    I have a hunch this will make for a nice probabalistic programming interface, but I could turn out completely wrong.

Container Hierarchy

SubHask's container hierarchy is inspired by the mono-traversable and classy-prelude packages. These packages use type families to make the standard type classes applicable to more data types. For example, they can make ByteString an instance of Foldable, whereas the Prelude classes cannot. This makes code look more generic, but unfortunately these packages' classes come with no laws. In contrast, SubHask provides a clear and useful set of laws for each type class.

The container laws are closely related to the axioms of set theory. The main two differences are that SubHask's laws handle the case of non-commutative containers but don't bother with infinitely sized containers. See the automated-testing section below for more details on class laws.

The container hierarchy is shown in the image below:

Important points about containers:

Numeric Hierarchy

SubHask is directly inspired by a lot of good existing work on improving Haskell's numeric support. For example:

You can see it in the image below:

Important points:

Automated testing

There are currently over 1000 quickcheck properties being checked in the test suite. But I didn't write any of these tests by hand. Whenever I implement a new data type, template haskell functions add appropriate tests to the test suite automatically. I literally don't have to think at all about writing tests and I still get the full benefits. Here's how it works.

Each class in the new hierarchies above comes with a set of laws they must obey. Those laws are documented using quickcheck properties. These properties fully describe the intended behavior of the class, and any instance that passes the quickcheck tests is a valid instance of the class.

For example, the Eq class is intended to capture the notion of equivalence classes from algebra. The class definition is:

class Eq_ a where
    (==) :: a -> a -> Logic a
    (/=) :: a -> a -> Logic a

and the quickcheck properties are:

law_Eq_reflexive :: Eq a => a -> Logic a
law_Eq_reflexive a = a==a

law_Eq_symmetric :: Eq a => a -> a -> Logic a
law_Eq_symmetric a1 a2 = (a1==a2) == (a2==a1)

law_Eq_transitive :: Eq a => a -> a -> a -> Logic a
law_Eq_transitive a1 a2 a3 = (a1==a2&&a2==a3) ==> (a1==a3)

defn_Eq_noteq :: (Complemented (Logic a), Eq a) => a -> a -> Logic a
defn_Eq_noteq a1 a2 = (a1/=a2) == (not $ a1==a2)

The three properties prefixed with law capture the laws of the equivalence classes and the property prefixed with defn shows how the operators (==) and (/=) must relate to each other.

You can use these laws to automatically test any data types you implement. All you have to do is call the mkSpecializedClassTests template haskell function on the type you want to test. This function constructs the test cases and adds them to the test suite. See the /tests/TestSuite.hs for how to use the function. The module SubHask.TemplateHaskell.Test contains the actual implementation.

The existing interface is pretty convenient, but I think it should be automated even more. There's a minor limitation in template haskell that currently prevents full automation (see #9699).

Limitations

SubHask is far from production ready. There are roughly three causes of SubHask's limitations:

  1. A lot of the type signatures within SubHask are messier than they need to be due to limitations with GHC's type system. In particular:

    • I wish I could use the forall keyword within constraints (see #2893 and #5927).

    • SubHask uses a lot of type families, some of which are injective. We can't currently take advantage of injectivity, but adding support to GHC is being actively worked on (see #6018).

    • A few of the invariants that are supposed to be maintained in SubHask's hierarchies can't be mechanically enforced because GHC doesn't allow cycles in the class hierarchy (see #10592).

  2. Some of the abstractions aren't quite right yet and will change in the future. I expect that as I write more programs that depend on SubHask, these abstractions will flesh themselves out a bit.

  3. There's a lot of grunt work that I just haven't had time for. For example, the current implementation of the derivative category transformer in SubHask.Category.Trans.Derivative only supports forward mode automatic differentiation. Adding backwards mode support doesn't require any new ideas, just a couple hours of work. There are currently 118 FIXME comments in the source documenting similar limitations. A great, beginner friendly way to contribute to SubHask would be to find one of these limitations that interests you and fix it :)