maciej-bendkowski / boltzmann-brain

Analytic sampler compiler for combinatorial systems
BSD 3-Clause "New" or "Revised" License
30 stars 5 forks source link

Lightweight compiler for transfer matrix specifications #7

Closed maciej-bendkowski closed 5 years ago

maciej-bendkowski commented 6 years ago

The current rational sampler compiler based on algebraic data types offers a somewhat "heavyweight" representation for rational transfer matrix specifications. Instead of representing structures as words over a fixed alphabet, objects are represented as algebraic data types with additional, and sometimes even unnecessary, boilerplate code. For instance, constructors for each edge in the corresponding transition matrix. Though such a representation might be useful for some applications, it does not scale up well as the system grows. For that reason a new compiler scheme (and also bb input specification) is required.

Consider the following illustrative example specification:

{ a : 1, b : 2, c : 3 }.
A -> B (a) | A (b).
B -> A (c).

Here, we propose a new transfer matrix style system consisting of: (1) a finite weighted alphabet {a,b,c}; a is of weight 1, b is of weight 2 whereas c is of weight 3, and (2) a transition matrix represented in a spare list-based format.

The resulting sampler should consist of a single data type defining the alphabet (a single constructor per each letter in the alphabet) and a generator function returning a list (i.e. word) over the considered alphabet. Interruptible sampling and anticipated rejection should be naturally included.

maciej-bendkowski commented 5 years ago

Commit 309b21b4ba2a15a83e3b38da1d2843badceb3094 provides an experimental compiler for rational specifications in a close input format to the example in this thread. Some examples are given in the respective repository folder, for instance:

@generate A
@withIO   y

-- alphabet with custom
-- symbol frequencies.
{
   a : 0.3 (3),   -- note: custom weight 3 (by default, it's the symbol length)
   b : 0.4,
   c
}

A -> B (a)
   | A (b)

B -> A (c)
   | _      -- note: terminal epsilon symbol.