wesselb / varz

Painless optimisation of constrained variables in AutoGrad, TensorFlow, PyTorch, and JAX
MIT License

Varz

Requirements and Installation

See the instructions here. Then simply

pip install varz

Manual

Basics

from varz import Vars

To begin with, create a variable container of the right data type. For use with AutoGrad, use a np.* data type; for use with PyTorch, use a torch.* data type; for use with TensorFlow, use a tf.* data type; and for use with JAX, use a jnp.* data type. In this example we'll use AutoGrad.

>>> vs = Vars(np.float64)

Now a variable can be created by requesting it, giving it an initial value and a name.

>>> vs.unbounded(np.random.randn(2, 2), name="x")
array([[ 1.04404354, -1.98478763],
       [ 1.14176728, -3.2915562 ]])

If a variable named x is requested again, the existing variable is returned, even if you pass an initial value again.

>>> vs.unbounded(np.random.randn(2, 2), name="x")
array([[ 1.04404354, -1.98478763],
       [ 1.14176728, -3.2915562 ]])

>>> vs.unbounded(name="x")
array([[ 1.04404354, -1.98478763],
       [ 1.14176728, -3.2915562 ]])

Alternatively, indexing syntax may be used to get the existing variable x. This asserts that a variable with the name x already exists and will throw a KeyError otherwise.

>>> vs["x"]
array([[ 1.04404354, -1.98478763],
       [ 1.14176728, -3.2915562 ]])

>>> vs["y"]
KeyError: 'y'

The value of x can be changed by assigning it a different value.

>>> vs.assign("x", np.random.randn(2, 2))
array([[ 1.43477728,  0.51006941],
       [-0.74686452, -1.05285767]])

By default, assignment is non-differentiable and overwrites data. The variable can be deleted by passing its name to vs.delete:

>>> vs.delete("x")

>>> vs["x"]
KeyError: 'x'

When a variable is first created, you can set the keyword argument visible to False to make the variable invisible to the variable-aggregating operations vs.get_latent_vars and vs.get_latent_vector. Optimisers use these operations to collect the variables to optimise, so setting visible to False prevents a variable from being optimised.

Finally, a variable container can be copied with vs.copy(). Copies are lightweight and share their variables with the original. As a consequence, assignment in a copy also mutates the original; differentiable assignment does not.

Naming

Variables may be organised by naming them hierarchically using dots (.). For example, you could name variables group1.bar, group1.foo, and group2.bar. This is helpful for extracting collections of variables, where wildcards may be used to match names. For example, *.bar would match group1.bar and group2.bar, and group1.* would match group1.bar and group1.foo. See also here.

The names of all variables can be obtained with Vars.names, and variables can be printed with Vars.print.

Example:

>>> vs = Vars(np.float64)

>>> vs.unbounded(1, name="x1")
array(1.)

>>> vs.unbounded(2, name="x2")
array(2.)

>>> vs.unbounded(3, name="y")
array(3.)

>>> vs.names
['x1', 'x2', 'y']

>>> vs.print()
x1:         1.0
x2:         2.0
y:          3.0
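
The wildcard matching on hierarchical names described above behaves much like shell-style globbing. As a rough illustration only, the following sketch applies Python's standard fnmatch module to a plain list of names. It is not how Varz implements matching, and match_names is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def match_names(names, pattern):
    """Return the names that match a shell-style pattern (hypothetical helper)."""
    return [name for name in names if fnmatchcase(name, pattern)]


names = ["group1.bar", "group1.foo", "group2.bar"]

print(match_names(names, "*.bar"))     # ['group1.bar', 'group2.bar']
print(match_names(names, "group1.*"))  # ['group1.bar', 'group1.foo']
```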

Constrained Variables

Constrained variables are created by transforming a latent unconstrained representation to the desired constrained space. The latent variables can be obtained using Vars.get_latent_vars.

>>> vs.get_latent_vars("positive_variable", "bounded_variable")
[array(-4.07892742), array(-0.604883)]

To illustrate the use of wildcards, the following is equivalent:

>>> vs.get_latent_vars("*_variable")
[array(-4.07892742), array(-0.604883)]

Variables can be excluded by prepending a dash:

>>> vs.get_latent_vars("*_variable", "-bounded_*")
[array(-4.07892742)]
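
To make the idea of a latent representation concrete, here is a minimal sketch in plain Python. It assumes that a positive variable is the exponential of its latent value, which is consistent with the example in this document where a latent value of 5 yields exp(5) ≈ 148.41. The sigmoid mapping used for the bounded case is purely an assumption, and all function names are hypothetical.

```python
import math


def positive_from_latent(latent):
    # Assumed transform: exp maps any real number to a positive one.
    return math.exp(latent)


def latent_from_positive(value):
    # Inverse of the assumed transform above.
    return math.log(value)


def bounded_from_latent(latent, lower, upper):
    # Assumption for illustration: squash with a logistic sigmoid into (0, 1),
    # then rescale to (lower, upper). The form Varz uses may differ.
    return lower + (upper - lower) / (1.0 + math.exp(-latent))


latent = -4.07892742
value = positive_from_latent(latent)

print(value > 0)                                          # True
print(abs(latent_from_positive(value) - latent) < 1e-12)  # True
print(1 < bounded_from_latent(-0.604883, 1, 3) < 3)       # True
```

Because the optimiser only ever sees the latent value, any update it makes still maps back to a value that satisfies the constraint.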

Automatic Naming of Variables

To parametrise functions, a common pattern is the following:

def objective(vs):
    x = vs.unbounded(5, name="x")
    y = vs.unbounded(10, name="y")

    return (x * y - 5) ** 2 + x ** 2

The names for x and y are necessary, because otherwise new variables would be created and initialised every time objective is run. Varz offers two ways to avoid specifying a name for every variable: sequential and parametrised specification.

Sequential Specification

Sequential specification can be used if, upon execution of objective, variables are always obtained in the same order. This means that variables can be identified with their position in this order and hence be named accordingly. To use sequential specification, decorate the function with sequential.

Example:

from varz import sequential

@sequential
def objective(vs):
    x = vs.unbounded(5)  # Initialise to 5.
    y = vs.unbounded()   # Initialise randomly.

    return (x * y - 5) ** 2 + x ** 2

>>> vs = Vars(np.float64)

>>> objective(vs)
68.65047879833773

>>> objective(vs)  # Running the objective again reuses the same variables.
68.65047879833773

>>> vs.names
['var0', 'var1']

>>> vs.print()
var0:       5.0      # This is `x`.
var1:       -0.3214  # This is `y`.
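
The mechanism behind sequential naming can be sketched with a toy container that counts requests. This is an illustration under stated assumptions, not the Varz implementation: ToyVars and toy_sequential are hypothetical names, and the only behaviour taken from the text above is that unnamed variables are called var0, var1, and so on, in the order in which they are requested.

```python
import functools
import random


class ToyVars:
    """A toy stand-in for a variable container (not the Varz implementation)."""

    def __init__(self):
        self.values = {}  # Maps names to values.
        self.counter = 0  # Position of the next unnamed variable.

    def unbounded(self, init=None, name=None):
        if name is None:
            # Name the variable after its position in the request order.
            name = f"var{self.counter}"
            self.counter += 1
        if name not in self.values:
            self.values[name] = float(init) if init is not None else random.gauss(0, 1)
        return self.values[name]


def toy_sequential(f):
    """Reset the position counter before every call, so that names line up."""

    @functools.wraps(f)
    def wrapper(vs, *args, **kw_args):
        vs.counter = 0
        return f(vs, *args, **kw_args)

    return wrapper


@toy_sequential
def objective(vs):
    x = vs.unbounded(5)  # Becomes `var0`.
    y = vs.unbounded()   # Becomes `var1`.
    return (x * y - 5) ** 2 + x ** 2


vs = ToyVars()
first = objective(vs)
second = objective(vs)  # Reuses `var0` and `var1`.

print(first == second)  # True
print(list(vs.values))  # ['var0', 'var1']
```

The sketch also shows why the order of requests must not change between calls: the position is the only thing that identifies a variable.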

Parametrised Specification

Sequential specification still suffers from boilerplate code like x = vs.unbounded(5) and y = vs.unbounded(). Parametrised specification addresses this problem by letting you specify variables as arguments to your function. The decorator is imported with from varz.spec import parametrised. To indicate that an argument of the function is a variable, as opposed to a regular argument, set the argument's type hint accordingly, for example x: Unbounded.

The variable container must also be an argument of the function, because that is where the variables will be obtained from. A variable can be given an initial value in the way you would expect:

@parametrised
def f(vs, x: Unbounded = 5):
    ...

Variable arguments and regular arguments can be mixed. If f is called, variable arguments must not be specified, because they will be obtained automatically. Regular arguments, however, must be specified.

To use parametrised specification, decorate the function with parametrised.

Example:

from varz import parametrised, Unbounded, Bounded

@parametrised
def objective(vs, x: Unbounded, y: Bounded(lower=1, upper=3) = 2, option=None):
    print("Option:", option)
    return (x * y - 5) ** 2 + x ** 2

>>> vs = Vars(np.float64)

>>> objective(vs)
Option: None
9.757481795615316

>>> objective(vs, "other")
Option: other
9.757481795615316

>>> objective(vs, option="other")
Option: other
9.757481795615316

>>> objective(vs, x=5)  # This is not valid, because `x` will be obtained automatically from `vs`.
ValueError: 1 keyword argument(s) not parsed: x.

>>> vs.print()
x:          1.025
y:          2.0
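
One way such a decorator could work is to inspect the type hints of the function and fill in the hinted arguments from the container. The following is a simplified sketch under that assumption and not the Varz implementation: toy_parametrised and the Unbounded marker defined here are hypothetical, and a plain dictionary plays the role of the variable container.

```python
import functools
import inspect
import random


class Unbounded:
    """Hypothetical marker used as a type hint in this sketch."""


def toy_parametrised(f):
    # Skip the first parameter, which is the variable container.
    params = list(inspect.signature(f).parameters.values())[1:]
    var_params = [p for p in params if p.annotation is Unbounded]
    regular_names = [p.name for p in params if p.annotation is not Unbounded]

    @functools.wraps(f)
    def wrapper(vs, *args, **kw_args):
        for p in var_params:
            if p.name in kw_args:
                raise ValueError(f"`{p.name}` is obtained from the container.")
        # Positional arguments fill the regular parameters in order.
        call_args = dict(zip(regular_names, args))
        call_args.update(kw_args)
        for p in var_params:
            if p.name not in vs:
                has_default = p.default is not inspect.Parameter.empty
                vs[p.name] = float(p.default) if has_default else random.gauss(0, 1)
            call_args[p.name] = vs[p.name]
        return f(vs, **call_args)

    return wrapper


@toy_parametrised
def objective(vs, x: Unbounded, y: Unbounded = 2, option=None):
    return (x * y - 5) ** 2 + x ** 2, option


vs = {}  # A plain dictionary plays the role of the container here.
value, option = objective(vs, "other")

print(option)      # other
print(sorted(vs))  # ['x', 'y']
```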

Namespaces

Namespaces can be used to group all variables in a function together.

Example:

from varz import namespace

@namespace("test")
def objective(vs):
    x = vs.unbounded(5, name="x")
    y = vs.unbounded(name="y")

    return x + y

>>> vs = Vars(np.float64)

>>> objective(vs)
6.12448906632577

>>> vs.names
['test.x', 'test.y']

>>> vs.print()
test.x:     5.0
test.y:     1.124

You can combine namespace with other specification methods:

from varz import namespace, sequential

@namespace("test")
@sequential
def objective(vs):
    x = vs.unbounded(5)
    y = vs.unbounded()

    return x + y

>>> vs = Vars(np.float64)

>>> objective(vs)
4.812730329303665

>>> vs.names
['test.var0', 'test.var1']

>>> vs.print()
test.var0:  5.0
test.var1:  -0.1873
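
Conceptually, a namespace prepends a prefix and a dot to every name requested inside the decorated function. The sketch below illustrates that idea with a toy container. ToyVars and toy_namespace are hypothetical names, and the real decorator may be implemented quite differently.

```python
import functools


class ToyVars:
    """A toy stand-in for a variable container (not the Varz implementation)."""

    def __init__(self):
        self.values = {}  # Maps full names to values.
        self.prefix = ""  # Prefix applied to every requested name.

    def unbounded(self, init=0.0, name=None):
        return self.values.setdefault(self.prefix + name, float(init))


def toy_namespace(prefix):
    def decorator(f):
        @functools.wraps(f)
        def wrapper(vs, *args, **kw_args):
            previous = vs.prefix
            vs.prefix = previous + prefix + "."
            try:
                return f(vs, *args, **kw_args)
            finally:
                # Restore the old prefix, even if `f` raises.
                vs.prefix = previous

        return wrapper

    return decorator


@toy_namespace("test")
def objective(vs):
    x = vs.unbounded(5, name="x")
    y = vs.unbounded(1, name="y")
    return x + y


vs = ToyVars()
print(objective(vs))    # 6.0
print(list(vs.values))  # ['test.x', 'test.y']
```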

Structlike Specification

For any variable container vs, vs.struct gives an object which you can treat like a nested struct, list, or dictionary to automatically generate variable names. For example, vs.struct.model["a"].variance.positive() would be equivalent to vs.positive(name="model[a].variance"). After variables have been defined in this way, they can also be extracted via vs.struct: vs.struct.model["a"].variance() would be equivalent to vs["model[a].variance"].

Example:

def objective(vs):
    params = vs.struct

    x = params.x.unbounded()
    y = params.y.unbounded()

    for model_params, model in zip(params.models, [object(), object(), object()]):
        model_params.specific_parameter1.positive()
        model_params.specific_parameter2.positive()

    return x + y

>>> vs = Vars(np.float64)

>>> objective(vs)
-0.08322955725015702

>>> vs.names
['x',
 'y',
 'models[0].specific_parameter1',
 'models[0].specific_parameter2',
 'models[1].specific_parameter1',
 'models[1].specific_parameter2',
 'models[2].specific_parameter1',
 'models[2].specific_parameter2']

>>> vs.print()
x:          -0.8963
y:          0.8131
models[0].specific_parameter1: 0.01855
models[0].specific_parameter2: 0.6644
models[1].specific_parameter1: 0.3542
models[1].specific_parameter2: 0.3642
models[2].specific_parameter1: 0.5807
models[2].specific_parameter2: 0.5977

>>> vs.struct.models[0].specific_parameter1()
0.018551827512328086

>>> vs.struct.models[0].specific_parameter2()
0.6643533007198247
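
The way names such as model[a].variance and models[0].specific_parameter1 arise from attribute and index access can be sketched with a small class that records the path taken. This is only an illustration of the naming scheme: ToyStruct and path_of are hypothetical and do not create or retrieve any variables.

```python
class ToyStruct:
    """Builds variable names from attribute and index access (toy sketch)."""

    def __init__(self, path=""):
        self._path = path

    def __getattr__(self, name):
        # Only called for attributes that do not exist, such as `model`.
        if name.startswith("_"):
            raise AttributeError(name)
        prefix = self._path + "." if self._path else ""
        return ToyStruct(prefix + name)

    def __getitem__(self, key):
        return ToyStruct(f"{self._path}[{key}]")


def path_of(struct):
    """Return the variable name that a struct position corresponds to."""
    return struct._path


params = ToyStruct()

print(path_of(params.model["a"].variance))            # model[a].variance
print(path_of(params.models[0].specific_parameter1))  # models[0].specific_parameter1
```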

There are a few methods available for convenient manipulation of the variable struct. In the following, let params = vs.struct.

Optimisers

The following optimisers are available:

varz.{autograd,tensorflow,torch,jax}.minimise_l_bfgs_b (L-BFGS-B)
varz.{autograd,tensorflow,torch,jax}.minimise_adam     (ADAM)

The L-BFGS-B algorithm is recommended for deterministic objectives and ADAM is recommended for stochastic objectives.

See the examples for an illustration of how these optimisers can be used. Some commonly used keyword arguments are as follows:

| Keyword Argument | Description |
| --- | --- |
| iters | Number of iterations |
| trace | Show progress |
| jit | Use a JIT to compile the gradient |

See the API for a detailed description of the keyword arguments that these optimisers accept.

PyTorch Specifics

All the variables held by a container can be detached from the current computation graph with Vars.detach. To make a copy of the container with detached versions of the variables, use Vars.copy with detach=True instead. Whether variables require gradients can be configured with Vars.requires_grad. By default, no variable requires a gradient.

Getting and Setting Latent Representations of Variables as a Vector

It may be desirable to get the latent representations of a collection of variables as a single vector, e.g. when feeding them to an optimiser. This can be achieved with Vars.get_latent_vector.

>>> vs.get_latent_vector("x", "*_variable")
array([0.12500578, -0.21510423, -0.61336039, 1.23074066, -4.07892742,
       -0.604883])

Similarly, to update the latent representation of a collection of variables, Vars.set_latent_vector can be used.

>>> vs.set_latent_vector(np.ones(6), "x", "*_variable")
[array([[1., 1.],
        [1., 1.]]), array(1.), array(1.)]

>>> vs.get_latent_vector("x", "*_variable")
array([1., 1., 1., 1., 1., 1.])
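
Getting and setting a latent vector amounts to flattening several latent values into one long vector and splitting such a vector back up again. A minimal sketch with plain Python lists follows. The helpers pack and unpack are hypothetical, the numbers are made up, and real containers additionally restore the original shapes, such as (2, 2) for x.

```python
def pack(latents):
    """Concatenate several flat lists of latent values into one vector."""
    return [value for latent in latents for value in latent]


def unpack(vector, sizes):
    """Split a vector back into chunks of the given sizes."""
    if len(vector) != sum(sizes):
        raise ValueError("Vector length does not match the total size.")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(vector[start:start + size])
        start += size
    return chunks


# One variable with four entries, followed by two scalars.
latents = [[0.1, -0.2, -0.6, 1.2], [-4.0], [-0.6]]
sizes = [len(latent) for latent in latents]

print(pack(latents))              # [0.1, -0.2, -0.6, 1.2, -4.0, -0.6]
print(unpack([1.0] * 6, sizes))   # [[1.0, 1.0, 1.0, 1.0], [1.0], [1.0]]
```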

Differentiable Assignment

By default, Vars.set_latent_vector will overwrite the variables, just like Vars.assign. An unfortunate consequence is that you cannot differentiate with respect to the assigned values. To be able to differentiate with respect to the assigned values, set the keyword differentiable=True in the call to Vars.set_latent_vector. Unlike regular assignment, if the variable container is a copy of some original, differentiable assignment will not mutate the variables in the original.

Get Variables from a Source

The keyword argument source can be set to a tensor from which the latent variables will be obtained.

Example:

>>> vs = Vars(np.float32, source=np.array([1, 2, 3, 4, 5]))

>>> vs.unbounded()
array(1., dtype=float32)

>>> vs.unbounded(shape=(3,))
array([2., 3., 4.], dtype=float32)

>>> vs.pos()
148.41316

>>> np.exp(5).astype(np.float32)
148.41316
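
The example above suggests that latent values are read from the source in order: one value for the scalar, the next three for the vector, and the last one for the positive variable, whose value is exp(5). The following toy sketch mimics that bookkeeping. ToySource is hypothetical, and the exponential transform for the positive variable is an assumption inferred from the output shown above.

```python
import math


class ToySource:
    """Hands out consecutive chunks of a fixed list of latent values."""

    def __init__(self, values):
        self.values = list(values)
        self.index = 0  # Position of the next unread value.

    def take(self, size=1):
        chunk = self.values[self.index:self.index + size]
        if len(chunk) < size:
            raise ValueError("Source exhausted.")
        self.index += size
        return chunk


source = ToySource([1, 2, 3, 4, 5])

unbounded_scalar = source.take()[0]    # Reads 1.
unbounded_vector = source.take(3)      # Reads 2, 3, and 4.
positive = math.exp(source.take()[0])  # Reads 5; assumed exp transform.

print(unbounded_scalar)  # 1
print(unbounded_vector)  # [2, 3, 4]
print(abs(positive - 148.41316) < 1e-4)  # True
```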

GPU Support

To create and optimise variables on a GPU, set the active device to a GPU. The easiest way of doing this is to import lab as B and call B.set_global_device("gpu:0").

Examples

Minimise a Function Using L-BFGS-B in AutoGrad

import autograd.numpy as np
from varz.autograd import Vars, minimise_l_bfgs_b

target = 5.0 

def objective(vs):
    # Get a variable named "x", which must be positive, initialised to 10.
    x = vs.pos(10.0, name="x")
    return (x ** 0.5 - target) ** 2

>>> vs = Vars(np.float64)

>>> minimise_l_bfgs_b(objective, vs)
3.17785950743424e-19  # Final objective function value.

>>> vs['x'] - target ** 2
-5.637250666268301e-09
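
To see why optimising a latent value keeps x positive throughout, the same objective can be minimised by hand in plain Python. This sketch deliberately swaps L-BFGS-B for plain gradient descent to stay self-contained, and it assumes that the positive variable is parametrised as x = exp(z). The step size and iteration count are arbitrary choices for this illustration.

```python
import math

target = 5.0


def gradient_wrt_latent(z):
    # With x = exp(z), sqrt(x) = exp(z / 2), and the chain rule gives
    # d/dz (sqrt(x) - target) ** 2 = (sqrt(x) - target) * sqrt(x).
    s = math.exp(z / 2)
    return (s - target) * s


z = math.log(10.0)  # Latent value corresponding to the initial x = 10.
for _ in range(200):
    z -= 0.05 * gradient_wrt_latent(z)  # Unconstrained update on z.

x = math.exp(z)  # Map back to the constrained space; always positive.

print(x > 0)                        # True
print(abs(x - target ** 2) < 1e-6)  # True
```

The update on z never needs to know about the constraint, because exp(z) is positive for every real z.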

Minimise a Function Using L-BFGS-B in TensorFlow

import tensorflow as tf
from varz.tensorflow import Vars, minimise_l_bfgs_b

target = 5.0

def objective(vs):
    # Get a variable named "x", which must be positive, initialised to 10.
    x = vs.pos(10.0, name="x")
    return (x ** 0.5 - target) ** 2

>>> vs = Vars(tf.float64)

>>> minimise_l_bfgs_b(objective, vs)
3.17785950743424e-19  # Final objective function value.

>>> vs['x'] - target ** 2
<tf.Tensor: id=562, shape=(), dtype=float64, numpy=-5.637250666268301e-09>

>>> vs = Vars(tf.float64)

>>> minimise_l_bfgs_b(objective, vs, jit=True)  # Speed up optimisation with TF's JIT!
3.17785950743424e-19

Minimise a Function Using L-BFGS-B in PyTorch

import torch
from varz.torch import Vars, minimise_l_bfgs_b

target = torch.tensor(5.0, dtype=torch.float64)

def objective(vs):
    # Get a variable named "x", which must be positive, initialised to 10.
    x = vs.pos(10.0, name="x")
    return (x ** 0.5 - target) ** 2

>>> vs = Vars(torch.float64)

>>> minimise_l_bfgs_b(objective, vs)
array(3.17785951e-19)  # Final objective function value.

>>> vs["x"] - target ** 2
tensor(-5.6373e-09, dtype=torch.float64)

>>> vs = Vars(torch.float64)

>>> minimise_l_bfgs_b(objective, vs, jit=True)  # Speed up optimisation with PyTorch's JIT!
array(3.17785951e-19)

Minimise a Function Using L-BFGS-B in JAX

import jax.numpy as jnp
from varz.jax import Vars, minimise_l_bfgs_b

target = 5.0

def objective(vs):
    # Get a variable named "x", which must be positive, initialised to 10.
    x = vs.pos(10.0, name="x")
    return (x ** 0.5 - target) ** 2

>>> vs = Vars(jnp.float64)

>>> minimise_l_bfgs_b(objective, vs)
array(3.17785951e-19)  # Final objective function value.

>>> vs["x"] - target ** 2
-5.637250666268301e-09

>>> vs = Vars(jnp.float64)

>>> minimise_l_bfgs_b(objective, vs, jit=True)  # Speed up optimisation with JAX's JIT!
array(3.17785951e-19)

Tracking the Learning Curve in JAX

import jax.numpy as jnp
from varz.jax import Vars, minimise_l_bfgs_b

target = 5.0

def objective(vs, prev_x):
    # Get a variable named "x", which must be positive, initialised to 10.
    x = vs.pos(10.0, name="x")
    # In addition to the objective function value, also return `x` so that  
    # we can log it.
    return (x ** 0.5 - target) ** 2, x  

objs = []
xs = []

def callback(obj, x):
    objs.append(obj)
    xs.append(x)
    # Return a dictionary of extra information to show in the progress display.
    return {"x": x}

>>> vs = Vars(jnp.float64)

>>> minimise_l_bfgs_b(objective, (vs, 0), trace=True, jit=True, callback=callback)
Minimisation of "objective":
    Iteration 1/1000:
        Time elapsed: 0.0 s
        Time left:  19.0 s
        Objective value: 0.04567
        x:          27.18
    Iteration 6/1000:
        Time elapsed: 0.1 s
        Time left:  7.4 s
        Objective value: 4.520e-04
        x:          24.99
    Done!
Termination message:
    CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
(array(3.17785951e-19), DeviceArray(24.99999999, dtype=float64))

>>> vs["x"] - target ** 2
DeviceArray(-5.63725067e-09, dtype=float64)

>>> objs
[array(3.3772234),
 array(0.04567386),
 array(0.03582296),
 array(0.00014534),
 array(5.18203996e-07),
 array(6.81622668e-12),
 array(3.17785951e-19)]

>>> xs
[DeviceArray(10., dtype=float64),
 DeviceArray(27.18281828, dtype=float64),
 DeviceArray(23.14312757, dtype=float64),
 DeviceArray(24.87958747, dtype=float64),
 DeviceArray(25.00719916, dtype=float64),
 DeviceArray(24.99997389, dtype=float64),
 DeviceArray(24.99999999, dtype=float64)]