Open janosh opened 4 years ago
You can use TF's tf.custom_gradient functionality for this. See, e.g., here https://groups.google.com/a/tensorflow.org/g/tfprobability/c/Vr2605ZuBEY/m/m1abXSxmBwAJ for how to implement gradient clipping (not that you necessarily should clip gradients for HMC).
@SiegeLordEx Thanks for the quick reply. The Google group you linked seems to be private. At least I get a 404.
Regarding tf.custom_gradient, the docs state
Args:
f: function f(*x) that returns a tuple (y, grad_fn) where:
x is a sequence of Tensor inputs to the function.
y is a Tensor or sequence of Tensor outputs of applying TensorFlow operations in f to x.
That last sentence sounds like it might be a problem. In our case y is just a scalar value which can of course be made into a TF tensor but it's not the result of TF operations. Instead it's the output of a Fortran binary. Does tf.custom_gradient still help us in that case?
Here's the public link: https://groups.google.com/a/tensorflow.org/d/msg/tfprobability/Vr2605ZuBEY/m1abXSxmBwAJ
@janosh yes, it should be fine.
Here's another example:
import numpy as np
import tensorflow as tf

def foo_no_grad(x):
    # Forward pass in numpy; TF sees only a constant, so no gradient flows.
    y = np.square(x)
    return tf.constant(y)

@tf.custom_gradient
def foo_custom_grad(x):
    # Forward pass in numpy, with a hand-written backward pass.
    y = np.square(x)
    def grad_fn(dy):
        grad = 2 * np.array(x)
        return grad * dy
    return y, grad_fn

def foo_autodiff(x):
    # Forward pass in TF ops; autodiff supplies the gradient.
    y = tf.square(x)
    return y

with tf.GradientTape(persistent=True) as tape:
    x = tf.constant(2., dtype=tf.float64)
    tape.watch(x)
    y1 = foo_no_grad(x)**2
    y2 = foo_custom_grad(x)**2
    y3 = foo_autodiff(x)**2

print(tape.gradient(y1, x))  # => None
print(tape.gradient(y2, x))  # => tf.Tensor(32.0, shape=(), dtype=float64)
print(tape.gradient(y3, x))  # => tf.Tensor(32.0, shape=(), dtype=float64)
Note that foo_no_grad and foo_custom_grad both use numpy for the forward computation (and, in the latter case, for the gradient), which is normally not differentiable by TensorFlow's autodiff mechanism. foo_custom_grad, through @tf.custom_gradient, is differentiable, even though both the forward and backward passes are implemented in numpy. You can naturally replace numpy with arbitrary Python code.
One small final note. The code above is Eager-only. If you want things to work inside a tf.function (which you definitely do, for speed reasons), you'll need to wrap your non-TF code in tf.py_function, e.g.:
y2 = tf.py_function(func=foo_custom_grad, inp=[x], Tout=tf.float64)**2
@SiegeLordEx That's great advice! Much appreciated. We tried to implement your approach yesterday and ran into this error:
TypeError: 'tensorflow.python.framework.ops.EagerTensor' object is not callable
We've been trying to find the cause for a while but no luck. Our target_log_prob_fn looks like this:
@tf.custom_gradient
def target_log_prob_fn(*param_vals):
    log_likelihood = -penalty(param_vals, param_keys, ref_energies, "_tmp_optimizer")

    def grad_fn(*dys):
        grad = jacobian(param_vals, param_keys, ref_energies, "_tmp_optimizer")
        return dys * grad

    return log_likelihood, grad_fn

param_keys, param_values = prepare_data(mols_atoms, start_params)
param_values = [tf.Variable(x) for x in param_values]
target_log_prob_fn = tf.py_function(
    target_log_prob_fn, inp=param_values, Tout=tf.float64
)
where the penalty function is what calls the Fortran binary:
import numpy as np
from tqdm import trange

import mndo

def penalty(param_vals, param_keys, ref_energies, filename):
    """
    params: dict of params for different atoms
    ref_energies: np.array of ground truth atomic energies
    """
    # mndo expects params to be a dict, constructing that here
    # because TFP's HMC requires param_list to be a list
    params = {key[0]: {} for key in param_keys}
    for key, param in zip(param_keys, param_vals):
        atom_type, prop = key
        params[atom_type][prop] = param
    mndo.set_params(params)
    preds = mndo.calculate(filename)
    pred_energies = np.array([p["energy"] for p in preds])
    diff = ref_energies - pred_energies
    mse = (diff ** 2).mean()
    return mse

jacobian just calls penalty:
def jacobian(param_list, *rest, dh=1e-6):
    grad = np.zeros_like(param_list)
    for i in trange(len(param_list)):
        param_list[i] += dh
        forward = penalty(param_list, *rest)
        param_list[i] -= 2 * dh
        backward = penalty(param_list, *rest)
        de = forward - backward
        grad[i] = de / (2 * dh)
        param_list[i] += dh  # undo in-place changes to params for next iteration
    return grad
The full code is available at https://github.com/janosh/bayes-mndo. If you have any ideas how to troubleshoot this, we'd love to hear them!
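One detail of the central-difference scheme above is that it modifies param_list in place, which only works for mutable containers; a tuple (such as the one captured by *param_vals) would raise a TypeError on item assignment. A minimal sketch of a non-mutating variant follows. The names central_difference_grad and toy_penalty are hypothetical, the toy quadratic stands in for the real penalty, and the input is assumed to be one-dimensional:

```python
import numpy as np

def central_difference_grad(f, params, dh=1e-6):
    """Approximate the gradient of a scalar function f without mutating params."""
    x0 = np.asarray(params, dtype=np.float64)  # tuples/lists become a new array
    grad = np.zeros_like(x0)
    for i in range(x0.size):
        step = np.zeros_like(x0)
        step[i] = dh
        # Perturbed copies are built on the fly; x0 itself is never changed.
        grad[i] = (f(x0 + step) - f(x0 - step)) / (2 * dh)
    return grad

# Toy stand-in for the external penalty: f(x) = sum(x^2), so the gradient is 2x.
def toy_penalty(x):
    return float(np.sum(np.asarray(x) ** 2))

params = (1.0, -2.0, 0.5)  # a tuple, as produced by *param_vals
grad = central_difference_grad(toy_penalty, params)
print(grad)  # approximately [2, -4, 1]
```

Because nothing is modified in place, there is also no need to undo the perturbation at the end of each loop iteration.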
I see two errors.
First, this line: return dys * grad is probably wrong since dys is a tuple, and grad is a numpy array. You'll need to adjust it as appropriate to your problem. The return value of grad_fn should be the vector-Jacobian product dys @ J.
The actual error you're hitting is that tf.py_function, despite its name, does not return a (decorated) function. Instead, it evaluates the function passed to it. I.e., after target_log_prob_fn = tf.py_function(...), target_log_prob_fn is a Tensor, not a function. What you want to do is:
def real_target_log_prob_fn(*param_vals):
    return tf.py_function(
        target_log_prob_fn, inp=param_vals, Tout=tf.float64
    )

# The kernel's other required arguments (step_size, num_leapfrog_steps) are omitted here.
hmc = tfp.mcmc.HamiltonianMonteCarlo(real_target_log_prob_fn)
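To make the vector-Jacobian product concrete: for a scalar-valued function of n parameters, the Jacobian J is just the gradient (shape (n,)) and the upstream gradient dy is a scalar, so the product reduces to dy * J, split into one entry per input. A NumPy-only sketch with made-up numbers follows; vjp_scalar_output is a hypothetical helper, not a TensorFlow API:

```python
import numpy as np

def vjp_scalar_output(dy, jac):
    """Return one gradient per input for a scalar-valued function.

    dy:  upstream gradient of the scalar output (a scalar)
    jac: gradient of the output w.r.t. each of the n inputs, shape (n,)
    """
    jac = np.asarray(jac, dtype=np.float64)
    return [dy * g for g in jac]  # list of n scalars, one per input

jac = np.array([0.5, -1.0, 2.0])  # hypothetical d(output)/d(param_i)
grads = vjp_scalar_output(3.0, jac)
print(grads)       # three entries, one per parameter
print(len(grads))  # 3
```

When the decorated function takes n separate arguments, grad_fn is expected to return n gradients, so a list like this (rather than a single array) matches that expectation.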
@SiegeLordEx Thanks for your continued help!
> The actual error you're hitting is that tf.py_function, despite its name, does not return a (decorated) function. What it does instead is it evaluates the function passed to it.
That explains a few things! I was confused by why we needed to specify inp=param_vals to a decorator. 🤦
We were aware of the potential issue with return dys * grad and had already tried several variations. And as expected, after resolving
TypeError: 'tensorflow.python.framework.ops.EagerTensor' object is not callable
we got a new error saying
ValueError: ('custom_gradient function expected to return', 37, 'gradients but returned', 1, 'instead.')
That was easily fixed by return list(dys * grad). So now the log prob function looks like this:
@tf.custom_gradient
def target_log_prob_fn(*param_vals):
    log_likelihood = -penalty(param_vals, param_keys, ref_energies, "_tmp_optimizer")

    def grad_fn(*dys):
        grad = jacobian(param_vals, param_keys, ref_energies, "_tmp_optimizer")
        return list(dys * grad)

    return log_likelihood, grad_fn

def real_target_log_prob_fn(*param_vals):
    return tf.py_function(target_log_prob_fn, inp=param_vals, Tout=tf.float64)

This will run for two iterations and then throw:
tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:AddV2] name: mcmc_sample_chain/trace_scan/while/smart_for_loop/while/dual_averaging_step_size_adaptation___init__/_one_step/add
If we swap out tfp.mcmc.DualAveragingStepSizeAdaptation for tfp.mcmc.SimpleStepSizeAdaptation the error disappears, so maybe there's a bug in DualAveragingStepSizeAdaptation?
Yeah, sounds like a bug. You can work around it by casting the step size argument to tf.float64 when constructing the NUTS kernel.
Also, while testing this further inside a tf.function context, I noticed that tf.py_function can lose the output shape, which will trip up sample_chain. To fix that, you'll want to do something like:
def real_target_log_prob_fn(*param_vals):
    res = tf.py_function(target_log_prob_fn, inp=param_vals, Tout=tf.float64)
    res.set_shape(param_vals[0].shape[:-1])  # assumes parameter is vector-valued
    return res

> You can work around it by casting the step size argument to tf.float64 when constructing the NUTS kernel.
That solved the problem. Would you accept a PR for that?
> That solved the problem. Would you accept a PR for that?
I think what's missing is that NUTS should have a _prepare_args function like the one in HMC which defers converting things to a Tensor until the dtype of the state is known.
@SiegeLordEx For some reason, HMC was unable to accept a single step on our actual problem. So we tried to recreate the functionality on a simpler problem which is to guess the parameters of the Branin-Hoo function from a bunch of samples of it. And it looks like we're running into the same problem there. Since this issue is much easier to replicate, could you take another look and let us know if you spot anything that's off?
Again, the full code is public.
hmc-test.py
from datetime import datetime
import numpy as np
from functools import partial
import tensorflow as tf
from hmc_utils import sample_chain, trace_fn, get_nuts_kernel
from bo_bench import (
    branin_hoo_params,
    sample_branin_hoo,
    branin_hoo_factory,
    branin_hoo_fn,
)
import plotly.graph_objects as go

# Plot the Branin-Hoo surface
xr = np.linspace(-5, 15, 21)
yr = np.linspace(0, 10, 11)
domain = np.stack(np.meshgrid(xr, yr), -1).reshape(-1, 2).T
surface = go.Surface(x=xr, y=yr, z=branin_hoo_fn(domain).reshape(len(yr), -1))
fig = go.Figure(data=[surface])
fig.update_layout(height=700, title_text="Branin-Hoo function")

# Generate random data set
xy, z_true = sample_branin_hoo(100)

def penalty(params):
    z_pred = branin_hoo_factory(*params)(xy)
    # Normally we'd just return -tf.metrics.mse(z_true, z_pred). But to test if
    # custom gradients are the reason HMC isn't accepting steps on MNDO, we
    # explicitly avoid autodiff.
    se = (z_true - z_pred) ** 2
    return se.mean()

def jacobian(params, dh=1e-5):
    """
    Args:
        params: values for each Branin-Hoo param
        dh: small value for numerical gradients
    """
    grad = np.zeros_like(params)
    for i in range(len(params)):
        params[i] += dh
        forward = penalty(params)
        params[i] -= 2 * dh
        backward = penalty(params)
        de = forward - backward
        grad[i] = de / (2 * dh)
        params[i] += dh  # undo in-place changes to params for next iteration
    return grad

@tf.custom_gradient
def custom_grad_target_log_prob_fn(*params):
    log_likelihood = -penalty([x.numpy() for x in params])

    def grad_fn(*dys):
        grad = jacobian([x.numpy() for x in params])
        return list(dys * grad)

    return log_likelihood, grad_fn

def target_log_prob_fn(params):
    res = tf.py_function(custom_grad_target_log_prob_fn, inp=params, Tout=tf.float64)
    # Avoid tripping up sample_chain due to loss of output shape in tf.py_function
    # when used in a tf.function context. https://tinyurl.com/y9ttqdpt
    res.set_shape(params[0].shape[:-1])  # assumes parameter is vector-valued
    return res

# With this function it works. With the above target_log_prob_fn, we can't accept steps.
# def target_log_prob_fn(param_vals):
#     z_pred = branin_hoo_factory(*param_vals)(xy)
#     return -tf.metrics.mse(z_true, z_pred)

now = datetime.now().strftime("%Y.%m.%d-%H:%M:%S")
log_dir = f"runs/hmc-trace/{now}"
summary_writer = tf.summary.create_file_writer(log_dir)

# Casting step_size and init_state needed due to TFP bug
# https://github.com/tensorflow/probability/issues/904#issuecomment-624272845
step_size = tf.cast(1e-1, tf.float64)
init_state = [v * 1.5 for v in branin_hoo_params.values()]
n_adapt_steps = 20

chain, trace, final_kernel_results = sample_chain(
    num_results=40,
    current_state=tf.constant(init_state, tf.float64),
    kernel=get_nuts_kernel(target_log_prob_fn, step_size, n_adapt_steps),
    return_final_kernel_results=True,
    trace_fn=partial(trace_fn, summary_writer=summary_writer),
)
burnin, samples = chain[:n_adapt_steps], chain[n_adapt_steps:]

plot_funcs = [
    [branin_hoo_fn, "Electric"],
    [branin_hoo_factory(*init_state), "Viridis"],  # default colorscale
    [branin_hoo_factory(*chain[-1].numpy()), "Blues"],
]
surfaces = [
    go.Surface(
        x=xr, y=yr, z=fn(domain).reshape(len(yr), -1), colorscale=cs, showscale=False
    )
    for fn, cs in plot_funcs
]
samples_plot = go.Scatter3d(x=xy[0], y=xy[1], z=z_true, mode="markers")
fig = go.Figure(data=[*surfaces, samples_plot])
title = "Branin-Hoo (bottom), initial surface (top), HMC final surface (middle)"
fig.update_layout(height=700, title_text=title)
bo_bench.py
import numpy as np

def branin_hoo_factory(a, b, c, r, s, t):
    def branin_hoo(x):
        # f(x) = a(y - b*x^2 + c*x - r)^2 + s (1 - t) cos(x) + s
        return (
            a * (x[1] - b * x[0] ** 2 + c * x[0] - r) ** 2
            + s * (1 - t) * np.cos(x[0])
            + s
        )

    return branin_hoo

branin_hoo_params = dict(
    a=1, b=5.1 / (4 * np.pi ** 2), c=5 / np.pi, r=6, s=10, t=1 / (8 * np.pi)
)

def branin_hoo_fn(x):
    """The Branin-Hoo function is a popular benchmark for Bayesian optimization.
    """
    z = branin_hoo_factory(**branin_hoo_params)(x)
    return z

def sample_branin_hoo(n_samples, domain=[[-5, 15], [0, 10]]):
    """Take samples from the Branin-Hoo function.

    Args:
        n_samples (int): number of samples to draw

    Returns:
        np.array: 2d array of x, y points, shape (2, n_samples)
        np.array: 1d array of z points
    """
    [x_min, x_max], [y_min, y_max] = domain
    xy = np.random.uniform(
        low=[x_min, y_min], high=[x_max, y_max], size=(n_samples, 2)
    ).T
    z = branin_hoo_fn(xy)
    return xy, z
Seems the step size is too large and n_adapt_steps is not enough to tune it yet. For example, setting the step size to a smaller value seems to generate a proposal that the HMC kernel will accept:
step_size = tf.cast(1e-5, tf.float64)
init_state = tf.constant([v * 1.5 for v in branin_hoo_params.values()], tf.float64)
n_adapt_steps = 20
kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
    tfp.mcmc.NoUTurnSampler(
        target_log_prob_fn=target_log_prob_fn,
        step_size=step_size),
    num_adaptation_steps=n_adapt_steps
)
pkr = kernel.bootstrap_results(init_state)
next_state, next_pkr = kernel.one_step(init_state, pkr)
next_pkr.inner_results.is_accepted
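The effect of the step size on acceptance can be illustrated without TFP. The sketch below runs a plain leapfrog integrator on a 1-D standard normal target (a toy stand-in, not the Branin-Hoo posterior) and reports the Metropolis acceptance probability min(1, exp(-ΔH)). All names are hypothetical. For this particular target, leapfrog is only stable for step sizes below 2, so an overly large step makes the energy error blow up and the acceptance probability collapse to zero:

```python
import math

def leapfrog_accept_prob(step_size, num_steps=10, q=1.0, p=1.0):
    """Acceptance probability of one HMC proposal for a 1-D standard normal.

    Potential U(q) = q^2 / 2 (so grad U = q), kinetic energy K(p) = p^2 / 2.
    """
    h_start = 0.5 * q * q + 0.5 * p * p
    for _ in range(num_steps):
        p -= 0.5 * step_size * q  # half step in momentum
        q += step_size * p        # full step in position
        p -= 0.5 * step_size * q  # half step in momentum
    h_end = 0.5 * q * q + 0.5 * p * p
    delta_h = h_end - h_start
    # min(0, -delta_h) caps the probability at 1 and avoids overflow in exp()
    return math.exp(min(0.0, -delta_h))

small = leapfrog_accept_prob(0.01)
large = leapfrog_accept_prob(2.5)
print(small)  # close to 1
print(large)  # effectively 0
```

The stability threshold depends on the curvature of the target, so the specific numbers here do not transfer to other problems; only the qualitative behaviour does.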
@junpenglao Sorry, I was doing a sweep over the step size before posting here and forgot to revert the value. step_size and n_adapt_steps are usually set to this:
step_size = tf.cast(1e-3, tf.float64)
init_state = [v * 1.5 for v in branin_hoo_params.values()]
n_adapt_steps = 200
Just as a visual confirmation that HMC isn't moving, here's the target surface (bottom), initial surface (top) and surface parametrized by the final sample in the chain (in a blue-white color scale). The initial and final surfaces coincide.
If I don't use tf.custom_gradient and tf.py_function and simply rely on autodiff, the final surface coincides with the true Branin-Hoo surface instead. So there must be something wrong in how I use those functions or how they interact with TFP's mcmc module.
I think the gradient computation is not quite right - it should be the output of jacobian directly.
See this reproducible colab: https://colab.research.google.com/drive/10npUrXjLuZLJMt2229U7lF0B34LBbFqj?usp=sharing
In general, if you're concerned about the correctness of your analytical gradients, you can check them numerically. TensorFlow has a utility for that: https://www.tensorflow.org/api_docs/python/tf/test/compute_gradient
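As a NumPy-only illustration of such a check (it does not use tf.test.compute_gradient itself), the sketch below compares a hand-written gradient against central differences for a toy log-probability defined as the negative of a penalty. The functions toy_penalty, penalty_grad, log_prob, and numeric_grad are hypothetical stand-ins. One thing this kind of check catches is a sign mismatch: if the forward pass returns -penalty, the matching gradient is the negative of the penalty's gradient.

```python
import numpy as np

def toy_penalty(x):
    return float(np.sum((x - 1.0) ** 2))

def penalty_grad(x):
    # Analytic gradient of toy_penalty.
    return 2.0 * (x - 1.0)

def log_prob(x):
    # The sampler sees the negative penalty as its log-probability.
    return -toy_penalty(x)

def numeric_grad(f, x, dh=1e-6):
    """Central-difference gradient of a scalar function f at a 1-D point x."""
    x = np.asarray(x, dtype=np.float64)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = dh
        grad[i] = (f(x + step) - f(x - step)) / (2 * dh)
    return grad

x = np.array([0.0, 2.0, 3.5])
numeric = numeric_grad(log_prob, x)
right_sign = np.allclose(numeric, -penalty_grad(x), atol=1e-5)
wrong_sign = np.allclose(numeric, penalty_grad(x), atol=1e-5)
print(right_sign, wrong_sign)  # True False
```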
When passing a target_log_prob_fn that's not built from TF primitives (and hence doesn't allow for automatic differentiation) to the HamiltonianMonteCarlo kernel, is there a way to also pass a custom gradient function? Of course, you lose all the performance benefits of autodiff, but this would significantly increase the potential areas of application for TFP's HMC.