gregreen opened this issue 4 years ago
I just finished reading this paper. Isn't it taking the norms of two outputs of the ODE solver and then adding them to the log_det_jacobian? Please correct me if I'm making a mistake 😂
Almost. During training, I think you can add those two terms to the log_det_jacobian. However, after training, when evaluating the log probability, you don't want to add those additional terms in.
I've been trying to understand the internal mechanics of the TFP implementation of FFJORD. I think that `augmented_ode_fn` needs to be updated to calculate the additional regularization terms (\dot{E}_j and \dot{n}_j), which would then be cached, just as is done with the `log_det_jacobian`. In particular, this line would be changed to something like the following:

```python
y, fldj, Edot, ndot = self._solve_ode(augmented_ode_fn, augmented_x)
```
I'm willing to take a shot at implementing this regularization, but it might be easier for people who already understand the internal workings of the FFJORD implementation.
How about passing `is_training` to `_forward`/`_inverse` as a parameter?
```python
def _forward(self, x, is_training=True):
    y, _ = self._augmented_forward(x, is_training=is_training)
    return y

def _augmented_forward(self, y, is_training=True):
    ...
    if is_training:
        return y, fldj + Edot + ndot
    else:
        return y, fldj
```

and cached:

```python
...
cached = self._cache.forward_attributes(x, is_training=is_training)
...
```
It caches on both the input `x` and `is_training`. If all we need is square `y` and `fldj`/`ildj` in `y, fldj = self._solve_ode(augmented_ode_fn, augmented_x)`, we don't need to modify `augmented_ode_fn` in this way.
If you want to pass a kwarg to `bijector.forward`, you can pass it via `transformed_dist.log_prob(x, bijector_kwargs=dict(is_training=..))`. Is that helpful? We don't have improvements to FFJORD currently on our roadmap, so we'd be happy to look at a PR adding an option to turn this on, assuming this is a generally-useful improvement.
On Tue, Oct 6, 2020 at 10:47 AM gitlabspy notifications@github.com wrote:
We'll need a few hyperparameters as well (how much to weight `Edot` and `ndot`), which can either be passed to `_forward` and cached, or saved as attributes of `FFJORD`. Thoughts on which is better? I can try to implement this addition.
If we take the presence of one or both of those hparams to indicate whether or not `is_training=True`, then it seems like `log_prob` (equivalently, `_forward`) could just take them as kwargs. Let's give them readable names, though. `Edot` and `ndot` don't say anything (to me) about what they are regularizing.
On Tue, Oct 6, 2020 at 12:42 PM Gregory Green notifications@github.com wrote:
BTW: `_forward` could take those as kwargs, in lieu of an `is_training` kwarg.
`Edot` and `ndot` stand for the kinetic penalty and the Jacobian penalty, respectively.
```python
def _augmented_forward(self, y, is_training=True):
    ...
    if is_training:
        return y, fldj + Edot * self.kinetic_penalty + ndot * self.jacobian_penalty
```
I think the weights of `kinetic_penalty` and `jacobian_penalty` could be set as hparams of FFJORD. When no regularization is needed, `kinetic_penalty` and `jacobian_penalty` default to `0`, and `is_training` should be treated as a kwarg of `forward`:

```python
ffjord = FFJORD(odefunc, kinetic_penalty=0.01, jacobian_penalty=0.01)
z = ffjord.forward(x, is_training=True)
```
> BTW: `_forward` could take those as kwargs, in lieu of an `is_training` kwarg.

I realize your suggestion is a more elegant implementation than mine above. Just put `kinetic_penalty` and `jacobian_penalty` in `forward`'s kwargs, defaulting to 0.
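To make the API being discussed concrete: here is a minimal, self-contained toy sketch (not TFP code; the class, transform, and penalty values are all stand-ins I made up) of the pattern where the penalty weights are kwargs on `forward` that default to 0, so plain evaluation is automatically unregularized:

```python
class ToyFlow:
    """Toy stand-in for the proposed FFJORD API: penalty weights are
    kwargs with default 0, so evaluation after training adds nothing."""

    def _augmented_forward(self, x, kinetic_penalty=0.0, jacobian_penalty=0.0):
        y = 2.0 * x                      # stand-in transform
        fldj = 0.693                     # stand-in log|det J| (log 2)
        e_dot, n_dot = 1.5, 0.25         # stand-in accumulated penalty terms
        # Penalties are scaled by their weights; with the default weights
        # of 0 this reduces to the plain log-det-Jacobian.
        return y, fldj + kinetic_penalty * e_dot + jacobian_penalty * n_dot

    def forward(self, x, **kwargs):
        y, _ = self._augmented_forward(x, **kwargs)
        return y


flow = ToyFlow()
_, ldj_eval = flow._augmented_forward(1.0)                   # unregularized
_, ldj_train = flow._augmented_forward(1.0, kinetic_penalty=0.01,
                                       jacobian_penalty=0.01)
```

With the weights at their defaults, `ldj_eval` is just `fldj`; during training the weighted penalties are folded in.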
I've realized that it's possible to impose the regularization by providing a modified `trace_augmentation_fn`, without actually altering the internals of the `FFJORD` class. For example, the kinetic regularization term can be added with a wrapper around either of the trace augmentation functions provided by `ffjord.py`:
```python
def add_kinetic_regularization(trace_augmentation_fn, kinetic_penalty=0.):
    def get_aug_ode_fn(ode_fn, state_shape, dtype):
        augmented_ode_fn = trace_augmentation_fn(ode_fn, state_shape, dtype)
        def reg_augmented_ode_fn(time, state_log_det_jac):
            state_time_derivative, trace_value = augmented_ode_fn(
                time, state_log_det_jac)
            kinetic_reg = kinetic_penalty * state_time_derivative**2
            return state_time_derivative, trace_value - kinetic_reg
        return reg_augmented_ode_fn
    return get_aug_ode_fn

trace_augmentation_fn = add_kinetic_regularization(
    tfb.ffjord.trace_jacobian_exact, kinetic_penalty=0.05)
```
I don't think the Jacobian penalty can be added with a simple wrapper like this, because the Frobenius norm of ∇f (see @gitlabspy's comment above) is not computable from `trace_value` (which only contains the diagonal components of ∇f). To implement the Jacobian penalty, one actually has to alter the internals of the individual trace augmentation functions. It shouldn't be too difficult to do so, though.
Isn't that Ldot (the second term of FFJORD circled by the blue box in the image I uploaded above) calculated by the Hutchinson estimator? I notice that you use `trace_jacobian_exact`, which does not calculate the Ldot term.
From the original paper:
> Thus Jacobian Frobenius norm regularization is available with essentially no extra computational cost.
If I'm not mistaken, we can use this term to calculate the Frobenius norm: https://github.com/tensorflow/probability/blob/cfeae22d71766041d2b4108f5b7675e9e7175e34/tensorflow_probability/python/bijectors/ffjord.py#L97
Ldot can be calculated using an intermediate variable in the Hutchinson estimator, but it needs to be computed within the function `trace_jacobian_hutchinson`. I think that Ldot is the norm of the vector `jvp`.
One can also modify `trace_jacobian_exact` to calculate Ldot.
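The identity behind this is that for Rademacher noise v with E[v vᵀ] = I, the expectation of ||J v||² equals the squared Frobenius norm ||J||²_F, so the same vector-Jacobian products already formed for the Hutchinson trace estimate can be reused. A minimal NumPy sketch (standalone, not using TFP internals; the matrix `J` is a stand-in for the Jacobian of f):

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(size=(5, 5))            # stand-in for the Jacobian of f

# Hutchinson-style estimate: for Rademacher v with E[v v^T] = I,
# E[||J v||^2] = ||J||_F^2, so averaging the squared norms of the
# same jvp vectors used for the trace estimate recovers the
# Frobenius norm with essentially no extra cost.
n_samples = 20000
v = rng.choice([-1.0, 1.0], size=(n_samples, 5))
jvp = v @ J.T                          # each row is J @ v_i
est = np.mean(np.sum(jvp**2, axis=1))  # average ||J v||^2

exact = np.sum(J**2)                   # exact ||J||_F^2
```

With enough samples `est` converges to `exact`, which is why the paper can say the Frobenius-norm regularization comes essentially for free once the Hutchinson estimator is already being used.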
It would be great to implement regularization techniques that have recently been developed for FFJORD. For example,
The simplest regularization adds a penalty for the path length of each sample's trajectory. I don't know how to begin to implement this in TensorFlow Probability, as I don't understand the internals well enough. During training, when calculating `log_prob(batch)`, it's necessary to calculate the integral of |f(t,x)| over the path taken by each sample in the batch, and to somehow expose this information so that it can be used in an additional penalty term.
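To illustrate what "accumulating the penalty alongside the state" means, here is a minimal, TFP-free sketch with a toy Euler integrator and made-up names; a real implementation would instead augment the state handed to TFP's ODE solver so the integral is computed by the same solve:

```python
import numpy as np

def f(t, x):
    # Toy dynamics standing in for the learned ODE function.
    return -x

def forward_with_path_length(x0, t0=0.0, t1=1.0, n_steps=100):
    """Euler-integrate the state and, alongside it, the path-length
    penalty: the integral of ||f(t, x)|| dt over the same trajectory."""
    x = np.asarray(x0, dtype=float)
    length = 0.0
    dt = (t1 - t0) / n_steps
    for k in range(n_steps):
        v = f(t0 + k * dt, x)
        length += float(np.linalg.norm(v)) * dt  # accumulate ||f|| dt
        x = x + dt * v
    return x, length

# For f(t, x) = -x and x0 = (1, 1, 1): x(t) = e^{-t} x0, so the exact
# path length over [0, 1] is sqrt(3) * (1 - e^{-1}).
x1, length = forward_with_path_length(np.ones(3))
```

The key point is that the penalty is just one extra scalar integrated along with the state, which is why exposing it from the ODE solve (rather than recomputing the trajectory) is the natural implementation.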