This is an implementation of autodiff. The goal is to address issues in computing expectations in `TraceEnum_ELBO` and `TraceMarkovEnum_ELBO` (#493). So far it appears to fix `nan` gradients under `eager` interpretation in `TraceEnum_ELBO`.
The algorithm implements equivalents of the `linearize()` and `transpose()` functions and is tape-free (#446).
Linearize. Variables that need to be linearized are replaced by a primal-tangent tuple `JVP(primal, tangent)`, and tangents are then propagated by pattern matching on the operations applied to these tuples.
The output tangent is a linear function of the input tangents. `JVP` is used for the `(add,mul)` semiring and `LJVP` is used for the `(logaddexp,add)` semiring.
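A minimal sketch of the idea on plain Python floats, not the code in this PR. The names `JVP` and `LJVP` come from the description above; everything else is an assumption made for illustration: the `plus`/`times` method names, the scalar representation, and the choice to store the `LJVP` tangent in log space (which restricts it to non-negative tangents).

```python
import math
from dataclasses import dataclass


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) for floats."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


@dataclass(frozen=True)
class JVP:
    """Primal-tangent pair for the (add, mul) semiring."""
    primal: float
    tangent: float

    def __add__(self, other):
        # Sum rule: tangents add.
        return JVP(self.primal + other.primal, self.tangent + other.tangent)

    def __mul__(self, other):
        # Product rule: linear in each input tangent.
        return JVP(
            self.primal * other.primal,
            self.primal * other.tangent + self.tangent * other.primal,
        )


@dataclass(frozen=True)
class LJVP:
    """Hypothetical log-space pair for the (logaddexp, add) semiring.

    Both fields hold logarithms: primal = log(value), tangent = log(tangent).
    """
    primal: float
    tangent: float

    def plus(self, other):
        # Semiring sum is logaddexp; log-tangents combine the same way.
        return LJVP(
            logaddexp(self.primal, other.primal),
            logaddexp(self.tangent, other.tangent),
        )

    def times(self, other):
        # Semiring product is add; this is the product rule in log space.
        return LJVP(
            self.primal + other.primal,
            logaddexp(self.primal + other.tangent, self.tangent + other.primal),
        )


# Differentiate f(x, y) = x * y + x with respect to x at x = 2, y = 3.
x, y = JVP(2.0, 1.0), JVP(3.0, 0.0)
out = x * y + x

# The same computation in log space; a zero tangent is log(0) = -inf.
lx, ly = LJVP(math.log(2.0), 0.0), LJVP(math.log(3.0), -math.inf)
lout = lx.times(ly).plus(lx)
```

Here `out.tangent` is the derivative `y + 1`, and exponentiating the fields of `lout` recovers the same primal and tangent as `out`.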
Transpose of a linear function. The transpose is implemented by reversing the order of function execution and transposing each matrix, which in this case means swapping the more primitive operations `.reduce(sum_op, "i")` and `.expand("i")` (broadcasting handles the expansion automatically).
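The swap of reduce and expand can be checked on nested lists. This is only an illustrative sketch under assumed names: `expand`, `reduce_sum`, `scale`, `f`, and `f_transpose` are hypothetical helpers, not the API used in this PR. A linear map `f` and its transpose must satisfy `<f(x), y> == <x, f_transpose(y)>`.

```python
def expand(x, n):
    """Broadcast a vector x[j] along a new leading axis i of size n."""
    return [list(x) for _ in range(n)]


def reduce_sum(y):
    """Sum a matrix y[i][j] over its leading axis i."""
    return [sum(col) for col in zip(*y)]


def scale(w, y):
    """Elementwise product of two matrices of the same shape."""
    return [[a * b for a, b in zip(wr, yr)] for wr, yr in zip(w, y)]


def dot_vec(a, b):
    return sum(p * q for p, q in zip(a, b))


def dot_mat(a, b):
    return sum(dot_vec(ar, br) for ar, br in zip(a, b))


w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]


def f(x):
    # Forward order: expand first, then scale by w.
    return scale(w, expand(x, len(w)))


def f_transpose(y):
    # Reversed order: scale by w first, then reduce where f expanded.
    return reduce_sum(scale(w, y))


x = [1.0, 2.0]
y = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
lhs = dot_mat(f(x), y)          # <f(x), y>
rhs = dot_vec(x, f_transpose(y))  # <x, f^T(y)>
```

Both inner products agree, which is the defining property of the transpose. Elementwise scaling is its own transpose, so only the expand step changes into a reduce.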