ssmit1986 / DualNumbers

Implementation of dual numbers for automatic differentiation of programs
MIT License

example for einsum notation? #1

Closed by yaroslavvb 3 years ago

yaroslavvb commented 3 years ago

Can this package be useful for differentiating expressions in Einstein summation notation?

For instance, if X is a matrix-valued variable and A, B are some matrices, it would be cool to get the following derivative automatically (from Section 9.10 of Magnus's Matrix Calculus book):

[Screenshot of the derivative formula from the book]

Related discussion is here and here

ssmit1986 commented 3 years ago

I'll give it some thought. I've not dealt with Einstein notation too much, so it might take me some time to get a bright idea.

ssmit1986 commented 3 years ago

Hi @yaroslavvb I struggle to understand what this formula means exactly, and I don't have access to that book. On the l.h.s. there is a scalar under the D, and on the r.h.s. there is also a scalar (by which I mean that all indices are summed over). So what derivative is being taken here, exactly? Is it some sort of directional derivative w.r.t. the matrix X? Can you give a concrete example in Mathematica code that demonstrates this equality for, e.g., 2x2 matrices?

yaroslavvb commented 3 years ago

Sure, here's a way to check the equation above using Mathematica's built-in symbolic differentiation:

Clear[x];
SeedRandom[1];
n = 3;
X = Array[x, {n, n}];
A = RandomInteger[{-5, 5}, {n, n}];
B = RandomInteger[{-5, 5}, {n, n}];
out = Tr[X.A.X.B]; (* equivalent to einsum Xij Ajk Xkl Bli *)
derivative = 
  Table[D[out, x[i, j]], {i, 1, n}, {j, 1, n}]\[Transpose];
derivative == A.X.B + B.X.A // Reduce (* True *)

There's a copy of the book floating online, i.e. here. But the basic idea is that X with shape n,d has nd variables, hence nd derivatives, so it's convenient to reshape those derivatives into the same shape as X and call that matrix "the derivative with respect to X". This gives you elegant formulas that mirror the scalar case, e.g. the derivative of Tr[X.X'] with respect to X is 2X.
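
As a quick illustration of that scalar-like rule (a small sketch reusing the symbolic n and X from the snippet above; gradXXt is just a throwaway name):

(* gradient of Tr[X.Transpose[X]] reshaped to the shape of X, mirroring d/dx x^2 = 2 x *)
gradXXt = Table[D[Tr[X.Transpose[X]], x[i, j]], {i, 1, n}, {j, 1, n}];
gradXXt == 2 X // Reduce (* True *)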

Note that turning einsums like Aij Xjk Bkl into matrix products like AXB is in general non-trivial, but it can be done using Carl Woll's package (example here).

ssmit1986 commented 3 years ago

Ok, that helps a lot. I still don't quite see how that relates to the equation in the image you showed, since the r.h.s. of that equation only has repeated indices, so to my understanding they should all be summed out. Wouldn't it be more accurate to write:

D[X[i, j] A[j, k] X[k, l] B[l, i], X[s, r]] == A[r, i] X[i, j] B[j, s] + B[r, i] X[i, j] A[j, s]

for example? (I'm just guessing here)
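
(A quick numeric check of that guess for a small case; the names nn, a, and b below are just throwaway symbols chosen to avoid clashing with your definitions, and I'm only checking the single component s = 1, r = 2:)

nn = 2;
lhs = D[Sum[x[i, j] a[j, k] x[k, l] b[l, i], {i, nn}, {j, nn}, {k, nn}, {l, nn}], x[1, 2]];
rhs = Sum[a[2, i] x[i, j] b[j, 1] + b[2, i] x[i, j] a[j, 1], {i, nn}, {j, nn}];
lhs == rhs // Reduce (* True *)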

yaroslavvb commented 3 years ago

Correct, all indices need to be summed. But that's exactly what's done by the Tr expression. Note that the following evaluates to True:

Tr[X.A.X.B] ==
  Total@Flatten[
    Table[X[[i, j]] A[[j, k]] X[[k, l]] B[[l, i]],
      {i, 1, n}, {j, 1, n}, {k, 1, n}, {l, 1, n}]] // Reduce

You can see this from the fact that the matrix product XAXB can be written in einsum notation as Xij Ajk Xkl Blp. But then when you take the trace, the sum becomes Xij Ajk Xkl Blp sigma(i==p), where sigma(a==b) returns 1 when a==b and 0 otherwise (i.e., a Kronecker delta). This means you can rewrite this einsum as Xij Ajk Xkl Bli, which is what we had in the beginning.
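
For concreteness, here's the same check with that sigma written out as an explicit KroneckerDelta (a sketch reusing n, X, A, B from the first snippet):

Tr[X.A.X.B] ==
   Total@Flatten@Table[
      X[[i, j]] A[[j, k]] X[[k, l]] B[[l, p]] KroneckerDelta[i, p],
      {i, n}, {j, n}, {k, n}, {l, n}, {p, n}] // Reduce (* True *)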

ssmit1986 commented 3 years ago

Yes, that explains the part on the l.h.s. under the derivative. But the derivative of a scalar (the trace, i.e., out in your code) w.r.t. a matrix (X) is a matrix quantity (or vector, or whatever you want to call it). So in the picture, I see a matrix quantity on the l.h.s. but a scalar on the r.h.s. That doesn't make sense to me.

In other words, the equation derivative == A.X.B + B.X.A in your code is a matrix equation, but the r.h.s. of the equation in the picture is a scalar. That's what's bothering me.

The reason this is important is that with dual numbers you can only get single components of a derivative at a time, or directional derivatives (i.e., something of the form Dot[vector, gradient]). So I need to understand how to get the components that are being asked for, or find a way to generate them.
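
To make the directional-derivative form concrete (a sketch with ordinary symbolic differentiation, reusing n, X, A, B from your first snippet; V, v, grad, and t are just throwaway names for an arbitrary direction matrix, the gradient, and a path parameter):

V = Array[v, {n, n}];
dirDeriv = D[Tr[(X + t V).A.(X + t V).B], t] /. t -> 0;
grad = Table[D[Tr[X.A.X.B], x[i, j]], {i, 1, n}, {j, 1, n}];
dirDeriv == Total[Flatten[V*grad]] // Reduce (* True *)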

ssmit1986 commented 3 years ago

Yes, I think I found a way to do it. You can use KroneckerDelta to pick out components from the derivative matrix like so:

<< DualNumbers`
d = X[i, j] A[j, k] X[k, l] B[l, i] /.
   X[ii_, jj_] :> Dual[X[ii, jj], KroneckerDelta[ii, r]*KroneckerDelta[jj, s]];
NonStandard[d]

A[j, k] B[l, i] KroneckerDelta[k, r] KroneckerDelta[l, s] X[i, j] + A[j, k] B[l, i] KroneckerDelta[i, r] KroneckerDelta[j, s] X[k, l]

The nonstandard part here gives you the element {r, s} of the derivative matrix.

The KroneckerDeltas can easily be summed out. I'm sure Carl Woll's package can do it, but I haven't looked into it. Let's just do it with some replacement rules for now:

A[j, k] B[l, i] KroneckerDelta[k, r] KroneckerDelta[l, s] X[i, j] /. {k -> r, l -> s}
A[j, k] B[l, i] KroneckerDelta[i, r] KroneckerDelta[j, s] X[k, l] /. {i -> r, j -> s}

A[j, r] B[s, i] X[i, j]
A[s, k] B[l, r] X[k, l]

Those look like the two terms we're looking for to me.
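
And as a final sanity check (a sketch reusing n, X, A, B from the first snippet), summing out the implied indices in those two terms reproduces the corresponding derivative component:

And @@ Flatten@Table[
    Sum[A[[j, r]] B[[s, i]] X[[i, j]], {i, n}, {j, n}] +
      Sum[A[[s, k]] B[[l, r]] X[[k, l]], {k, n}, {l, n}] ==
     D[Tr[X.A.X.B], x[r, s]],
    {r, n}, {s, n}] // Reduce (* True *)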