Upabjojr opened this issue 5 years ago
I think that this can be implemented. NumPy also supports broadcasting. For reference: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
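For a concrete picture, here is a minimal NumPy sketch of the behaviour being referenced (array names are illustrative):

```python
import numpy as np

w = np.ones((3, 3))     # an ordinary 3x3 array
s = np.array([[2.0]])   # a 1x1 array

# NumPy broadcasts the 1x1 array as if it were a scalar:
(w * s).shape           # -> (3, 3)
```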
This kind of special case can often lead to awkwardness down the line. Why is it advantageous? The user can just use a scalar instead of a 1x1 matrix.

This is what happens in Matlab and it makes sense there, but that's because in Matlab there is no way to distinguish between a scalar and a 1x1 matrix. Why would it be useful in SymPy?
In making machine learning and deep learning models, broadcasting is especially beneficial. The cases where it is mostly used are of the form `w*x + b`, where `w` is an `n×n` weight matrix and `x` and `b` are `1×n` vectors.
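As a hedged illustration of that kind of use (the shapes below are chosen for the example, not taken from the comment):

```python
import numpy as np

X = np.random.rand(32, 10)   # a batch of 32 inputs with 10 features each
W = np.random.rand(10, 5)    # weight matrix
b = np.random.rand(5)        # bias vector

# b is broadcast across the batch dimension of X @ W:
(X @ W + b).shape            # -> (32, 5)
```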
> This kind of special case can often lead to awkwardness down the line. Why is it advantageous? The user can just use a scalar instead of a 1x1 matrix.
>
> This is what happens in Matlab and it makes sense there, but that's because in Matlab there is no way to distinguish between a scalar and a 1x1 matrix. Why would it be useful in SymPy?
Well, actually not. The 1x1 matrix may be an expression which cannot be simplified to a scalar, yet still has a 1x1 shape.
I actually came across this problem while thinking about support for matrix derivatives.
> In making machine learning and deep learning models, broadcasting is especially beneficial. The cases where it is mostly used are of the form `w*x + b`, where `w` is an `n×n` weight matrix and `x` and `b` are `1×n` vectors.
We are never going to support broadcasting. The (1, 1) matrix shape is a special case as it is equivalent to a scalar. `KroneckerProduct` may be used to achieve this behaviour with (1, 1)-shaped matrices.
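For instance, a minimal sketch of that suggestion (symbol names are illustrative):

```python
from sympy import MatrixSymbol, KroneckerProduct

A = MatrixSymbol("A", 3, 3)
s = MatrixSymbol("s", 1, 1)

# The Kronecker product of a 1x1 matrix with A keeps A's shape,
# so the 1x1 factor effectively acts as a scalar multiplier:
KroneckerProduct(s, A).shape   # -> (3, 3)
```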
> Well, actually not. The 1x1 matrix may be an expression which cannot be simplified to a scalar, yet still has a 1x1 shape.
Can you give an example?
This is effectively a duplicate of https://github.com/sympy/sympy/issues/9190. I don't think we should allow this. Type confusion imposes a tradeoff. On the one hand, you get more convenient syntax (you don't have to add `(...)[0]` to 1x1 matrices). On the other hand, you get reduced assurances from type checking.

For SymPy, I generally prefer to avoid such type confusion, as it's not mathematically valid. There are also practical considerations: should a 1x1 matrix be allowed in any scalar context? If not, it will feel inconsistent to users. If so, does it need to be a separate type from other matrices? What is the rule for when an operation on a 1x1 matrix and an actual scalar should produce a matrix vs. a scalar (what should `x + Matrix([y])` be?)? If some operation produces a scalar, then matrix operations won't work any more, so should every scalar be a 1x1 matrix?
NumPy answers yes to the last question, which is why this works. It does this via broadcasting.
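A small sketch of what NumPy actually does: scalars behave as 0-d arrays and broadcast against any shape (values here are illustrative):

```python
import numpy as np

m = np.array([[1.0]])   # a 1x1 array
x = np.float64(2.0)     # a NumPy scalar

(x + m).shape           # -> (1, 1): scalar and 1x1 array mix freely
np.ndim(x)              # -> 0: the scalar participates as a 0-d array
```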
I think both @Upabjojr and @asmeurer are right.
Matrix multiplication shouldn't be allowed to go against its definition, since that would affect the integrity of the matrices implemented in sympy. Moreover, we can use `KroneckerProduct`, since it allows multiplication of matrices of arbitrary order. Is it implemented in sympy? If not, can we work on this, @Upabjojr?
> Can you give an example?
A = MatrixSymbol("A", 1, 3)
B = MatrixSymbol("B", 3, 1)
A*B # <== your example
> Moreover, we can use `KroneckerProduct`, since it allows multiplication of matrices of arbitrary order. Is it implemented in sympy?
Yes, it is.
In your example I can extract the scalar if I know that it is 1x1:

```python
In [5]: (A*B)[0,0]
Out[5]: A₀₀⋅B₀₀ + A₀₁⋅B₁₀ + A₀₂⋅B₂₀
```
If I didn't know that the result was going to be 1x1 then I wouldn't attempt to do `(A*B)*C` unless I thought that the shapes were compatible under the rules of normal matrix multiplication.

I don't see an example where it is useful to special-case 1x1 matrices in normal matrix multiplication. In general we would lose the invariant that `(A*B).shape == (A.shape[0], B.shape[1])`.
I actually noticed that the people who implemented this website allow unit-shape matrices to multiply everything. Maybe we should not follow their example.
Are there matrix derivative formulas for `MatrixElement`?

> Are there matrix derivative formulas for `MatrixElement`?
Do you mean the Kronecker delta?
I mean if `MatrixElement(A, ...)` appears in an expression, does that break the ability to take the derivative wrt `A`?

> I mean if `MatrixElement(A, ...)` appears in an expression, does that break the ability to take the derivative wrt `A`?
`MatrixElement` is treated as a symbol independent of `MatrixSymbol`:

```python
In [2]: A = MatrixSymbol("A", 1, 1)

In [3]: A[0, 0].diff(A)
Out[3]: 0
```
Do you want to change this behavior?
I thought about this problem some more. I have a question: is there any possibility of applying assumptions to a matrix so that it is considered a scalar whenever it attains 1x1 shape? Or maybe a global flag for matrices which determines when to raise an exception and when to just use normal multiplication for 1x1 matrices? Using flags or additional parameters can mess up the API though. Any views on this, everybody?
> applying assumptions to a matrix so that it is considered a scalar
No, please. Assumptions are already a mess, let's avoid making them even worse.
> I mean if `MatrixElement(A, ...)` appears in an expression, does that break the ability to take the derivative wrt `A`?
>
> `MatrixElement` is treated as a symbol independent of `MatrixSymbol`:
>
> ```python
> In [2]: A = MatrixSymbol("A", 1, 1)
>
> In [3]: A[0, 0].diff(A)
> Out[3]: 0
> ```
>
> Do you want to change this behavior?
@Upabjojr Well, in my recent PR #17232 I've actually attempted to change this behavior. I'm not sure in which context it should be considered a zero matrix, but it looks like the current behavior is inconsistent with others, such as #17229, and also with the matrix derivative section in the Matrix Cookbook.

What is your opinion about this?
One other thing to note is that Python's `sum` starts the summation from the integer zero, so summing matrices gives an error unless a start value is specified, as below:

```python
from sympy import *

m = Matrix([[1, 2], [3, 4]])
sum((i*m for i in range(1, 11)), ZeroMatrix(2, 2))
```
NumPy and the like may not have this problem because they would broadcast the zero. But I think SymPy should be strict about the math in general, so this may instead have to be documented as a gotcha.
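A minimal sketch of the failure mode being described (the exact exception text may vary between SymPy versions):

```python
from sympy import Matrix

m = Matrix([[1, 2], [3, 4]])

# With no start value, sum() begins from the int 0, so Python attempts 0 + m;
# SymPy matrices refuse to add a plain int to a Matrix and raise a TypeError.
sum(i*m for i in range(1, 11))
```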
In my opinion, the following behavior of block matrices is related to this issue and should also be considered. Consider two block matrices consisting of the blocks `A` (3x3) and `B` (3x1); one would expect their product `C` to have shape 3x1:
>>> A = MatrixSymbol('A',3,3)
>>> B = MatrixSymbol('B',3,1)
>>> C = Matrix([[A, A]])*Matrix([[B],[B]])
>>> C
Matrix([[2*A*B]])
>>> C.shape
(1, 1)
Is this result intended? Abolishing 1x1 matrices would lead to the expected result.
>>> C[0,0].shape
(3, 1)
In my opinion this is related to the unexpected behavior discussed on this page:

```python
>>> a = Symbol('a')
>>> b = Symbol('b')
>>> c = Matrix([[a, a]])*Matrix([[b],[b]])
>>> c.is_Matrix
True
```
I don't think that you should use `Matrix` as a container object for symbolic matrix blocks, but `BlockMatrix`.
Thank you for pointing that out. Maybe I don't quite understand the point of having so many classes. For me as a user, I still see the connection to the 1x1 matrices.
Nevertheless, you are right: a way that leads to the expected result is

```python
>>> D = block_collapse(BlockMatrix([[A,A]])*BlockMatrix([[B],[B]]))
>>> D
2*A*B
>>> D.shape
(3, 1)
```
Perhaps `Matrix([[matrices...]])` should give an error message explaining to use `BlockMatrix`.
Unfortunately, I don't think that it's easy to disentangle `BlockMatrix` from matrices, because `BlockMatrix` actually uses `Matrix` as its own container.
Perhaps instead `Matrix([[matrices]])` could automatically convert to `BlockMatrix` then. Actually, I can't reproduce the above problem with 1.6 or master:
```python
In [5]: A = MatrixSymbol('A', 3, 3)

In [6]: B = MatrixSymbol('B', 3, 1)

In [7]: Matrix([[A, A]])
Out[7]:
⎡A₀₀  A₀₁  A₀₂  A₀₀  A₀₁  A₀₂⎤
⎢                            ⎥
⎢A₁₀  A₁₁  A₁₂  A₁₀  A₁₁  A₁₂⎥
⎢                            ⎥
⎣A₂₀  A₂₁  A₂₂  A₂₀  A₂₁  A₂₂⎦

In [8]: C = Matrix([[A, A]])*Matrix([[B],[B]])

In [9]: C
Out[9]:
⎡2⋅A₀₀⋅B₀₀ + 2⋅A₀₁⋅B₁₀ + 2⋅A₀₂⋅B₂₀⎤
⎢                                 ⎥
⎢2⋅A₁₀⋅B₀₀ + 2⋅A₁₁⋅B₁₀ + 2⋅A₁₂⋅B₂₀⎥
⎢                                 ⎥
⎣2⋅A₂₀⋅B₀₀ + 2⋅A₂₁⋅B₁₀ + 2⋅A₂₂⋅B₂₀⎦

In [10]: C.shape
Out[10]: (3, 1)
```
I am very sorry to have taken up your time. (A greenhorn learned a lesson.) Maybe this aspect can nevertheless contribute to the discussion on the expected behavior of 1x1 matrices.
At some version, this was likely changed so that all the block matrices are expanded. However, if you use `evaluate=False` you may still see the problematic behavior.
The behaviour of broadcasting has been (partially) solved by the introduction of `OneMatrix`.

If `A` is (3, 3)-dimensional and `B` is (1, 1)-dimensional, you can multiply them with

```python
A*OneMatrix(3, 1)*B
```
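A short sketch of how the shapes work out under that construction (variable names assumed for illustration):

```python
from sympy import MatrixSymbol, OneMatrix

A = MatrixSymbol("A", 3, 3)
B = MatrixSymbol("B", 1, 1)

# OneMatrix(3, 1) bridges the otherwise incompatible shapes:
# (3, 3) * (3, 1) * (1, 1) -> (3, 1)
expr = A*OneMatrix(3, 1)*B
expr.shape   # -> (3, 1)
```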
I am looking for a way to create an expression `(x.T*x)*I` where `x` is a column vector and `I` is the identity (all symbolic, i.e. using `MatrixSymbol` and `Identity`). This requires allowing a 1x1 matrix to multiply with a matrix. More importantly, this also results from differentiating a valid expression `x*x.T*x` w.r.t. `x`, and sympy gives a shape error:

```python
x = MatrixSymbol('x',n,1)
diff( x * x.T * x, x)
# ShapeError: Matrices I and x.T*x are not aligned
```

The differentiation is in fact defined and involves the expression `(x.T*x)*I`. In this case sympy is not consistent, because sympy itself creates the expression `(x.T*x)*I` and then complains that it is not allowed. I think sympy should allow a 1x1 matrix to multiply with a matrix, because otherwise the matrix calculus chain rule breaks here.
I get a different error message, indicating that the shapes are incompatible for addition. Printing the args and shapes shows the following at the moment of the error:

```python
>>> x = MatrixSymbol('x',n,1)
>>> diff( x * x.T * x, x)
...
[PermuteDims(ArrayTensorProduct(x, x.T), (1 2 3)),
 PermuteDims(ArrayTensorProduct(I, x.T*x), (3)(1 2)),
 PermuteDims(ArrayTensorProduct(x*x.T, I), (1 2 3))]
[(n, 1, n, 1), (n, 1, n, 1), (n, 1, 1, n)]
...
ValueError: mismatching shapes in addition
```
Looks like a bug in the array differentiation code. Allowing multiplication by 1x1 matrices should not be needed in order to make that work.
@oscarbenjamin: So does that mean `diff(x * x.T * x, x)` should return an unevaluated `Derivative`? Because if it evaluates, sympy has to find some way of representing `(x.T * x) * I`.
> Because if it evaluates, sympy has to find some way of representing `(x.T * x) * I`.
Ideally there would be something like a dot product for matrices to do this. Probably there is something in the tensor module that can represent the contraction. For now, an awkward way to turn a 1x1 matrix into a scalar is with something like `det`:

```python
In [15]: x = MatrixSymbol('x', n, 1)

In [16]: I = Identity(n)

In [17]: det(x.T * x) * I
Out[17]:
│ T  │
│x ⋅x│⋅𝕀
```
My reasoning is that when differentiating, the rules of matrix calculus will not insert a determinant, a trace, or indexing. And if we need a `MatrixExpr` as the result of `diff(x * x.T * x, x)`, then `MatrixExpr` should be able to hold the product of a 1x1 matrix and a matrix. We would want a `MatrixExpr` because we can easily write the answer on paper, so sympy not returning a `MatrixExpr` for something so simple would be very limiting. It's more of a representation issue than a differentiation one, and it is likely to occur in many other scenarios. There does not seem to be a way out.
> There does not seem to be a way out.
There is a way out. A `MatMul` is perfectly capable of holding a scalar as one of its args. There just needs to be a way to represent `x.T * x` as a scalar rather than as a 1x1 matrix, e.g. `dot(x, x)` or `norm(x)` or something.
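As one hedged illustration of that idea: `Trace` of a 1x1 matrix expression is already a scalar in SymPy, so it can play the role such a `dot` would fill:

```python
from sympy import MatrixSymbol, Identity, Trace, symbols

n = symbols('n', integer=True)
x = MatrixSymbol('x', n, 1)
I = Identity(n)

# Trace(x.T * x) is a scalar expression, so MatMul carries it as a coefficient:
expr = Trace(x.T * x)*I + 2*x*x.T
expr.shape   # -> (n, n)
```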
Would a function that converts 1x1 to/from scalar be better?
But `dot` should also work, as a 1x1 can arise only from a dot product, I suppose.
> Would a function that converts 1x1 to/from scalar be better?
Maybe.
The real question is where these formulas come from in the first place. Note that with explicit matrices your example works out to be an array of shape `(n,1,n,1)`:
```python
In [40]: x, y, z = symbols('x, y, z')

In [41]: X = Matrix([x, y, z])

In [42]: X
Out[42]:
⎡x⎤
⎢ ⎥
⎢y⎥
⎢ ⎥
⎣z⎦

In [43]: diff(X*X.T*X, X)
Out[43]:
⎡⎡   2    2    2⎤⎤
⎢⎢3⋅x  + y  + z ⎥⎥
⎢⎢              ⎥⎥
⎢⎢    2⋅x⋅y     ⎥⎥
⎢⎢              ⎥⎥
⎢⎣    2⋅x⋅z     ⎦⎥
⎢                ⎥
⎢⎡    2⋅x⋅y     ⎤⎥
⎢⎢              ⎥⎥
⎢⎢ 2      2    2⎥⎥
⎢⎢x  + 3⋅y  + z ⎥⎥
⎢⎢              ⎥⎥
⎢⎣    2⋅y⋅z     ⎦⎥
⎢                ⎥
⎢⎡    2⋅x⋅z     ⎤⎥
⎢⎢              ⎥⎥
⎢⎢    2⋅y⋅z     ⎥⎥
⎢⎢              ⎥⎥
⎢⎢ 2    2      2⎥⎥
⎣⎣x  + y  + 3⋅z ⎦⎦

In [44]: diff(X*X.T*X, X).shape
Out[44]: (3, 1, 3, 1)
```
The formula that you see not working arises out of a similar confusion, in which something attempts to identify the `(3,1,3,1)` 4D array with a 3x3 matrix.
With explicit matrices you have to use `jacobian()`. In your example, it should be `(X*X.T*X).jacobian(X)`. I thought differentiating with respect to a vector meant the Jacobian, but `diff` seems to behave differently for explicit matrices.
Is the result you showed from `diff` actually correct? It doesn't agree with the output from `jacobian` for explicit matrices:
```python
In [71]: det(X.T * X) * eye(3)
Out[71]:
⎡ 2    2    2                          ⎤
⎢x  + y  + z        0            0     ⎥
⎢                                      ⎥
⎢              2    2    2             ⎥
⎢     0       x  + y  + z        0     ⎥
⎢                                      ⎥
⎢                           2    2    2⎥
⎣     0            0       x  + y  + z ⎦

In [72]: 2 * (X * X.T) * eye(3) + det(X.T * X) * eye(3)
Out[72]:
⎡   2    2    2                              ⎤
⎢3⋅x  + y  + z       2⋅x⋅y          2⋅x⋅z    ⎥
⎢                                            ⎥
⎢                 2      2    2              ⎥
⎢    2⋅x⋅y       x  + 3⋅y  + z      2⋅y⋅z    ⎥
⎢                                            ⎥
⎢                                2    2      2⎥
⎣    2⋅x⋅z           2⋅y⋅z      x  + y  + 3⋅z ⎦

In [73]: (X * X.T * X).jacobian(X)
Out[73]:
⎡   2    2    2                              ⎤
⎢3⋅x  + y  + z       2⋅x⋅y          2⋅x⋅z    ⎥
⎢                                            ⎥
⎢                 2      2    2              ⎥
⎢    2⋅x⋅y       x  + 3⋅y  + z      2⋅y⋅z    ⎥
⎢                                            ⎥
⎢                                2    2      2⎥
⎣    2⋅x⋅z           2⋅y⋅z      x  + y  + 3⋅z ⎦
```
The result of the differentiation is actually `x.dot(x)*I + 2*x*x.T`.
```python
from sympy import MatrixSymbol, ZeroMatrix, Identity

N = 5
x = MatrixSymbol('x',N,1)
Z = ZeroMatrix(N,N)
I = Identity(N)
x1 = x.as_explicit()
I1 = I.as_explicit()
Z1 = Z.as_explicit()
Res1 = x1.dot(x1)*I1 + 2*x1*x1.T
Res2 = (x1.T*x1*x1.T).jacobian(x1)
(Res1-Res2).simplify() == Z1
```
> Is the result you showed from `diff` actually correct?
@oscarbenjamin: Sorry, I didn't understand your question. I just realized you already have the correct differentiation in `In [72]`. What exactly is your question?
> What exactly is your question?
No, I was getting confused. I thought you intended `x.T * x * I` to be the whole output, but I see now that it is only part of the output.
I think that the problem here comes from trying to compress a 4D array into a 2D array, but that can't work consistently in all cases. I don't think it's possible to make a formally complete system if there are things like this:

```python
In [80]: diff(x, x)
Out[80]: 𝕀

In [81]: diff(x.T, x)
Out[81]: 𝕀
```
This should only be related to the layout conventions.
OK, I see now. So what should be the output of `diff(x.T, x)` anyway? I'm not sure. It would be something like the derivative of a matrix by a vector, so something higher-dimensional?

Say matrix-by-vector derivatives were outlawed; would the equivalence of `diff(x.T, x)` and `diff(x, x)` then be a problem?
I'm not sure that it's possible to do this kind of matrix calculus in a general way without using higher-dimensional arrays. I wonder if many of the formulas used for these things involve too much abuse of notation to be definable in a formal way.

How would you define and compute `diff(x * x.T * x, x)` from first principles?
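One hedged sketch of a first-principles answer is to keep every index and return the full 4-dimensional array, which is what SymPy's `derive_by_array` does for explicit matrices (the variable's shape is prepended to the expression's shape):

```python
from sympy import Matrix, symbols
from sympy.tensor.array import derive_by_array

x, y, z = symbols('x y z')
X = Matrix([x, y, z])

# No attempt to flatten back to a matrix: the result keeps all four indices.
D = derive_by_array(X * X.T * X, X)
D.shape   # -> (3, 1, 3, 1)
```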
My understanding is that matrix differentiation in SymPy already implicitly drops size 1 dimensions in order to give a matrix instead of a tensor. So maybe doing something similar for scalars isn't too egregious. I agree it would be good to have a formalism for when exactly it happens. Maybe @Upabjojr can comment.
@oscarbenjamin That can be derived by thinking of `x.T x` as a scalar function of `x` and applying the product rule for `f(x) * v(x)`, where `f` is a scalar and `v` is a vector:

```
f(x) = x.T x
v(x) = x

d/dx [ f(x) v(x) ] = f dv/dx + v df/dx
                   = ( x.T x ) I + 2 x x.T
```
Matrices with shape `(1, 1)` may be seen as scalars. They might be allowed to multiply any other matrix without raising `ShapeError`.

Similarly, matrices of shape `(k, 1)` may also be allowed to multiply any other matrix?