sympy / sympy

A computer algebra system written in pure Python
https://sympy.org/
Other
12.98k stars 4.43k forks source link

Allow matrix with shape (1, 1) to multiply any matrix shape? #16504

Open Upabjojr opened 5 years ago

Upabjojr commented 5 years ago

Matrices with shape (1, 1) may be seen as scalars. They might be allowed to multiply any other matrix without raising ShapeError.

Similarly matrices of shape (k, 1) may also be allowed to multiply any other matrix?

supreet11agrawal commented 5 years ago

I think that this can be implemented. Numpy also supports broadcasting. For reference: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

cbm755 commented 5 years ago

9190 is related.

oscarbenjamin commented 5 years ago

This kind of special case can often lead to awkwardness down the line. Why is it advantageous? The user can just use a scalar instead of a 1x1 matrix.

This is what happens in Matlab and it makes sense there. But that's because in Malab there is no way to distinguish between a scalar and a 1x1 matrix. Why would it be useful in SymPy?

supreet11agrawal commented 5 years ago

In making machine learning and deep learning models, broadcasting is especially beneficial. The cases where it is used mostly are somewhat like w*x + b where w is nxn weight matrix and a and b are 1*n vectors.

Upabjojr commented 5 years ago

This kind of special case can often lead to awkwardness down the line. Why is it advantageous? The user can just use a scalar instead of a 1x1 matrix.

This is what happens in Matlab and it makes sense there. But that's because in Malab there is no way to distinguish between a scalar and a 1x1 matrix. Why would it be useful in SymPy?

Well, actually not. The 1x1 matrix may be an expression which cannot be simplified leading to a 1x1 shape.

I was actually coming along this problem while pondering about the support for matrix derivatives.

Upabjojr commented 5 years ago

In making machine learning and deep learning models, broadcasting is especially beneficial. The cases where it is used mostly are somewhat like w*x + b where w is nxn weight matrix and a and b are 1*n vectors.

We are never going to support broadcasting. The (1, 1) matrix shape is a special case as it is equivalent to a scalar.

Upabjojr commented 5 years ago

KroneckerProduct may be used to achieve this behaviour with (1, 1)-shaped matrices.

oscarbenjamin commented 5 years ago

Well, actually not. The 1x1 matrix may be an expression which cannot be simplified leading to a 1x1 shape.

Can you give an example?

asmeurer commented 5 years ago

This is effectively a duplicate of https://github.com/sympy/sympy/issues/9190. I don't think we should allow this. Type confusion imposes a tradeoff. On the one hand, you get more convenient syntax (you don't have to add (...)[0] to 1x1 matrices). On the other hand, you get reduced assurances from type checking.

For SymPy, I generally prefer to avoid such type confusion, as it's not mathematically valid. One must also consider from a practical point of view. Should a 1x1 matrix be allowed in any scalar context? If not, it will feel inconsistent to users. If so, does it need to be a separate type from other matrices? What is the rule for when an operation on a 1x1 matrix and an actual scalar should produce a matrix vs. a scalar (what should x + Matrix([y]) be?). If some operation produces a scalar, then matrix operations won't work any more, so should every scalar be a 1x1 matrix?

NumPy answers yes to the last question, which is why this works. It does this via broadcasting.

czgdp1807 commented 5 years ago

I think both @Upabjojr and @asmeurer are right. Matrix multiplication shouldn't be allowed to go against it's definition, since it will effect the integrity of the matrices implemented in sympy. Moreover, we can use KroneckerProduct since, it allows multiplication of arbitrary ordered matrices. Is it implemented in sympy? If not can we work on this, @Upabjojr ?

Upabjojr commented 5 years ago

Can you give an example?

A = MatrixSymbol("A", 1, 3)
B = MatrixSymbol("B", 3, 1)
A*B  #  <== your example
Upabjojr commented 5 years ago

Moreover, we can use KroneckerProduct since, it allows multiplication of arbitrary ordered matrices. Is it implemented in sympy?

Yes, it is.

oscarbenjamin commented 5 years ago

In your example I can extract the scalar if I know that it is 1x1:

In [5]: (A*B)[0,0]                                                                                                                             
Out[5]: A₀₀⋅B₀₀ + A₀₁⋅B₁₀ + A₀₂⋅B₂₀

If I didn't know that the result was going to be 1x1 then I wouldn't attempt to do (A*B)*C unless I thought that the shapes were compatible under the rules of normal matrix multiplication.

I don't see an example where it is useful to special case 1x1 matrices in normal matrix multiplication. In general we would lose the invariant that (A*B).shape == (A.shape[0], B.shape[1]).

Upabjojr commented 5 years ago

I was actually noticing that people who implemented this website allow unity shape to multiply everything.

Maybe we should not follow their example.

asmeurer commented 5 years ago

Are there matrix derivative formulas for MatrixElement?

Upabjojr commented 5 years ago

Are there matrix derivative formulas for MatrixElement?

Do you mean the Kronecker delta?

asmeurer commented 5 years ago

I mean if MatrixElement(A, ...) appears in an expression does that break the ability to take the derivative wrt A?

Upabjojr commented 5 years ago

I mean if MatrixElement(A, ...) appears in an expression does that break the ability to take the derivative wrt A?

MatrixElement is treated as a symbol independent to MatrixSymbol:

In [2]: A = MatrixSymbol("A", 1, 1)

In [3]: A[0, 0].diff(A)
Out[3]: 0

Do you want to change this behavior?

czgdp1807 commented 5 years ago

I thought on this problem more. I have a question, that is there any possibility of applying assumptions to matrix to consider it as a scalar, whenever, it attains 1X1 order, or may be a global flag for matrices using which it can be determined that when to raise an exception and when to just use normal multiplication for 1X1 matrices? Using flags or additional parameters can mess up the API though. Any views on this @ everybody?

Upabjojr commented 5 years ago

applying assumptions to matrix to consider it as a scalar

No, please. Assumptions are already a mess, let's avoid making them even worse.

sylee957 commented 5 years ago

I mean if MatrixElement(A, ...) appears in an expression does that break the ability to take the derivative wrt A?

MatrixElement is treated as a symbol independent to MatrixSymbol:

In [2]: A = MatrixSymbol("A", 1, 1)

In [3]: A[0, 0].diff(A)
Out[3]: 0

Do you want to change this behavior?

@Upabjojr Well, in my recent PR #17232, I've actually attempted to change the behavior I'm not sure in which context it should be considered zero matrix, but it looks like it is inconsistent with other behaviors as #17229, and also the matrix derivative section in the matrix cookbook.

What is your opinion about this?

sylee957 commented 5 years ago

One other thing to note about is that, python summation seems like incrementing zero first time, so matrix summation would give error if not specified as below

from sympy import *

m = Matrix([[1, 2], [3, 4]])
sum((i*m for i in range(1, 11)), ZeroMatrix(2, 2))

Numpy or such may not have such problems because it would broadcast zeros. But I would see that SymPy should be strict about math in general. So in other way, this thing may have to be documented as some gotcha.

moritzsattler commented 4 years ago

In my opinion, the following behavior of block matrices is related to this issue and should also be regarded.

Consider two block matrices consisting of the blocks A 3x3 and B 3x1:

The result of this is C 3x1

>>> A = MatrixSymbol('A',3,3)
>>> B = MatrixSymbol('B',3,1)
>>> C = Matrix([[A, A]])*Matrix([[B],[B]])
>>> C
Matrix([[2*A*B]])
>>> C.shape
(1, 1)

Is this result intended? Abolishing 1x1 matrices would lead to the expected result.

>>> C[0,0].shape
(3, 1)

In my opinion this is related to the unexpected behavior discussed on this page:

>>> a = Symbol('a')
>>> b = Symbol('b')
>>> c = Matrix([[a, a]])*Matrix([[b],[b]])
>>> c.is_Matrix
True
sylee957 commented 4 years ago

I don’t think that you should use Matrix, but BlockMatrix as a container object for symbolic matrix blocks.

moritzsattler commented 4 years ago

Thank you for pointing that out. Maybe I don't quite understand the point of having so many classes. For me as a user, I still see the connection to the 1x1 matrices.

Nevertheless, you are right: A way that leads to the expected result, is

>>> D = block_collapse(BlockMatrix([[A,A]])*BlockMatrix([[B],[B]]))
>>> D
2*A*B
>>> D.shape
(3, 1)
oscarbenjamin commented 4 years ago

Perhaps Matrix([[matrices...]]) should give an error message explaining to use BlockMatrix.

sylee957 commented 4 years ago

Unfortunately, I don’t think that it’s easy to disentangle block matrix from matrices because BlockMatrix actually uses Matrix as its own container

oscarbenjamin commented 4 years ago

Perhaps instead Matrix([[matrices]]) could automatically convert to BlockMatrix then. Actually I can't reproduce the above problem with 1.6 or master:

In [5]: A = MatrixSymbol('A', 3, 3)                                                                                    

In [6]: B = MatrixSymbol('B', 3, 1)                                                                                    

In [7]: Matrix([[A, A]])                                                                                               
Out[7]: 
⎡A₀₀  A₀₁  A₀₂  A₀₀  A₀₁  A₀₂⎤
⎢                            ⎥
⎢A₁₀  A₁₁  A₁₂  A₁₀  A₁₁  A₁₂⎥
⎢                            ⎥
⎣A₂₀  A₂₁  A₂₂  A₂₀  A₂₁  A₂₂⎦

In [8]: C = Matrix([[A, A]])*Matrix([[B],[B]])                                                                         

In [9]: C                                                                                                              
Out[9]: 
⎡2⋅A₀₀⋅B₀₀ + 2⋅A₀₁⋅B₁₀ + 2⋅A₀₂⋅B₂₀⎤
⎢                                 ⎥
⎢2⋅A₁₀⋅B₀₀ + 2⋅A₁₁⋅B₁₀ + 2⋅A₁₂⋅B₂₀⎥
⎢                                 ⎥
⎣2⋅A₂₀⋅B₀₀ + 2⋅A₂₁⋅B₁₀ + 2⋅A₂₂⋅B₂₀⎦

In [10]: C.shape                                                                                                       
Out[10]: (3, 1)
moritzsattler commented 4 years ago

I am very sorry to have stolen your time. (A greenhorn learned a lesson.) Maybe this aspect can nevertheless contribute to the discussion on the expected behavior of 1x1 matrices.

sylee957 commented 4 years ago

After some version, it had likely been changed to expand all the block matrices. However, if you use evaluate=False you may still see the problematic behavior.

Upabjojr commented 4 years ago

The behaviour of broadcasting has been (partially) solved by the introduction of OneMatrix.

If A is (3, 3) dimensional and B is (1, 1) dimensional, you can multiply them with

A*OneMatrix(3, 1)*B
breadnbeer commented 2 years ago

I am looking for a way to create an expression (x.T*x)*I where x is a column vector, and I is identity (all symbolic, i.e. using MatrixSymbol and Identity). This requires allowing Matrix(1,1) to multiply with a matrix. More importantly, this also results from differentiating a valid expression (x*x.T*x) w.r.t. x and sympy gives a shape error

x = MatrixSymbol('x',n,1)
diff( x * x.T * x, x)
ShapeError: Matrices I and x.T*x are not aligned

The differentiation is in fact defined and involves the expression (x.T*x)*I. In this case, sympy is not consistent because sympy itself creates an expression (x.T*x)*I and complains that it is not allowed. I think sympy should allow Matrix(1,1) to multiply with a matrix because it breaks the matrix calculus chain rule here.

smichr commented 2 years ago

I get a different error message indicating that shapes are incompatible for adding. Printing the args and shapes shows the following at the moment of error:

>>> x = MatrixSymbol('x',n,1)
>>> diff( x * x.T * x, x)
...
[PermuteDims(ArrayTensorProduct(x, x.T), (1 2 3)),
 PermuteDims(ArrayTensorProduct(I, x.T*x), (3)(1 2)),
 PermuteDims(ArrayTensorProduct(x*x.T, I), (1 2 3))]

[(n, 1, n, 1), (n, 1, n, 1), (n, 1, 1, n)]
...
ValueError: mismatching shapes in addition
oscarbenjamin commented 2 years ago

Looks like a bug in the array differentiation code. Allowing multiplication by 1 x 1 matrices should not be needed in order to make that work.

breadnbeer commented 2 years ago

@oscarbenjamin: So does that mean diff( x * x.T * x, x) should return an unevaluated Derivative? Because if it evaluates, sympy has to find some way of representing (x.T * x) * I.

oscarbenjamin commented 2 years ago

Because if it evaluates, sympy has to find some way of representing (x.T * x) * I.

Ideally there would be something like dot product for matrices to do this. Probably there is something in the tensor module that can represent the contraction. For now an awkward way to turn a 1x1 matrix into a scalar is with something like det:

In [15]: x = MatrixSymbol('x', n, 1)

In [16]: I = Identity(n)

In [17]: det(x.T * x) * I
Out[17]: 
│ T  │  
│x ⋅x│⋅𝕀
breadnbeer commented 2 years ago

My reasoning is that when differentiating, the rules of matrix calculus will not insert determinant or trace or indexing. And if we need a MatrixExpr as the result of diff( x * x.T * x, x) then MatrixExpr should be able to hold the product of Matrix(1,1) and a matrix. We would want a MatrixExpr because we can easily write the answer on paper, so sympy not returning MatrixExpr for something so simple would be very limiting. It's more of a representation issue than differentiation and is likely to occur in many other scenarios. There does not seem to be a way out.

oscarbenjamin commented 2 years ago

There does not seem to be a way out.

There is a way out. A MatMul is perfectly capable of holding a scalar as one of its args. There just needs to be a way to represent x.T * x but as a scalar rather than a 1x1 matrix e.g. dot(x, x) or norm(x) or something.

breadnbeer commented 2 years ago

Would a function that converts 1x1 to/from scalar be better? But dot should also work as 1x1 can arise only by dot I suppose.

oscarbenjamin commented 2 years ago

Would a function that converts 1x1 to/from scalar be better?

Maybe.

The real question is where these formulas come from in the first place. Note that with explicit matrices your example works out to be an array of shape (n,1,n,1):

In [40]: x, y, z = symbols('x, y, z')

In [41]: X = Matrix([x, y, z])

In [42]: X
Out[42]: 
⎡x⎤
⎢ ⎥
⎢y⎥
⎢ ⎥
⎣z⎦

In [43]: diff(X*X.T*X, X)
Out[43]: 
⎡⎡   2    2    2⎤⎤
⎢⎢3⋅x  + y  + z ⎥⎥
⎢⎢              ⎥⎥
⎢⎢    2⋅x⋅y     ⎥⎥
⎢⎢              ⎥⎥
⎢⎣    2⋅x⋅z     ⎦⎥
⎢                ⎥
⎢⎡    2⋅x⋅y     ⎤⎥
⎢⎢              ⎥⎥
⎢⎢ 2      2    2⎥⎥
⎢⎢x  + 3⋅y  + z ⎥⎥
⎢⎢              ⎥⎥
⎢⎣    2⋅y⋅z     ⎦⎥
⎢                ⎥
⎢⎡    2⋅x⋅z     ⎤⎥
⎢⎢              ⎥⎥
⎢⎢    2⋅y⋅z     ⎥⎥
⎢⎢              ⎥⎥
⎢⎢ 2    2      2⎥⎥
⎣⎣x  + y  + 3⋅z ⎦⎦

In [44]: diff(X*X.T*X, X).shape
Out[44]: (3, 1, 3, 1)

The formula that you see not working arises out of a similar confusion in which something attempts to identify the (3,1,3,1) 4D array with a 3x3 matrix.

breadnbeer commented 2 years ago

With explicit matrices you have to use jacobian(). In your example, it should be (X*X.T*X).jacobian(X). I thought differentiating with a vector means Jacobian, but diff seems to be behaving differently for explicit matrices.

oscarbenjamin commented 2 years ago

Is the result you showed from diff actually correct? It doesn't agree with the output from Jacobian or for explicit matrices:

In [71]: det(X.T * X) * eye(3)
Out[71]: 
⎡ 2    2    2                            ⎤
⎢x  + y  + z        0             0      ⎥
⎢                                        ⎥
⎢               2    2    2              ⎥
⎢     0        x  + y  + z        0      ⎥
⎢                                        ⎥
⎢                             2    2    2⎥
⎣     0             0        x  + y  + z ⎦

In [72]: 2 * (X * X.T) * eye(3) + det(X.T * X) * eye(3)
Out[72]: 
⎡   2    2    2                                ⎤
⎢3⋅x  + y  + z       2⋅x⋅y           2⋅x⋅z     ⎥
⎢                                              ⎥
⎢                 2      2    2                ⎥
⎢    2⋅x⋅y       x  + 3⋅y  + z       2⋅y⋅z     ⎥
⎢                                              ⎥
⎢                                 2    2      2⎥
⎣    2⋅x⋅z           2⋅y⋅z       x  + y  + 3⋅z ⎦

In [73]: (X * X.T * X).jacobian(X)
Out[73]: 
⎡   2    2    2                                ⎤
⎢3⋅x  + y  + z       2⋅x⋅y           2⋅x⋅z     ⎥
⎢                                              ⎥
⎢                 2      2    2                ⎥
⎢    2⋅x⋅y       x  + 3⋅y  + z       2⋅y⋅z     ⎥
⎢                                              ⎥
⎢                                 2    2      2⎥
⎣    2⋅x⋅z           2⋅y⋅z       x  + y  + 3⋅z ⎦
breadnbeer commented 2 years ago

The result of differentiation is actually x.dot(x)*I + 2*x*x.T.

N = 5
x = MatrixSymbol('x',N,1)
Z = ZeroMatrix(N,N)
I = Identity(N)

x1 = x.as_explicit()
I1 = I.as_explicit()
Z1 = Z.as_explicit()

Res1 = x1.dot(x1)*I1 + 2*x1*x1.T
Res2 = (x1.T*x1*x1.T).jacobian(x1)

(Res1-Res2).simplify() == Z1
breadnbeer commented 2 years ago

Is the result you showed from diff actually correct?

@oscarbenjamin: Sorry I didn't understand your question. I just realized you already have the correct differentiation in In[72]. What exactly is your question?

oscarbenjamin commented 2 years ago

What exactly is your question?

No, I was getting confused. I thought you intended x.T * x * I to be the whole output but I see now that it is only part of the output.

I think that the problem here comes somewhere from trying to compress a 4D array into a 2D array but it can't work consistently in all cases. I don't think it's possible to make formal complete system if there are things like this:

In [80]: diff(x, x)
Out[80]: 𝕀

In [81]: diff(x.T, x)
Out[81]: 𝕀
breadnbeer commented 2 years ago

This should only be related to the layout conventions.

breadnbeer commented 2 years ago

ok I see now. So what should be the output of diff(x.T, x) anyway? I'm not sure. It would be something like a derivative of matrix by vector, so something high dimensional? Say if matrix-by-vector derivative is outlawed, then would the equivalence of diff(x.T, x) and diff(x, x) be a problem?

oscarbenjamin commented 2 years ago

I'm not sure that it's possible to do this kind of matrix calculus in a general way without using higher dimensional arrays. I wonder if many of the formulas used for these things involve too much abuse of notation to be definable in a formal way.

How would you define and compute diff(x * x.T * x, x) from first principles?

asmeurer commented 2 years ago

My understanding is that matrix differentiation in SymPy already implicitly drops size 1 dimensions in order to give a matrix instead of a tensor. So maybe doing something similar for scalars isn't too egregious. I agree it would be good to have a formalism for when exactly it happens. Maybe @Upabjojr can comment.

breadnbeer commented 2 years ago

@oscarbenjamin That can be derived by thinking of x.T x as a scalar function of x and applying the product rule for f(x) * v(x) where f is a scalar and v is a vector.

f(x) = x.T x
v(x) = x
d/dx [ f(x) v(x) ] = f dv/dx + v df/dx 
                   = ( x.T x ) I + 2 x x.T