sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.34k stars 452 forks source link

Improve documentation of `charpoly` and `characteristic_polynomial` methods #34725

Open maxale opened 1 year ago

maxale commented 1 year ago

The following code

K.<x,y> = QQ[]
M = Matrix([[x,x],[0,x]])
print( M.characteristic_polynomial().parent() )
print( M.characteristic_polynomial(y).parent() )

prints

Univariate Polynomial Ring in x over Multivariate Polynomial Ring in x, y over Rational Field
Univariate Polynomial Ring in y over Multivariate Polynomial Ring in x, y over Rational Field

First, these rings do not make much sense since the same variable appears in the base field and as a polynomial variable over it. Second, in both cases the parent should simply be K (ie. Multivariate Polynomial Ring in x, y over Rational Field).

Component: linear algebra

Issue created by migration from https://trac.sagemath.org/ticket/34725

nbruin commented 1 year ago
comment:2

I think the behaviour described in the ticket is intentional. It's certainly undesirable to have a repeated name, but that's why the option exists to give a name of your own.

It's certainly not the case that the parent of a characteristic polynomial of a matrix over a ring R should ever lie in R itself, regardless of names.

If the user explicitly asks for clashes in names then they're getting what they asked for, in the current model.

There is a good reason why the system currently does not go out of its way to come up with a "unique" variable to use in the parent of the characteristic polynomial: it's expensive to do so! You'd need to explicitly descend the tower and keep track of any names that occur and then use some algorithm to come up with a variable name that doesn't occur. That's expensive to do and often unnecessary! There are plenty of operations (find eigenvalues in R for instance) for which the variable name used in the characteristic polynomial is of no actual relevance.

So the current design optimizes for speed for library use (where the variable names aren't examined). If you're interested in having a meaningful variable name, specify it (and it's your own responsibility to not have clashes if it's important for you to avoid those).

It's not unprecedented for CAS to produce ambiguous string representations of mathematical objects. I agree that x*x in QQ['x']['x'] is definitely up there, but with routines like "coefficients" and explicitly referencing generators:

R = Q['x']
S = R['x']
f = R.0 * S.0^2
f.coefficients()
f.monomials()

you can actually work with these objects. Note that library routines (or routines you write yourself for the purpose of working in reasonable generality) shouldn't depend on specific generator names anyway (in fact it's a common source of bugs if code does), so in many settings it doesn't make a difference what the names are.

Of course, if you're actually planning on looking at the string representations of objects, you should preserve your sanity and make sure you create your objects with appropriately chosen generator names. That's why characteristic_polynomial accepts an optional name argument.

DaveWitteMorris commented 1 year ago

Reviewer: Dave Morris

DaveWitteMorris commented 1 year ago
comment:3

I think that Sage is acting as it should, so we should close this ticket as invalid.

maxale commented 1 year ago
comment:4

Well, I can agree on the current behavior of M.characteristic_polynomial(), ie. when the variable is not specified, but not on the case M.characteristic_polynomial(y). In the latter case, I specifically request to create a polynomial in existing variable y in K, and expect the result be the same as(y*identity_matrix(2) - M).det() by definition of the characteristic polynomial, which is the element of K if computed by this formula. However, M.characteristic_polynomial(y) produces something else.

maxale commented 1 year ago
comment:5

Please notice that in a slightly modified example everything works as expected:

R.<x> = QQ[]
K.<y> = R[]
M = Matrix([[x,x],[0,x]])
print( M.characteristic_polynomial(y).parent() == K )   # this is True
print( M.characteristic_polynomial(y) == (y*identity_matrix(2) - M).det() )    # True again

However, things break when K.<x,y> = QQ[] happens to coincide with M.base_ring() (even though M does not depend on y). I see no much difference between the two cases, and expect M.characteristic_polynomial(y) == (y*identity_matrix(2) - M).det() to hold whether or not y belong to M.base_ring().

nbruin commented 1 year ago
comment:6

Replying to Max Alekseyev:

In the latter case, I specifically request to create a polynomial in existing variable y in K, and expect the result be the same as(y*identity_matrix(2) - M).det() by definition of the characteristic polynomial, which is the element of K if computed by this formula. However, M.characteristic_polynomial(y) produces something else.

Ah, yes. That's a user-interface problem. The parameter characteristic_polynomial accepts is a string, to be used as a variable name. For convenience, it accepts an object that is convertible to a string. So when you pass in y, it is really just equivalent to passing in the string "y". Another illustration:

sage: M=matrix(GF(17),2,2,[3,0,0,3])
sage: R.<x>=GF(3)[]
sage: M.charpoly(x)
x^2 + 11*x + 9
sage: M.charpoly(x).parent()
Univariate Polynomial Ring in x over Finite Field of size 17

I can see how it seems that you're allowed to ask for the characteristic polynomial evaluated at the value specified, but that's not what the design is, as the example above clearly shows.

It would probably cause a lot of backward compatibility errors if we were to throw more strict errors on the parameter passed in.

maxale commented 1 year ago
comment:7

I do not see how the user interface can explain the discrepancy here. Let me put two cases side by side:

K.<x,y> = QQ[]
Z.<z> = K[]
M = Matrix([[x,x],[0,x]])
print( M.characteristic_polynomial(z) == (z*identity_matrix(2) - M).det() )   # this is True
print( M.characteristic_polynomial(y) == (y*identity_matrix(2) - M).det() )   # but this is False

The characteristic polynomial is computed correctly as a polynomial in z but not in y.

DaveWitteMorris commented 1 year ago
comment:8

The characteristic polynomial is calculated correctly in both cases, although it is not what you thought it should be (and the fact that more than one object is printed as "y" can be confusing).

Elements of the base ring are constants, not variables. They can appear as coefficients of the characteristic polynomial, but not as the variable. If I understand correctly, you want sage to consider the element y of K as a variable, because it does not actually occur in the matrix, but I think that means you actually want to consider the matrix as having a different base ring. If so, then it's your job to tell that to sage, by explicitly changing the base ring.

What is it that you would expect M.characteristic_polynomial(x) to do? I don't think there should be any difference between the behaviours of x and y in this situation, since both are elements of the base ring. Do you expect the following code to produce polynomials that are in two different rings? I think that would cause lots of problems, because I think people would expect the two polynomials to have the same characteristic polynomial (which is currently the case).

for i in [0, 1]:
    Matrix([[x, 0],[i*y, x]]).characteristic_polynomial(y)

Perhaps we should print a warning (or raise an error?) in this situation, but I am -1 on having sage use an element of the base ring as a variable.

nbruin commented 1 year ago
comment:9

Replying to Max Alekseyev:

I do not see how the user interface can explain the discrepancy here. Let me put two cases side by side:

K.<x,y> = QQ[]
Z.<z> = K[]
M = Matrix([[x,x],[0,x]])
print( M.characteristic_polynomial(z) == (z*identity_matrix(2) - M).det() )   # this is True
print( M.characteristic_polynomial(y) == (y*identity_matrix(2) - M).det() )   # but this is False

The documentation of M.characteristic_polynomial states that the optional argument is a string that allows specification of the name of the variable that is to be used for the characteristic variable. It does not take a variable. It is usual in python that when a string is accepted, also objects that can be converted to strings are accepted. In that case they are automatically converted to strings. Variable names such as y above have that property.

In the construction above, M.characteristic_polynomial("z") happens to construct a polynomial in Z because Z is K["z"] is true.

In the construction above, M.characteristic_polynomial("y") constructs a polynomial in K["y"], which is not the parent of the object your python variable y above is bound to.

It is a user interface problem because the input that characteristic_polynomial is accepting is making you believe that you're allowed to specify the parent of the resulting polynomial fully. But you're not: you're only allowed to specify the name of the variable that will be used. The problem is limited, though, because the documentation is unambiguous. So I agree with a "wontfix" for this, but acknowledge that the interface design is leading you astray a bit.

DaveWitteMorris commented 1 year ago

Changed reviewer from Dave Morris to none

DaveWitteMorris commented 1 year ago
comment:10

The documentation does not seem unambiguous to me, so I think we should clarify it. Am I looking in the wrong place?

M.characteristic_polynomial? just says "Synonym for self.charpoly(...)" and gives an example.

The output of M.charpoly? is confusing, because it uses the same letter for the matrix as for the variable name:

Signature:      charpoly(x, var='x')
Docstring:     
   Return the characteristic polynomial of "x" in the given variable.
   ...

The rest of the docstring has examples, and the following comment:

Ensure the variable name of the polynomial does not conflict with
variables used within the matrix, and that non-integral powers of
variables do not confuse the computation
(https://trac.sagemath.org/14403):

I think it is ambiguous whether the variable y is "used within the matrix", so I do not find this to be unambiguous.

I think it would be a good idea to add an INPUT: block to clarify that the input is a string (and how the string is interpreted), so I changed the milestone from invalid to 9.8.

nbruin commented 1 year ago
comment:11

Replying to Dave Morris:

I think it would be a good idea to add an INPUT: block to clarify that the input is a string (and how the string is interpreted), so I changed the milestone from invalid to 9.8.

I think there are lots matrix parents where this is implemented and indeed, since this code is old, it has a good chance of not conforming to documentation standards everywhere. When I look at

sage: M=matrix(GF(17),2,2,[3,0,0,3])
sage: M.charpoly?
sage: M=matrix(QQ['x'],2,2,[3,0,0,3])
sage: M.charpoly?

I see two implementations that do have an INPUT block, although not necessarily in the standard spot, and these do unambiguously specify var as variable name, although it's not specified that it's a str. I think it would be more self-documenting if that optional parameter is named name or varname to emphasize it's expected to be a string.

nbruin commented 1 year ago
comment:12

Replying to Dave Morris:

The rest of the docstring has examples, and the following comment:

Ensure the variable name of the polynomial does not conflict with
variables used within the matrix, and that non-integral powers of
variables do not confuse the computation
(https://trac.sagemath.org/14403):

I think it is ambiguous whether the variable y is "used within the matrix", so I do not find this to be unambiguous.

I might be misreading it, but that looks like a totally confused doctest. It looks like someone was planning on implementing a strategy for disambiguating variable names for charpolys of symbolic matrices, but was dissuaded and somehow committed a doctest; using .list() on the charpoly to avoid showing that the variable names are not disambiguated. This is bound to happen at some point and I think we have now found a doctest that needs to be removed.

Indeed, the documentation of the toplevel charpoly is in bad need of clean-up. The actual methods seem to be in a little better shape. This is not uncommon. Incidentally, for me both charpoly? and characteristic_polynomial? give their full (flawed) documentation. Are you looking at an old version?

DaveWitteMorris commented 1 year ago
comment:13

I am on 9.8b3. M.characteristic_polynomial? gives the docstring of Matrix.characteristic_polynomial, which is:

Synonym for self.charpoly(...).

EXAMPLES::

    sage: a = matrix(QQ, 2,2, [1,2,3,4]); a
    [1 2]
    [3 4]
    sage: a.characteristic_polynomial('T')
    T^2 - 5*T - 2

Using characteristic_polynomial? gives the documentation of a different function (from the file src/sage/misc/functional.py).

I think charpoly is defined in about 20 different files, so cleaning up all of them will be a bit of a patch bomb.

maxale commented 1 year ago
comment:14

Replying to Nils Bruin:

I can see how it seems that you're allowed to ask for the characteristic polynomial evaluated at the value specified, but that's not what the design is, as the example above clearly shows.

I see that my bugreport deviated into updating the documentation, but I'd actually prefer to make it possible for .characteristic_polynomial() to accept any values, including strings (the current behavior), variables, polynomials etc., and evaluate the characteristic polynomial at the given argument. Restricting it to just strings and interpreting anything else in a unobvious way (like variable y in my example) appears a major limitation.

For my particular application, I'm forced to avoid .characteristic_polynomial() at all and compute the characteristic polynomial based on the definition like (y*identity_matrix(2) - M).det(), unfortunately.

DaveWitteMorris commented 1 year ago
comment:15

You can use the characteristic polynomial. From sage's perspective (which I think is a good one), you want to calculate the characteristic polynomial, and then substitute y for the variable:

sage: M.characteristic_polynomial().subs(y)
x^2 - 2*x*y + y^2

If you keep in mind that the characteristic polynomial needs to be a polynomial over the base ring, then you should understand that it is not the answer you are looking for, because you want an element of the base ring (i.e., a constant). Evaluating the polynomial gives the answer that you want.