charpoly name clashes with matrix content

8d15854a-f726-4f6b-88e7-82ec1970fbba commented 11 years ago

This is a spin-off from #14403, to get that one landed and have a discussion with a wider scope here.

Problem

matrix.charpoly() will return a polynomial in a polynomial ring over x. That variable name is always used, unless overridden by an argument provided by the user. This is OK in many cases, but can become confusing or outright dangerous in cases where x already occursd inside the matrix, since in those cases there would be two occurrences of x with different semantics:

sage: x = var('x')                      
sage: R = SR                            
sage: matrix(R,2,2,[x,1,1,x]).charpoly()
x^2 - 2*x*x + x^2 - 1
sage: R.<x> = QQ[]                        
sage: matrix(R,2,2,[x,1,1,x]).charpoly()
x^2 - 2*x*x + x^2 - 1
sage: R.<x> = FiniteField(4)              
sage: matrix(R,2,2,[x,1,1,x]).charpoly()
x^2 + x
sage: R.<x> = PowerSeriesRing(QQ)         
sage: matrix(R,2,2,[x,1,1,x]).charpoly()
x^2 - 2*x*x - 1 + x^2

It is easy to see that this kind of result can be confusing to say the least. The data structure returned is a polynomial over the ring of the matrix, so the coefficients are all right. Any piece of code only interested in the list of coefficients shouldn't have to bother. But even code could encounter trouble if it were to pass this to some other application or library using a string-like representation, like this:

sage: R.<x> = QQ[]                      
sage: y = var('y')                        
sage: (matrix(R,2,2,[x,1,1,x]).charpoly()*(1 + y))            
-y - 1
sage: (matrix(R,2,2,[x,1,1,x]).charpoly('t')*(1 + y))         
(y + 1)*((t - 2*x)*t + x^2 - 1)

As you can see, the distinction between the two flavours of x can easily get lost, e.g. by (deliberate or accidential) coercion into the symbolic ring. The real problem here is not so much the fact that this does not work (after all, the user could have chosen a different variable name in the first place), but rather that this will seem to work but will yield wrong results.

Proposal

For this reason, I am of the strong opinion that charpoly (and perhaps other polynomial-returning functions as well, so if you know any, please point them out) should take some care to choose a non-clashing name.

To implement this, we'd need a general method to recursively enumerate all names occuring in a set of values from a given ring. For most rings a complete list of generators would probably be appropriate, no matter whether they actually occur. For the symbolic ring I'd prefer to only use those which actually occur, since otherwise the current default of x would probably never be chosen for symbolic matrices, causing an unexpected difference in behavior.

Given a list of symbol names, we could then try to come up with a non-clashing one. The most elaborate scheme would try some well-known symbols first, like ['x', 'y', 'z', 't', 'u', 'mu'], before using indexed ones like ['x1', 'x2', …], until a locally unique one has been found. Of course one could omit the former, and use SR.symbol() instead of the latter, but the consequence would be that the default charpoly of a given matrix would not depend on this matrix alone, but also on which other computations have been done before. Plus those names are harder to read. So I'd rather avoid this approach.

Unsatisfied expectations

Some users might be confused by the unexpected and perhaps unintuitive name of the variable. Compared with the confusion caused by the clashing names, I consider this acceptable.

In #14403 comment:10, nbruin pointed out that one might expect

charpoly(A)*charpoly(B) == charpoly(block_diagonal(A,B))

but this would no longer be universally true after this change. I consider this acceptable, since the result would not be an incorrect computation but instead an exception thrown due to a multiplication between incompatible rings. Faced with this, users could always augment their calls to

charpoly(A,'t')*charpoly(B,'t') == charpoly(block_diagonal(A,B),'t')

for some non-clashing variable name t.

In #14403 comment:17, burcin stated that a method to obtain new variable names should operate in constant time, everything else being to complicated. I agree that constant time would be desirable for all those cases where the variable name really doesn't matter, i.e. where the polynomial is only used internally, as a list of coefficients no mater the variable name. But wherever the polynomial might end in the hands of the user, I think a bit more work is warranted.

I wonder whether it would be feasible to always explicitely state any variable name in the internal calls, thus avoiding the automatic selection, and leave the default argument case to users only. This would mean going over the whole codebase and replacing all calls of .charpoly() by charpoly('x'), all calls of charpoly(*) by charpoly(*, 'x'), and likewise for characteristic_polynomial. The non-method versions might be difficult to express in terms of regular expressions, but grepping for charpoly should highlight all use cases, and editor macros with a fall back to manual editing should see them adjusted.

Component: linear algebra

Issue created by migration from https://trac.sagemath.org/ticket/14972

8d15854a-f726-4f6b-88e7-82ec1970fbba commented 11 years ago

Description changed:

--- 
+++ 
@@ -58,3 +58,7 @@

for some non-clashing variable name t. + +In #14403 comment:17, burcin stated that a method to obtain new variable names should operate in constant time, everything else being to complicated. I agree that constant time would be desirable for all those cases where the variable name really doesn't matter, i.e. where the polynomial is only used internally, as a list of coefficients no mater the variable name. But wherever the polynomial might end in the hands of the user, I think a bit more work is warranted. + +I wonder whether it would be feasible to always explicitely state any variable name in the internal calls, thus avoiding the automatic selection, and leave the default argument case to users only. This would mean going over the whole codebase and replacing all calls of .charpoly() by charpoly('x'), all calls of charpoly(*) by charpoly(*, 'x'), and likewise for characteristic_polynomial. The non-method versions might be difficult to express in terms of regular expressions, but grepping for charpoly should highlight all use cases, and editor macros with a fall back to manual editing should see them adjusted.

nbruin commented 11 years ago

comment:2

I think always requiring a variable name (or a polynomial ring or something similar) is the "right" thing to do: in sage, there is no such thing as "the" polynomial ring over a given ring, so there is no way to choose a canonical default. Variable names are an essential ingredient for a polynomial ring and since a characteristic polynomial has to live in one, this name has to be specified somehow.

It might be that the "right" thing is considered too pedantic for general use, though (certainly now that people have gotten used to not having to specify a variable).

Trying to come up with a non-clashing name would be a kludge and makes me think of one of the rules of the Zen of Python: "in the face of ambiguity, refuse the temptation to guess", which tends to be a pretty good guideline for interface design.

One would hope that the polynomial ring in which the charpol is returned would be at least a function of the parent of the matrix, but as we see, that is already not possible for SR.

If it is deemed a default choice for varname is necessary to keep the sage interface bearable then I would be in favour of making a very cheap choice: just "x" or "t" or some other reasonable looking name ("lambda" is painful because it's a keyword in python).

sagemath / sage