Open JohnCremona opened 8 years ago
I guess the first step for this is to ensure that order comparisons on number field elements by default raise an error. I think that means implementing _richcmp_
and nuking the _cmp_
on NumberFieldElement. Then we can see where that raises errors (all those places are bugs given what _cmp_
does for order comparisons) and, where necessary, we can put in "sort" commands with a suitable sort key (and make convenient key implementations available).
Failures are pretty bad. We'll need a lot of fixing (and these things are basically all bugs!)
One of the doctest failures is actually just documenting that comparison is bogus: numberfield.py:394 so that one is not so bad.
New commits:
7aca119 | trac 20028: disable nonsensical comparison for number field elements |
Thanks for geting started on this. I have built the branch and will see what "make testlong" reveals. It may be a good idea -- surely -- to not put this forward for merging until there is at least a default sorting function in place, which is quick such as by comparing strings?
Replying to @JohnCremona:
Thanks for geting started on this. I have built the branch and will see what "make testlong" reveals. It may be a good idea -- surely -- to not put this forward for merging until there is at least a default sorting function in place, which is quick such as by comparing strings?
Isn't the whole idea that there is no default sorting function? If you mean making convenient key functions readily available, then: yes. I expect we'll need them to fix doctests.
Comparing strings is not cheap at all because constructing the string representation is slow. It's better to stick with binary objects. Coefficient lists are much faster:
sage: L=[a^j for j in [1..100]]
sage: %timeit [str(l) for l in L]
100 loops, best of 3: 5.89 ms per loop
sage: %timeit [l.list() for l in L]
1000 loops, best of 3: 386 µs per loop
sage: %timeit [list(l) for l in L]
100 loops, best of 3: 9.92 ms per loop
sage: k=lambda l:l.list()
sage: %timeit [k(l) for l in L]
1000 loops, best of 3: 398 µs per loop
unfortunately, using "list(l)" is slow: it might construct an iterator out of l first, which would have much more overhead to get the whole list of coefficients than getting the list immediately. So either we optimize "list" so that sorted(L,key=list)
works or we make k
above available (possibly in a lower overhead version). Note that coefficient lists already come in the right order for power basis rep: integers look like [*,0,...,0]
.
Branch pushed to git repo; I updated commit sha1. New commits:
d664262 | trac 20028: first batch of fixes to pass doctests |
First list of changes. For me, number_field_element.pyx
now passes. I suspect that a lot of changes will be along the lines of the ones done here.
Branch pushed to git repo; I updated commit sha1. New commits:
77618de | second batch of doctest fixes |
Getting "rings" to pass was easy. In elliptic curves there seem to be some more assumptions about things being sortable that are not sortable. Some preliminary fixes put in place. Comments welcome.
In particular, please take a look at isogeny_small_degree. The approach I took for the first bit can most certainly be extended, but if you have strong opinions in another direction, then feel free to implement that.
Nils -- I am willing & happy to do my bit here but then we need to divide up the task. I'll look at the elliptic curves section first, but we need to split up the others too.
Question: when doing something like sorted(v,key=list) where v is a list of nf elements in a relative number field, I suppose that the coefficient list elements are themselves nf elements which must be sorted somehow? What is happening then?
Replying to @nbruin:
Getting "rings" to pass was easy. In elliptic curves there seem to be some more assumptions about things being sortable that are not sortable. Some preliminary fixes put in place. Comments welcome.
In particular, please take a look at isogeny_small_degree. The approach I took for the first bit can most certainly be extended, but if you have strong opinions in another direction, then feel free to implement that.
As you know from the original thread, one of my aims in starting this was to get the isogenies in a fixed order. Partly that is just so that we can doctest, since there is no intrinsic mathematical ordering on the set of isogenies from a given curve. For this it would suffice to arrange for the doctest outputs not to be order-sensitive which would be rather tedious but a Godd Thing in the long run (as I think Volker would agree).
In situations were I do care about the order of the curves in an isogeny class, that is something which can be achieved elsewhere, in the code for the isogeny class itself; the class could have one or more methods for sorting the curves in it, to be use by people who care (such as me).
I will do some work on this right now, just in sage/schemes/elliptic_curves, building on commit 77618de1e986c
Changed branch from u/nbruin/sorting_of_number_field_elements to u/cremona/20028
I finished with the elliptic curves section, and uploaded the modified branch as u/cremona/20028.
The principles I followed are:
Where there was sorting within the code, it is now always wrapped in a try/except block which result in no sorting when a NotImplementedError is raised. For cases where sorting is available there will be no change.
For the doctest outputs relating to isogenies, the outputs are now sorted using key=str except when the base field is Q, or where the list has at most one entry anyway. This required changing some doctest outputs, mostly just the same but in a different order. I may have overdone this in that over a finite prime field there was possibly sorting already in place making the sort with key=str unnecessary, but this is only doctest output anyway.
A couple of other places requred additions of the try/except wrapping around sort operations. This includes a place in the isogeny class code, on which I will do further work; interestingly the only doctests which fail without that particular change is the one which caused this in the first place (CM curves w.r.t an order of class number 6).
A full doctest now shows only the following:
sage -t --long src/sage/schemes/projective/projective_morphism.py # 1 doctest failed
-- caused by a sort in sage/schemes/affine/affine_homset.py, similarly for projective. There is some badly broken code in lines 3003-3011 of sage/schemes/projective/projective_morphism.py which I have not fixed!
sage -t --long src/sage/modular/modform/constructor.py # 2 doctests failed
-- caused by a sort() in sage/matrix/matrix2.pyx (line 4633) where a polynomial factorization was being sorted).
sage -t --long src/sage/modular/cusps_nf.py # 2 doctests failed
-- comparison by < of number field cusps is done by comparing the underlying nf elements. I just marked the tests # not tested.
sage -t --long src/sage/schemes/projective/projective_rational_point.py # 1 doctest failed
-- a list of points over a number field was being sorted.
which I have sorted -- sorry, fixed. Testing again.
Branch pushed to git repo; I updated commit sha1. New commits:
a8a0714 | #20028: fix most remaining doctest failures |
OK, so this commit fixes all remaining doctest failues with one exception, in src/sage/schemes/projective/projective_morphism.py. I am not sure how this one crept in. The main reason I have not fixed it is that around this place in the code (see COmment 14 above) there is some very badly written (wrong) code which should probably be fixed while we are looking at it.
git blame tells me that the wrong code was written by Grayson Jorgenson in March 2015 so I have CC'd him on this ticket.
Now would perhaps be a good time to get some other people to look at what we have been doing!
Thanks! Personally, I dislike the try: L.sort() except NotImplementedError: pass
blocks. Those sorts shouldn't have been there in the first place. I think people should just deal with the fact that some objects do not come in any particular order. It does seem the easiest way to ensure that most doctests do not change, which is the easier way of getting this stuff in. So I think we'll have to live with it.
It's sad that python first had __cmp__
which made it impossible to have eq/ne without pretending to have ordering.
Hi!
I think the failure in projective_morphism.py comes from projective_homset.py, where a set is now unexpectedly returned instead of a list in the points function for fields. The try block that was added in projective_homset.py prevents sorted(points)
from executing in line 138 and sorted
changes the type of points from a set to a list. Changing return points
to return list(points)
in line 141 seems to address the problem.
For the code in projective_morphism.py (lines 3003-3011) in the first line a list is declared but is never used (I think it is an artifact from a previous implementation attempt that wasn't caught). As far as I can tell, the rest of the code is working properly. Is there a problem with its functionality that I've missed?
Replying to @sagetrac-gjorgenson:
Hi!
Thanks a lot for responding.
I think the failure in projective_morphism.py comes from projective_homset.py, where a set is now unexpectedly returned instead of a list in the points function for fields. The try block that was added in projective_homset.py prevents
sorted(points)
from executing in line 138 andsorted
changes the type of points from a set to a list. Changingreturn points
toreturn list(points)
in line 141 seems to address the problem.
OK, we'll do that. (It is odd to me that sorted(X) has the side effect of changing a set to a list, but now that we know that we just make it into a list regardless of whether we are able to sort it.)
I'll update the branch on this ticket accordingly.
For the code in projective_morphism.py (lines 3003-3011) in the first line a list is declared but is never used (I think it is an artifact from a previous implementation attempt that wasn't caught). As far as I can tell, the rest of the code is working properly. Is there a problem with its functionality that I've missed?
Sure, rem_indices is not used. But also: the line P=self(P) is executed man ytimes (once for each j) and in the next line should it not be if p++points[j]? Otherwise j is not used. I am also worried about the effect of popping points[i] while in the i-loop, but perhaps that's OK given that the i loop goes backwards. A comment to explain what this block of code is doing would help, since I suspect that someone might come up with a slicker solution. (If we only wanted to remove duplicates and did not care about the order then something like points=list(set(points)) would work.)
It's a nuisance that any fix to this will conflict with our sorting changes, but we'll have to live with that.
Replying to @nbruin:
Thanks! Personally, I dislike the
try: L.sort() except NotImplementedError: pass
blocks. Those sorts shouldn't have been there in the first place. I think people should just deal with the fact that some objects do not come in any particular order. It does seem the easiest way to ensure that most doctests do not change, which is the easier way of getting this stuff in. So I think we'll have to live with it.
I agree that these are ugly. We can either try to get the current patch reviewed, or look to see how bad things would be if we were to remove those. I will start on that for elliptic curves, but do not want to delay progress.
After removing these try/except blocks in sage/schemes/elliptic_curves:
sage -t src/sage/schemes/elliptic_curves/cm.py # 2 doctests failed
sage -t src/sage/schemes/elliptic_curves/ell_torsion.py # 2 doctests failed
sage -t src/sage/schemes/elliptic_curves/isogeny_small_degree.py # 3 doctests failed
sage -t src/sage/schemes/elliptic_curves/ell_modular_symbols.py # 3 doctests failed
sage -t src/sage/schemes/elliptic_curves/ell_field.py # 8 doctests failed
sage -t src/sage/schemes/elliptic_curves/ell_generic.py # 2 doctests failed
sage -t src/sage/schemes/elliptic_curves/ell_point.py # 5 doctests failed
sage -t src/sage/schemes/elliptic_curves/isogeny_class.py # 41 doctests failed
sage -t src/sage/schemes/elliptic_curves/ell_number_field.py # 32 doctests failed
sage -t src/sage/schemes/elliptic_curves/ell_rational_field.py # 6 doctests failed
which I hope can be resolved easily....nearly done... the files with lots of failures are becuase I deleted one line too many (forgot to set self.curves in the isogeny class code over number fields) so in fact it is not so bad at all).
It's sad that python first had
__cmp__
which made it impossible to have eq/ne without pretending to have ordering.
Replying to @JohnCremona:
It is odd to me that sorted(X) has the side effect of changing a set to a list
The wonders of python's ducktyping: "sorted" accepts an iterable and returns the contents of the iterable as a sorted list. A finite set is an iterable. It would certainly be useless if it returned a set, since then the sorting work was all for naught (sets are hash tables underneath in python, so they really don't preserve any kind of order).
but now that we know that we just make it into a list regardless of whether we are able to sort it.
Uhm. I don't have a problem returning things in a list in no particular order, but actively turning something that comes naturally in a set into a list just for the sake of it sounds wrong. If the routine you're getting your data from returns a set, perhaps you should too? or perhaps that routine benefits from returning a list instead? This doesn't necessarily need to be resolved here, but bear in mind that postponing a cleaner resolution is the process through which cruft builds up in sage.
EDIT: OK, I see now what's going on. There's a lot unnecessary work: Once you've found the points on one affine patch, you should determine the points on the hyperplane at infinity (and repeat the process there). Then there's no reason for "points" to be a set in the first place, because you'll be finding them without duplication.
Replying to @JohnCremona:
Sure, rem_indices is not used. But also: the line P=self(P) is executed man ytimes (once for each j) and in the next line should it not be if p++points[j]? Otherwise j is not used. I am also worried about the effect of popping points[i] while in the i-loop, but perhaps that's OK given that the i loop goes backwards. A comment to explain what this block of code is doing would help, since I suspect that someone might come up with a slicker solution. (If we only wanted to remove duplicates and did not care about the order then something like points=list(set(points)) would work.)
That code doesn't try to remove duplicates (the list of points returned by rational_points in line 2998 is assumed to contain no duplicates already), but is trying to weed out periodic points of the map self
that have periods smaller than n so that only those with minimal period n are returned. This is done by iterating each point in the list by self
(in the j-loop) n-1 times and checking whether any of those iterates is the same point as the 0th iterate (points[i]). j is not used for indexing but is needed to keep track of how many times to iterate each point before it can be concluded that the point has minimal period n.
The problems associated with popping elements from the list are avoided since the i-loop moves backwards. I'm not sure of a way to do this more efficiently, but it would definitely be good to at least add some more documentation to the code. Would that be appropriate to do in this ticket, or would it be cleaner to leave it to a separate ticket?
Branch pushed to git repo; I updated commit sha1. New commits:
ae87261 | removed sorting in elliptic curve code, use sorted() in doctests instead |
The last version of u/cremona/20028 (commit ae87261) removes the try/except sorting blocks and deals with the issues which that creates in the elliptic curves directory. Mostly this involves adding sorted() to some doctests, and reordering the expected output. In the case of isogeny classes I used the existing facility for reordering an isogeny class based on a new ordering for the curve in it; this is a stop-gap until we have better sorting methods implemented for isogenies (and for n.f. elements).
I am now looking at the pther parts of Sage to make the necessary changes there.
Also commenting on the projective_morphism code line 3002-3014 as I helped review that: Except for the unused list variable rem_indices this code is doing exactly what is it supposed to:
Given a list of periodic points get rid of any that have period less than n, i.e., if you want the minimal 4 periodic points, go through and remove the fixed points and the 2 periodic points.
Hence the variable j is walking through the iterates (up to n) of each point indexed by i. Since the list is traversed in reverse order, the pop doesn't break it. I'm not saying that there isn't a better way to do this, but the code does do its intended task. Perhaps some additional code comments are in order there so that people can more easily decipher it.
Replying to @bhutz:
Also commenting on the projective_morphism code line 3002-3014 as I helped review that: Except for the unused list variable rem_indices this code is doing exactly what is it supposed to:
Given a list of periodic points get rid of any that have period less than n, i.e., if you want the minimal 4 periodic points, go through and remove the fixed points and the 2 periodic points.
Hence the variable j is walking through the iterates (up to n) of each point indexed by i. Since the list is traversed in reverse order, the pop doesn't break it. I'm not saying that there isn't a better way to do this, but the code does do its intended task. Perhaps some additional code comments are in order there so that people can more easily decipher it.
Ben, I can only apologise again for doubting the code hen I just misunderstood it (and specifically what self(P) did). At some point when this ticket is finished, perhaps adding some comments and removing the redundant line would be a good idea, but there is no hurry. (Feel free to open a ticket for that already.)
Nils, can you explain the following? Starting with the most recent commit above, if you make one change to sage/structure/factorization.py
@@ -319,10 +319,7 @@ class Factorization(SageObject):
self.__unit = unit
self.__cr = cr
if sort and self.is_commutative():
- try:
- self.sort()
- except NotImplementedError:
- pass
+ self.sort()
then there are many failures, for example
sage: pol26 = hilbert_class_polynomial(-4*26)
sage: K = NumberField(pol26,'a')
sage: K.optimized_representation()
<boom>
NotImplementedError: root finding for this polynomial not implemented
no problem. projective morphism issue opened as #20059.
Replying to @JohnCremona:
Nils, can you explain the following?
... then there are many failures, for example
sage: pol26 = hilbert_class_polynomial(-4*26) sage: K = NumberField(pol26,'a') sage: K.optimized_representation() <boom> NotImplementedError: root finding for this polynomial not implemented
I am not quite sure what you mean by "explain". Do you want me to give my interpretation of the traceback that shows where the failure occurs? In that case:
The error happens in sage.rings.polynomial.polynomial_element.pyx:6533
, in Polynomial.roots
. The relevant branches are:
try:
if K.is_integral_domain():
...
return self._roots_from_factorization(self.factor(), multiplicities)
else:
raise NotImplementedError
except NotImplementedError:
...
raise NotImplementedError("root finding for this polynomial not implemented")
With the change, "self.factor()" indirectly raises NotImplementedError
from the sort
. The really appropriate solution would be to just not sort in self.factor
at all, because in general there's no way to sort. But I suspect the number of doctests that need to be adjusted would be very unpleasant. So I'd go with the "best effort sorting attempt" of a try
here.
You can see two things here:
raise
unless you have a very good reason not to. Getting the original error here is way more informative (and cheaper!) than this newly constructed error.One option for factor
would be to make the sort
keyword tri-state: True/False/Try
, where False is no sort, True is sort (and error if that doesn't work) and Try (which would be the default for backwards compatibility) to try and be quiet about failure to sort. Then people can ask for sort=True
if they want to rely on the answer being in a particular order. The sort=Try
option would have no value other than giving a little nicer output in some cases. This may well be overengineering and it's clear what would happen: everybody would just use the default "Try" and rely on the order being the same every time (and be surprised if it's not).
Thanks, that does help clarify what is happening. I'll keep in the try/except here since otherwise (as you say) a vast number of pther changes would be needed. I think that the half dozen other palces where we added try/except around sort attempts will not be so problematical.
I hope that after this work is done we'll be able to convince others that the change is worth doing!
Branch pushed to git repo; I updated commit sha1. New commits:
2016dd3 | #20028: most doctests now pass |
Replying to @JohnCremona:
Thanks, that does help clarify what is happening. I'll keep in the try/except here since otherwise (as you say) a vast number of other changes would be needed. I think that the half dozen other places where we added try/except around sort attempts will not be so problematical.
I have almost finished this, and it was not so hard. A few doctests relied on the list of solutions to some equations and these now need sorted() in the doctest since thesolutions are not automatically sorted. Apart from that, there were several places where lack of sorting of real & complex roots of a polynomial led to permuting the real and complex embeddings of a number field; this was easy to fix since now the functions K.real_mebeddings() and K.complex_embeddings() use the sorting in RR and CC to sort themselves, which dealt with all such problems. (In the elliptic curve period lattice code which has a ridiculous number of doctests, by me, I made the test more consistent by always using the above functions instead of sometimes using K.embeddings(RR) and K.embeddings(CC), since K.embeddingss() does not sort).
Two problems remain, only. (1) the file src/sage/tests/french_book/mpoly.py where I was not sure of the etiquette of changing doctest input/output since presumably the book text might need changng too. (2) one failing test in src/sage/schemes/elliptic_curves/ell_point.py and two in src/sage/schemes/elliptic_curves/period_lattice.py are causd by a seprate bug which previously struck me in ticket #18336 which is still not resolved (see there for what I am talking about). If we can fix that then both this ticket and that one could be finished off.
The new commit is uploaded.
I hope that after this work is done we'll be able to convince others that the change is worth doing!
New commits:
2016dd3 | #20028: most doctests now pass |
I made a separate ticket for the bug in QQbar.sqrt: #20064.
Until that is resolved we cannot set this ticket to needs_review, but I would like to get it reviewed!
Dependencies: #20064
Replying to @JohnCremona:
Question: when doing something like sorted(v,key=list) where v is a list of nf elements in a relative number field, I suppose that the coefficient list elements are themselves nf elements which must be sorted somehow? What is happening then?
I don't think key=list
is a valid way of sorting for relative number fields (unless the base field is embedded and inherits an ordering from that).
Are you wondering whether we should have a "generic" key function other than "str"? I think that would require another protocol, where objects are required to have a __default_sorting_key__
method or something similar, and probably a access function default_sorting_key
that would recursively walk tuples etc (but not sets unless we'd have a way to symmetrize the keys) and call this method on relevant objects.
My feeling is that the need for such a function is not so pressing just yet, and that we're better off with people having to think about what key they use every time they need one (and in the process perhaps recognize when they don't need sorting).
Yes, I agree with that. It has been quite instructive to see that for most use cases the only reason we had been sorting at all was for doctest consistency, and for those, using key=str sufficed. The exceptions all (I think) are for the cases I care about a lot for the purpose of cataloguing and labelling elliptic curves, and here I'll be writing custom (and properly documented) sorting functions.
So we don't need default sorting method(s) for number field elements. I hope others agree.
I see this ticket hasn't received much attention lately. Should we move this along?
Also, I've noticed that _get_embedding_approx(i)
gives an precision of 53*2i bits, so the original code (starting i at one and incrementing i) was correct. It's already doubling precision.
OK, so a few months have passed and all the work we did for this has so far not come to anything. Meanwhile I started using sort(..., key=lambda a: a.list()) for my own use (with elements of number fields, where I am happy that I will always use the same generating polynomial for each field so this will be consistent). Because of the above it has been less important for me to have Sage-wide sorting of nf elements fixed, but of course it should be.
Now after the switch to python3, this ticket should be revisited.
Description changed:
---
+++
@@ -1 +1,10 @@
See the discussion at https://groups.google.com/forum/#!topic/sage-devel/aP5LP09sADY
+
+```
+sage: K.<a> = NumberField(x^3+x+1)
+sage: K(1) > K(2), K(2) > K(1)
+(True, True)
+sage: K(1) < K(2), K(2) < K(1)
+(False, False)
+```
+
See the discussion at https://groups.google.com/forum/#!topic/sage-devel/aP5LP09sADY
Depends on #20064
CC: gjorgens@math.fsu.edu @sagetrac-jakobkroeker @kliem @videlec @fchapoton @antonio-rojas
Component: number fields
Keywords: sort number field elements
Branch/Commit: u/cremona/20028 @
2016dd3
Issue created by migration from https://trac.sagemath.org/ticket/20028