unrank via R[i] conflicts with notation for constructing polynomial rings in CartesianProduct

darijgr commented 10 years ago

FF = IntegerModRing(29)
tester = FF._tester()
list(tester.some_elements(CartesianProduct(FF, FF, FF)))

This should give a list of 3-arrays of elements of Z/29. Instead I get:

sage: FF = IntegerModRing(29)
sage: tester = FF._tester()
sage: list(tester.some_elements(CartesianProduct(FF, FF, FF)))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-5d7e71a13340> in <module>()
----> 1 list(tester.some_elements(CartesianProduct(FF, FF, FF)))

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/misc/sage_unittest.pyc in some_elements(self, S)
    498             if n > self._max_runs:
    499                 from random import sample
--> 500                 S = sample(S, self._max_runs)
    501         except (TypeError, AttributeError):
    502             # We already can't tell what the length of n is, so

/home/darij/gitsage/sage-5.13.beta1/local/lib/python/random.pyc in sample(self, population, k)
    342                         j = _int(random() * n)
    343                     selected_add(j)
--> 344                     result[i] = population[j]
    345             except (TypeError, KeyError):   # handle (at least) sets
    346                 if isinstance(population, list):

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/combinat/combinat.pyc in __getitem__(self, i)
   1220             ValueError: the value must be between 0 and 2 inclusive
   1221         """
-> 1222         return self.unrank(i)
   1223 
   1224     def __str__(self):

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/combinat/cartesian_product.pyc in unrank(self, x)
    258             raise IndexError("x larger than the size of the Cartesian Product")
    259         positions.reverse()
--> 260         return [L[i] for L,i in zip(self.iters, positions)]
    261 
    262     def random_element(self):

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/structure/parent.so in sage.structure.parent.Parent.__getitem__ (sage/structure/parent.c:11060)()

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/categories/rings.pyc in __getitem__(self, arg)
    859 
    860             from sage.rings.polynomial.polynomial_ring_constructor import PolynomialRing
--> 861             return PolynomialRing(self, elts)
    862 
    863     class ElementMethods:

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/rings/polynomial/polynomial_ring_constructor.pyc in PolynomialRing(base_ring, arg1, arg2, sparse, order, names, name, var_array, implementation)
    467                 raise TypeError("if second arguments is a string with no commas, then there must be no other non-optional arguments")
    468             name = arg1
--> 469             R = _single_variate(base_ring, name, sparse, implementation)
    470         else:
    471             # 2-4. PolynomialRing(base_ring, names, order='degrevlex'):

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/rings/polynomial/polynomial_ring_constructor.pyc in _single_variate(base_ring, name, sparse, implementation)
    518 def _single_variate(base_ring, name, sparse, implementation):
    519     import sage.rings.polynomial.polynomial_ring as m
--> 520     name = normalize_names(1, name)
    521     key = (base_ring, name, sparse, implementation if not sparse else None)
    522     R = _get_from_cache(key)

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/structure/parent_gens.so in sage.structure.parent_gens.normalize_names (sage/structure/parent_gens.c:2759)()

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/structure/parent_gens.so in sage.structure.parent_gens._certify_names (sage/structure/parent_gens.c:2322)()

ValueError: first letter of variable name must be a letter

And 29 is the smallest number where this seems to throw this error; also, it doesn't fail if I only take the cartesian product of two copies of the field.

I get a similar error if I do CartesianProduct(FF, FF).unrank(3), and this has been around for a while (not just since beta4):

sage: FF = IntegerModRing(3)
sage: CartesianProduct(FF, FF).unrank(3)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-98d0a571cc61> in <module>()
----> 1 CartesianProduct(FF, FF).unrank(Integer(3))

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/combinat/cartesian_product.pyc in unrank(self, x)
    258             raise IndexError("x larger than the size of the Cartesian Product")
    259         positions.reverse()
--> 260         return [L[i] for L,i in zip(self.iters, positions)]
    261 
    262     def random_element(self):

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/structure/parent.so in sage.structure.parent.Parent.__getitem__ (sage/structure/parent.c:11060)()

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/categories/rings.pyc in __getitem__(self, arg)
    859 
    860             from sage.rings.polynomial.polynomial_ring_constructor import PolynomialRing
--> 861             return PolynomialRing(self, elts)
    862 
    863     class ElementMethods:

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/rings/polynomial/polynomial_ring_constructor.pyc in PolynomialRing(base_ring, arg1, arg2, sparse, order, names, name, var_array, implementation)
    467                 raise TypeError("if second arguments is a string with no commas, then there must be no other non-optional arguments")
    468             name = arg1
--> 469             R = _single_variate(base_ring, name, sparse, implementation)
    470         else:
    471             # 2-4. PolynomialRing(base_ring, names, order='degrevlex'):

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/rings/polynomial/polynomial_ring_constructor.pyc in _single_variate(base_ring, name, sparse, implementation)
    518 def _single_variate(base_ring, name, sparse, implementation):
    519     import sage.rings.polynomial.polynomial_ring as m
--> 520     name = normalize_names(1, name)
    521     key = (base_ring, name, sparse, implementation if not sparse else None)
    522     R = _get_from_cache(key)

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/structure/parent_gens.so in sage.structure.parent_gens.normalize_names (sage/structure/parent_gens.c:2759)()

/home/darij/gitsage/sage-5.13.beta1/local/lib/python2.7/site-packages/sage/structure/parent_gens.so in sage.structure.parent_gens._certify_names (sage/structure/parent_gens.c:2322)()

ValueError: first letter of variable name must be a letter

So it seems that both bugs are due to the fancy R[x] syntax for polynomial rings conflicting with the R[i] syntax for the i-th element of an enumerated set R, and #8389 didn't cause the error, but at most exposed it better.

This reworks the unrank function in sage.combinat.ranker and uses it for the Cartesian product (this doesn't work for ZZ, see #16239). In followup tickets, we also want to be more systematic about using unrank in TestSuite and do better construction of some_elements() #16244.

CC: @mezzarobba @sagetrac-sage-combinat

Component: algebra

Keywords: notation, algebra, polynomial

Author: Nicolas M. Thiéry

Branch/Commit: 79d4698

Reviewer: Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/15919

nbruin commented 10 years ago

comment:2

I think the bug is in unrank, not in the fact that rings use __getitem__ in a funny way. The problem is that a Cartesian product can easily be constructed from iterables, and then should probably be iterable itself, but iterables are not necessarily indexable:

sage: V={1,2,3}
sage: CartesianProduct(V,V).unrank(2)
TypeError: 'set' object does not support indexing
sage: [i for i in CartesianProduct(V,V)]
[[1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3], [3, 1], [3, 2], [3, 3]]

The function "unrank" should only be applied to indexable cartesian products, i.e., cartesian products of indexables. That's not the same as iterables. In particular,

So the problem is really in tester.some_elements. Indeed, the offending code:

        try:
            n = _len(S)
            if n > self._max_runs:
                from random import sample
                S = sample(S, self._max_runs)
        except (TypeError, AttributeError):

tries sample on S, which apparently leads to calls of the type S[i]. As you can see, the code is prepared to fail, but doesn't expect that to happen with a ValueError. I think this just shows that it's a bit callous to just try and index a structure and hope for the best.

darijgr commented 10 years ago

comment:3

Hmm. So Sets are not required to define getitem in a reasonable way? Good to know.

mezzarobba commented 10 years ago

comment:4

Nicolas Thiéry suggested deprecating R[n] in favor of R.unrank(n) at least when R is a ring (see #8389 comment:13), and I tend to agree. If you don't, could you please leave a comment at #15885?

darijgr commented 10 years ago

comment:5

I agree too if we can do this quickly enough. This is causing problems in #10963...

nthiery commented 10 years ago

Branch: u/nthiery/ticket/15919

nthiery commented 10 years ago

Commit: a4f7fc3

nthiery commented 10 years ago

comment:7

The above branch takes a first step toward using more systematically the unrank protocol, in that case in CartesianProduct. This should improve the situation and could be deemed good enough for now, but is certainly not a final solution. Also, the unrank function I introduced might be too paranoid: it makes CartesianProduct.unrank work only with lists, tuples, Parents, xranges; there might be other inputs that are used around. All tests pass though (haven't tried long tests yet).

I agree with Nils though: the sampling in the TestSuite should not be using [i] to do unranking, and probably should just assume that the object is iterable, and not necessarily unrankable. I'll have a look unless someone beats me to it.

The third alternative, explored by Darij in his branch, is to reinstate temporarily the notation FF[i] for unranking for IntegerModRing and friends.

What do you think?

New commits:

a4f7fc3 15919: a first step toward using more systematically the unrank protocol

tscrim commented 10 years ago

Changed commit from a4f7fc3 to d9b5fcd

tscrim commented 10 years ago

Changed branch from u/nthiery/ticket/15919 to u/tscrim/ticket/15919

tscrim commented 10 years ago

comment:9

Hey Nicolas,

I've expanded upon what unrank can accept (all iterables, even if not in enumerated sets, should now work). I agree the sampling should not use [i] to do unranking, but should we push that to another ticket?

New commits:

`5dfb2a2`	`Merge branch 'u/nthiery/ticket/15919' of trac.sagemath.org:sage into u/tscrim/ticket/15919`
`9b8d022`	`Expanded what unrank can handle.`
`d9b5fcd`	`Added warning about iterator objects.`

pjbruin commented 10 years ago

comment:10

Not a review, just a tiny remark: the colon before the doctest in cartesian_product.py should be doubled.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from d9b5fcd to 2a5eaa5

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

2a5eaa5 Fixed double-colon.

nthiery commented 10 years ago

comment:12

Hi Travis!

Thanks for the improvement! I made one further step, in particular to avoid trapping too many exceptions; typical situation: a parent L that implements unranking as L[i] and raises an "out of bound" value error.

I am still unhappy about the unrank(ZZ,i) returning something completely wrong (Integer Ring). In case a parent is iterable, I wonder if I would not prefer to prefer the Iterable strategy, and only use __getitem__ if everything else has failed.

Cheers, Nicolas

nthiery commented 10 years ago

Changed branch from u/tscrim/ticket/15919 to u/nthiery/ticket/15919

nthiery commented 10 years ago

comment:14

Replying to @tscrim:

I agree the sampling should not use [i] to do unranking, but should we push that to another ticket?

+1. Let's focus here on what's needed for #10963.

New commits:

`f9b59fa`	`Merge branch 'develop=6.2.rc0' into ticket/15919-unrank`
`bbd22f1`	- Use `__getitem__` only on sequences and Parents not in EnumeratedSets

nthiery commented 10 years ago

Changed commit from 2a5eaa5 to bbd22f1

nthiery commented 10 years ago

comment:15

Ah shoot, the sampling issue actually occurs with #10963.

At this point I am tempted to remove the sampling altogether in InstanceTester.some_elements, and instead have it always return a list (or iterator?) of the first max_run elements (or less if S does not have that many). Then, the only requirement on S would be to be iterable.

The code would then just be:

    def some_elements(self, S=None):
        if S is None:
            if self._elements is None:
                S = self._instance.some_elements()
            else:
                S = self._elements
        import itertools
        return list(itertools.islice(S,0,self._max_runs))

I am now running the tests with #10963 and this change.

Cheers, Nicolas

nthiery commented 10 years ago

comment:16

With just this change (and not the rest of the stuff in this branch), #10963 passes all tests.

At this point, I am tempted to split this ticket into two tickets:

One about fixing TestSuite/InstanceTester.some_elements to not use indexed access (sufficient for #10963)
One toward using the unrank protocol more systematically (in that case in CartesianProduct)

What do you think?

tscrim commented 10 years ago

comment:17

(Much) More of me says this change to InstanceTester.some_elements() is significantly better (and might even be faster for things that are hard to repeatably iterate over like crystals) than loosing the possibility that (over enough repeated tests) we get "most" elements of a very large finite set (and without calculating __len__(), which also might be expensive or error catching). So I'm saying let's make this change, the question is where. I'd be happy doing it right here.

I'd say the problem with ZZ is that it is not in the correct category, which is now #16239. I've also made some very minor doctweaks, so I'm pos_rev with the current state (up to where to make the change above).

Best,

Travis

tscrim commented 10 years ago

New commits:

`5d2d335`	`Merge branch 'develop' into u/tscrim/ticket/15919`
`382b9ff`	`Merge branch 'u/nthiery/ticket/15919' of trac.sagemath.org:sage into u/tscrim/ticket/15919`
`79d4698`	`Some last very minor tweaks to ranker.py.`

tscrim commented 10 years ago

Changed branch from u/nthiery/ticket/15919 to u/tscrim/ticket/15919

tscrim commented 10 years ago

Changed commit from bbd22f1 to 79d4698

nthiery commented 10 years ago

comment:19

Replying to @tscrim:

(Much) More of me says this change to InstanceTester.some_elements() is significantly better (and might even be faster for things that are hard to repeatably iterate over like crystals) than loosing the possibility that (over enough repeated tests) we get "most" elements of a very large finite set (and without calculating __len__(), which also might be expensive or error catching). So I'm saying let's make this change, the question is where. I'd be happy doing it right here.

Ok, glad to see we are on the same line. I have created #16244 for just this piece. I'll push my little change there later.

I'd say the problem with ZZ is that it is not in the correct category, which is now #16239. I've also made some very minor doctweaks, so I'm pos_rev with the current state (up to where to make the change above).

Ok, I am happy with your change. So I guess this can be positive reviewed. There remains to update the description of this ticket to better reflect what has actually been done, and refer to the other ticket for the actual solution of the original issue.

tscrim commented 10 years ago

Description changed:

--- 
+++ 
@@ -121,3 +121,5 @@
 ValueError: first letter of variable name must be a letter

So it seems that both bugs are due to the fancy R[x] syntax for polynomial rings conflicting with the R[i] syntax for the i-th element of an enumerated set R, and #8389 didn't cause the error, but at most exposed it better. + +This reworks the unrank function in sage.combinat.ranker and uses it for the Cartesian product (this doesn't work for ZZ, see #16239). In followup tickets, we also want to be more systematic about using unrank in TestSuite and do better construction of some_elements() #16244.

tscrim commented 10 years ago

comment:20

Done and done.

tscrim commented 10 years ago

Reviewer: Travis Scrimshaw

tscrim commented 10 years ago

Author: Nicolas M. Thiéry

nthiery commented 10 years ago

comment:22

Thanks Travis!

darijgr commented 10 years ago

comment:23

The only thing that somewhat worries me is the speed. Before:

sage: T = range(20)
sage: C = CartesianProduct(T,T,T)
sage: %timeit C.unrank(3)
100000 loops, best of 3: 15 µs per loop

After:

sage: T = range(20)
sage: C = CartesianProduct(T,T,T)
sage: %timeit C.unrank(3)
10000 loops, best of 3: 35.5 µs per loop

(These timings are fairly independent on the 20 and the 3...)

I don't know how often this unranking is used, though, and whether these µs add up...

nthiery commented 10 years ago

comment:24

Replying to @darijgr:

The only thing that somewhat worries me is the speed. Before: ... I don't know how often this unranking is used, though, and whether these µs add up...

Ah good point; the isinstance and in EnumeratedSets checks cause a little overhead which is non negligible for inputs that have super fast unranking (like lists).

My impression is that this unranking is not used much. If it really became a bottleneck, a reasonable fix would be to have CartesianProduct build unranking functions once for all upon the first unrank (I started toying with this in u/nthiery/15919-unrank-unrank-from), so that the above checks would be done only once.

Besides, things should eventually resolve by themselves as CartesianProduct gets progressively replaced by cartesian_product (for which the inputs are parents, or coerced into parents), and parents implement more systematically the unrank protocol.

So altogether, in this balance between overhead and correctness, I would lean here toward correctness. What do you think?

darijgr commented 10 years ago

comment:25

I am not suggesting to go against correctness! My original idea on this ticket was to de-prioritize the notation R[x] for a polynomial ring or a subalgebra generated by x, because that notation is really syntactic sugar. So it's speed vs. convenience. But looking at Travis's posts I am no longer sure if this would have fixed all issues. Anyway, since unrank and CartesianProduct aren't very much used together, this is not an issue.

nthiery commented 10 years ago

comment:26

Replying to @darijgr:

I am not suggesting to go against correctness!

Oops, sorry if my wording involuntarily implied this :-) I just meant that I was ready paying a little speed overhead as a price for a solution of the problem. And I agree the problem worked on this ticket (being cleaner with unrank versus __getitem__) has deviated from the original one. Not so bad since I believe this + #16244 fixes reasonably the original problem.

Unless someone raises an objection now, this is good to go!

Thanks to both of you.

vbraun commented 10 years ago

Changed branch from u/tscrim/ticket/15919 to 79d4698

sagemath / sage

unrank via R[i] conflicts with notation for constructing polynomial rings in CartesianProduct #15919