sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.41k stars 475 forks source link

make it possible to generate sequences of variables easily #11576

Open kcrisman opened 13 years ago

kcrisman commented 13 years ago

People are always asking how to get sequences of variables, like a1,a2,a3,a4 or the like. See this ask.sagemath.org question, for example.

Jason Grout has an interesting possible solution that should find a home somewhere in Sage (see sage-support discussion).

class VariableGenerator(object):
     def __init__(self, prefix):
         self.__prefix = prefix
     @cached_method
     def __getitem__(self, key):
         return SR.var("%s%s"%(self.__prefix,key))

Now just specify a prefix, and then you can index to your heart's content to generate variables.

a = VariableGenerator('a') # some people may like 'a_' as the prefix
a[0], a[1], a[2] # all variables

Of course, this can easily be extended to using function call syntax: a(0), or to using multiple indices: a[1,3]. Indeed, you can let your imagination run wild and even do things like return full symbolic matrices or vectors with slices: a[0:5, 0:5].

Apply attachment: trac_11576-indexed_expression.patch

CC: @jasongrout @gvol @orlitzky @eviatarbach @slel

Component: symbolics

Keywords: Cernay2012

Issue created by migration from https://trac.sagemath.org/ticket/11576

burcin commented 13 years ago
comment:1

GiNaC already has indexed expressions. We should try to use what is there instead of coming up with a pure python solution ourselves. This was discussed on sage-devel a while ago, I even put up an experimental patch for it:

http://groups.google.com/group/sage-devel/browse_thread/thread/69ab50fe11672111

kcrisman commented 13 years ago
comment:2

Thanks, Burcin! I must have missed this somehow.

Here is a link to the patch.

From that thread:

Note that GiNaC cannot take derivatives of indexed expressions. 

So that means one couldn't say that diff(a[1],a[1])==1? That would seem to be unfortunate.

By the way, note that Jason was just trying to find a way to make lots of GiNaC variables. I don't think that these approaches are necessarily that different - for instance, a class like Jason's for non-indexed, but still many, variables of similar names, would, be, useful. , , ,

burcin commented 13 years ago
comment:3

Replying to @kcrisman:

From that thread:

 Note that GiNaC cannot take derivatives of indexed expressions. 

So that means one couldn't say that diff(a[1],a[1])==1? That would seem to be unfortunate.

I don't have time to test this right now, but if it really doesn't work, we could fix it or ask the GiNaC developers why they chose to omit this.

By the way, note that Jason was just trying to find a way to make lots of GiNaC variables. I don't think that these approaches are necessarily that different - for instance, a class like Jason's for non-indexed, but still many, variables of similar names, would, be, useful. , , ,

It would be useful to extend the var() syntax to support creating variables like 'a1, a2, ..., a10'. We should keep in mind the signature of other constructors like PolynomialRing() if we attempt this.

If we go further and try to support vectors and matrices with symbolic entries, it would be better to use the underlying GiNaC functionality. There is an example at the end of this page:

http://www.ginac.de/tutorial/Indexed-objects.html

If you can propose a basic user interface, I'd be happy to help with at least the first steps of interfacing C++.

nbruin commented 13 years ago
comment:4

Perhaps another motivation for using GiNaC functionality for this is that the alternative approach leaks. Watch the memory usage of

sage: for i in range(10^8):
....:    l=1+SR.symbol("a%s"%str(i))

Perhaps GiNaCs indexed variables are less prone to leaking?

hivert commented 12 years ago

Attachment: indexed_expression-experimental-fh.patch.gz

hivert commented 12 years ago
comment:6

Hi there,

As as said on sage-devel, I started to work on Burcin patch. My goal was to be able to index a variable by any Sage object (eg: integers, group element, matrices...). So I extended a little Burcin patch. Now I'm faced with the problem that GiNaC indexed variable seems to have a not trivial semantic that I don't understand at all. Moreover I don't know how to debug Cython / C++ code so I'm quite stuck. Here is the problem I have.

To get it you should apply attachment: indexed_expression-experimental-fh.patch on a fresh Sage-4.7.2 install. Then I can

sage: m1 = matrix([1,2]); m1.set_immutable()
sage: m2 = matrix([2,1]); m2.set_immutable()
sage: a = x.ind[m1]; b = 2*x.ind[m2]
sage: a + a
2*x.[1 2]
sage: a + b
x.[1 2] + 2*x.[2 1]

which is, in my view very cool !

However strange thing has occurred under the hood, in particular for a strange reason I don't understand Sage added the two matrices ! This is apparent in

sage: a = x.ind[Permutation([3,2,1])]; b = 2*x.ind[1]
sage: a+b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/data/Sage-Install/sage-4.7.2/devel/sage-review/sage/symbolic/<ipython console> in <module>()

/home/florent/src/Sage/sage/local/lib/python2.6/site-packages/sage/structure/element.so in sage.structure.element.RingElement.__add__ (sage/structure/element.c:11488)()

/home/florent/src/Sage/sage/local/lib/python2.6/site-packages/sage/symbolic/expression.so in sage.symbolic.expression.Expression._add_ (sage/symbolic/expression.cpp:11082)()
   2186                            relational_operator(_right._gobj))
   2187         else:
-> 2188             x = gadd(left._gobj, _right._gobj)
   2189         return new_Expression_from_GEx(left._parent, x)
   2190 

/home/florent/src/Sage/sage/local/lib/python2.6/site-packages/sage/structure/element.so in sage.structure.element.RingElement.__sub__ (sage/structure/element.c:11816)()

/home/florent/src/Sage/sage/local/lib/python2.6/site-packages/sage/structure/coerce.so in sage.structure.coerce.CoercionModel_cache_maps.bin_op (sage/structure/coerce.c:7489)()

TypeError: unsupported operand parent(s) for '-': '<class 'sage.combinat.permutation.Permutation_class'>' and 'Integer Ring'

To make thing crystal clear (I hope):

sage: class bla(SageObject):
...       def __add__(self, other):
...           print "Addition called"
sage: a, b = x.ind[m1],2*x.ind[m2]
sage: m1 = bla()
sage: m2 = bla()
sage: m1 + m2
Addition called
sage: m1*m2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/data/Sage-Install/sage-4.7.2/devel/sage-review/sage/symbolic/<ipython console> in <module>()

TypeError: unsupported operand type(s) for *: 'bla' and 'bla'

WTF ! Why is it adding or multiplying my indices for nothing ! It is a problem of Ginac ? of the wrapper ? or behind the chair and the screen ?

Any help greatly appreciated !

Cheers,

Florent

PS: crosspost on sage-devel.

burcin commented 12 years ago

Changed keywords from none to Cernay2012

orlitzky commented 11 years ago
comment:9

Allow me to throw my entry into the ring.

It's not nearly as ambitious as Burcin & Florent's patches, but it does scratch the itch that most people have, and it's working and tested already.

kcrisman commented 11 years ago
comment:10

I like a lot of it. Obviously needs tests in the new class. A few questions, which may betray ignorance:

orlitzky commented 11 years ago

Add tests for the individual class methods

orlitzky commented 11 years ago
comment:11

Attachment: sage-trac_11576.patch.gz

Replying to @kcrisman:

I like a lot of it. Obviously needs tests in the new class.

I just updated the patch, each of the individual class methods now have their own examples.

A few questions, which may betray ignorance:

  • Is it okay that we don't overwrite the global variables? I like that in principle, but there is similar confusion at times with PolynomialRings that have an x which isn't in SR while sage: x still returns the symbolic x. Just asking.

Having two different x-things floating around -- one a python variable and one a symbol name -- is a little bad, but clobbering them is in my opinion worse. Essentially what I'm trying to do is make this act like SR.symbol() instead of SR.var().

The symbol() function leaves the variables alone:

sage: y = 7    
sage: SR.symbol('y')
y
sage: y
7

Now there's a symbol floating around somewhere called y, but it isn't the python variable. The alternative is, in my opinion, much worse:

sage: y = 7         
sage: var('y')
y
sage: y
y

You could argue that people expect that from var() these days, but they certainly wouldn't expect it from indexing some object. So this is what I don't want:

sage: y1 = 7             
sage: ys = SR.symbols('y')
sage: foo = ys[1]
sage: y1
y1

The user never even wrote 'y1' anywhere, so I think it's dangerous to overwrite it.

  • One thing I don't like, or need explanation for, is the confusion in multi-index situations between the string (not LaTeX) reps of x_{1}_{1} and x_{11}, which are apparently both x11. It's almost too flexible, because you don't have to say how many indices there will be ahead of time - which is useful, of course, but there might have to be something with the reps.

You're right... my motivation was simply, "make it easy to create a bunch of symbols," but I don't like this:

sage: a = SR.symbols('a')
sage: bool(a[11] == a[1,1])
True

The next-best thing I can think of is to use underscores, like the latex representation. This makes everything a little uglier, but avoids the ambiguity. Thoughts?

  • Should SymbolSequence inherit from something?

I dunno. I think this is the first standalone class I've written for the sage library. Anyone know?

orlitzky commented 11 years ago

Use underscores to separate indices (applies on top of previous patch)

orlitzky commented 11 years ago
comment:12

Attachment: underscores.patch.gz

If you want to see what it would look like with underscores, the patch I just posted adds them and has a test for subscript collision.

kcrisman commented 11 years ago
comment:13

Thanks for your comments - I think I agree with most of that.

You're right... my motivation was simply, "make it easy to create a bunch of symbols," but I don't like this:

sage: a = SR.symbols('a')
sage: bool(a[11] == a[1,1])
True

The next-best thing I can think of is to use underscores, like the latex representation. This makes everything a little uglier, but avoids the ambiguity. Thoughts?

Yeah, I'm not sure either. I'd appreciate input from people who actually use this stuff! I don't make such variable sequences in my own work. I see your new patch, but it still looks like

a_1_2_3_4_5_6 

could represent a whole Catalan stew of different indices with the nested thing going on. I don't feel like making this decision, but it would be a shame for something to languish. Maybe Burcin or Florent have ideas - it is true that it's silly to reinvent the wheel if Ginac has this already, but we do now have a patch...

Also, what do you know about the memory leak issue in comment:4?

  • Should SymbolSequence inherit from something?

I dunno. I think this is the first standalone class I've written for the sage library. Anyone know?

Usually at least from SageObject, I guess. Would Symbol be appropriate, or maybe not? What generic methods would you want available to this class via inheritance that Symbol already has? I guess each individual symbol has everything that Symbol has.

orlitzky commented 11 years ago
comment:14

I'm not thinking very good right now -- can you give an example of e.g. a_1_2_3 being ambiguous?

kcrisman commented 11 years ago
comment:15

Is it a_1_{2_3} or a_{123}? I guess I was thinking of subscripts of subscripts. Maybe that's not at issue here; I can't quite reconstruct my thinking then either. Wasn't there some nesting somewhere in your patch?

orlitzky commented 11 years ago
comment:16

Replying to @kcrisman:

Is it a_1_{2_3} or a_{123}? I guess I was thinking of subscripts of subscripts. Maybe that's not at issue here; I can't quite reconstruct my thinking then either. Wasn't there some nesting somewhere in your patch?

It's a_{1)_{2}_{3}. To create a_{123}, you'd call a[123]. There is nesting going on, but each "level" should be separated by an underscore now. In fact now that we're only accepting the square brackets, I think commas should map directly to underscores. For example,

sage: xs[3, 8:10, 2:4]
[xs_3_8_2, xs_3_8_3, xs_3_9_2, xs_3_9_3]

You're allowed to think of a[2,3] as a_{2_3}, but I don't think there's any way to create it distinct from a_{2}_{3}. The implementation creates an a_{2} first, and then subscripts that with 3.

I still think the underscores are a little ugly, but I've gotten used to them and it's preferable to having a[1,1] - a[11] == 0.

kcrisman commented 11 years ago
comment:17

Replying to @orlitzky:

Replying to @kcrisman:

Is it a_1_{2_3} or a_{123}? I guess I was thinking of subscripts of subscripts. Maybe that's not at issue here; I can't quite reconstruct my thinking then either. Wasn't there some nesting somewhere in your patch?

It's a_{1)_{2}_{3}. To create a_{123}, you'd call a[123]. There is nesting going on, but each "level" should be separated by an underscore now. In fact now that we're only accepting the square brackets, I think commas should map directly to underscores. For example,

So how would you distinguish a_(1,2_3) from a_(1_2,3) - or is that not at issue? That's what I was getting at, I think.

orlitzky commented 11 years ago
comment:18

Replying to @kcrisman:

So how would you distinguish a_(1,2_3) from a_(1_2,3) - or is that not at issue? That's what I was getting at, I think.

Neither of those are directly constructible. It might be useful to make the bijection in your head, but a_{1}_{2}_{3} is all that my patch allows you to create. The user assigns alternate semantics at his own risk =)

kcrisman commented 11 years ago
comment:19

Neither of those are directly constructible. It might be useful to make the bijection in your head, but a_{1}_{2}_{3} is all that my patch allows you to create. The user assigns alternate semantics at his own risk =)

Okay, so there is only a_(1,2,3) and a_(12,3) and a_(123) and friends, everything at one level.

I'd probably be okay with this then, but I'd really like some input from Burcin and/or Florent as to how compatible this might be with eventually using Ginac's capabilities. I would view doing this natively as a temporary measure. For instance, have you checked how this performs with respect to Nils' comment about memory usage?

orlitzky commented 11 years ago
comment:20

Replying to @kcrisman:

I'd probably be okay with this then, but I'd really like some input from Burcin and/or Florent as to how compatible this might be with eventually using Ginac's capabilities. I would view doing this natively as a temporary measure. For instance, have you checked how this performs with respect to Nils' comment about memory usage?

Hopefully we could just deprecate SR.Symbols() and tell people to use the indexed symbols instead. Which memory usage comment do you mean?

kcrisman commented 11 years ago
comment:21

eventually using Ginac's capabilities. I would view doing this natively as a temporary measure. For instance, have you checked how this performs with respect to Nils' comment about memory usage?

Hopefully we could just deprecate SR.Symbols()

? isn't this the new thing you just added? (well, symbols, but I don't see Symbols or Symbol - or did you mean symbol?)

and tell people to use the indexed symbols instead. Which memory usage comment do you mean?

comment:4

burcin commented 11 years ago
comment:22

Attachment: trac_11576-indexed_expression.20130107.patch.gz

Replying to @nbruin:

Perhaps another motivation for using GiNaC functionality for this is that the alternative approach leaks. Watch the memory usage of

sage: for i in range(10^8):
....:    l=1+SR.symbol("a%s"%str(i))

Perhaps GiNaCs indexed variables are less prone to leaking?

I'm not sure if anything is _leaking_ here. Pynac stores a symbol lookup table, in order to give you the same variable if you ask for SR.symbol('x') twice. As you pass new strings to this function, the lookup table grows. Note that if you repeat the same loop, memory usage does not grow.

I agree that this is a good argument for using indexed variables from Pynac/GiNaC instead of trying to invent our own. Here are some numbers:

sage: get_memory_usage()
866.94921875
sage: x.ind[5]
x.5
sage: for i in range(10000):
....:     t = x.ind[i]
....:     
sage: get_memory_usage()
867.2265625
sage: for i in range(10000):                  
....:     t = SR.symbol('a_'+str(i))
....:     
sage: get_memory_usage()                                    
870.4140625

This is with attachment: trac_11576-indexed_expression.20130107.patch. I just rebased the patch I was working on at the Sage days in Cernay.

Apart from lack of documentation and tests, this patch provides the functionality discussed in this ticket. The main problem holding it up was the fact that it allows you to use arbitrary Python objects as indices, and these are not handled properly when the expression is being converted to Maxima.

kcrisman commented 11 years ago
comment:23

I just rebased the patch I was working on at the Sage days in Cernay.

With Sage 5.6.beta2:

patching file sage/libs/ginac.pxd
Hunk #2 FAILED at 101

The two lines with dbgprint are gone in beta2.

Apart from lack of documentation and tests, this patch provides the functionality discussed in this ticket. The main problem holding it up was the fact that it allows you to use arbitrary Python objects as indices, and these are not handled properly when the expression is being converted to Maxima.

We could just disallow anything that didn't fit a number or regular expression with commas or something.

This patch is somewhat different also in that

raise NotImplementedError("don't know what to do with multiple indices yet!") 

and I have to say I do like being able to use slice notation directly on the variable.

Also,

sage: x.ind[2:6]
x.2

seems counterintuitive for a couple reasons - the dot suggests an attribute to me in Python, and where are the other indices? I'm also not sure I like allowing indexing of non-variables, but that might be ignorance and fear speaking. I just don't know what

sage: ex.ind[p]
(x + 1).[2, 1, 3]

really means.

orlitzky commented 11 years ago
comment:24

Replying to @kcrisman:

eventually using Ginac's capabilities. I would view doing this natively as a temporary measure. For instance, have you checked how this performs with respect to Nils' comment about memory usage?

Hopefully we could just deprecate SR.Symbols()

? isn't this the new thing you just added? (well, symbols, but I don't see Symbols or Symbol - or did you mean symbol?)

Typo, I meant lowercase SR.symbols(). Yeah, it's the thing I just added. But if there's ever a simpler solution with the same functionality, we could deprecate it. If I can just call x[1, 2:3] directly, then there's no need to do x = SR.symbols(); x[1, 2:3].

and tell people to use the indexed symbols instead. Which memory usage comment do you mean?

comment:4

Oh, right. As Burcin pointed out, there's no leak, it's just creating symbols and they use up some memory. If you stick to the same symbols, memory usage won't grow.

I can see the value in being able to use arbitrary subscripts, but I also like being able to do the common case quickly and easily. x.ind[1] is a little weird, especially if we can't use multiple indices. I also don't think it makes much sense to subscript SR(1).ind[5] and have it display 1.5.

nbruin commented 11 years ago
comment:25

Replying to @orlitzky:

Oh, right. As Burcin pointed out, there's no leak, it's just creating symbols and they use up some memory. If you stick to the same symbols, memory usage won't grow.

Indeed. The issue is that once you can create sequences of symbols easily, people might start doing that more and hence memory footprint might become more of an issue.

When creating arbitrary symbols, one cannot really avoid adding an entry in a table somewhere. In principle, weak caching strategies should allow reclaiming that at some point, but coordination across various interfaces and libraries (specifically maxima) will make it very hard to estimate the lifetime of a symbol properly. Hence, a permanent memory cost for the creation of a new symbol is so hard to avoid that we should probably consider it unavoidable.

It would be nice if the permanent cost of a symbol sequence is only for the sequence, not for every member generated in it. Note that

for i in [1..10^8]:
    print sin(i)

does not explode in memory, because the entry sin in the symbol table gets combined with an argument i that varies. No further permanent entries are made. I bet that Ginac does something similar for its sequences of symbols (i.e., it probably stores a base symbol together with an index. Essentially a function call, but labelled in such a way that it gets handles as an atomic, free, entity. Implementation is really easy in theory: any simplification/rewrite routine should simply not descend further into the tree)

Ideally, one would have to find a way of encoding such symbols for maxima and maxima_lib as well, to avoid the (permanent) translation tables there to blow up as well. I don't know how easy it is to create structures capable of storing a serial number AND allowing differentiating etc. against it. Perhaps just encoding as an (uninterned) symbol might work. If you encode the string heavily enough you may be able to recognize them on the way out. For maxima_lib you could also store a flag as an attribute on the symbol and test for that in conversion back to sage (maxima uses something like this for "dummy" variables somewhere).

However, if you can't figure out how to do this for maxima, at least you may be able to avoid memory blow-up as long as these symbols don't touch various interfaces.

I think that if you think this is worth doing, it's worth doing it well. I think sage has now matured to the point where putting in features via cheap hacks does more harm than good. (In fact the cheap hacks that were necessary to get the project up and running originally are now hurting further development.)

vbraun commented 11 years ago

Attachment: trac_11576-indexed_expression.patch.gz

Rebased patch

vbraun commented 11 years ago
comment:27

Rebased on top of Sage-5.10.rc2

vbraun commented 11 years ago

Description changed:

--- 
+++ 
@@ -19,3 +19,5 @@
 imagination run wild and even do things like return full symbolic 
 matrices or vectors with slices: a[0:5, 0:5]. 

+ +Apply attachment: trac_11576-indexed_expression.patch

vbraun commented 11 years ago
comment:28

Doctest failures:

sage -t sage/symbolic/getitem.pyx  # 4 doctests failed
sage -t sage/symbolic/expression.pyx  # 1 doctest failed
slel commented 7 years ago

Description changed:

--- 
+++ 
@@ -1,23 +1,26 @@
 People are *always* asking how to get sequences of variables, like `a1,a2,a3,a4` or the like.  See [this ask.sagemath.org](http://ask.sagemath.org/question/611/implicitly-defining-a-sequence-of-variables) question, for example.

-Jason Grout has an interesting possible solution that should find a home somewhere in Sage. 
+Jason Grout has an interesting possible solution that should find a home somewhere in Sage
+(see [sage-support discussion](https://groups.google.com/d/topic/sage-support/GFJdjFvKCvo/discussion)).

-```
-
-class VariableGenerator(object): 
-     def __init__(self, prefix): 
-         self.__prefix = prefix 
-     @cached_method 
-     def __getitem__(self, key): 
-         return SR.var("%s%s"%(self.__prefix,key)) 
-Now just specify a prefix, and then you can index to your heart's 
-content to generate variables. 
-a=VariableGenerator('a') # some people may like 'a_' as the prefix 
-a[0], a[1], a[2] # all variables 
-Of course, this can easily be extended to using function call syntax: 
-a(0), or to using multiple indices: a[1,3].  Indeed, you can let your 
-imagination run wild and even do things like return full symbolic 
-matrices or vectors with slices: a[0:5, 0:5]. 
-```
+> ```
+> class VariableGenerator(object):
+>      def __init__(self, prefix):
+>          self.__prefix = prefix
+>      @cached_method
+>      def __getitem__(self, key):
+>          return SR.var("%s%s"%(self.__prefix,key))
+> ```
+> Now just specify a prefix, and then you can index to your heart's 
+> content to generate variables.
+> 
+> ```
+> a = VariableGenerator('a') # some people may like 'a_' as the prefix
+> a[0], a[1], a[2] # all variables
+> ```
+> Of course, this can easily be extended to using function call syntax:
+> a(0), or to using multiple indices: a[1,3].  Indeed, you can let your
+> imagination run wild and even do things like return full symbolic
+> matrices or vectors with slices: a[0:5, 0:5].

 Apply [attachment: trac_11576-indexed_expression.patch](https://github.com/sagemath/sage-prod/files/10653285/trac_11576-indexed_expression.patch.gz)
kcrisman commented 7 years ago
comment:34

Conceivably related: #22809

c22b6800-ec0b-4cbf-94c4-0a74aecc2093 commented 7 years ago
comment:35

please see also the recent: #22813. the motivation is the 1st sentence of this ticket's description,

People are always asking how to get sequences of variables, like a1,a2,a3,a4 or the like.

It has been suggested to not overwrite global variables as var('a0 a1 a2') would normally do, but instead return the tuple a without injection.

certainly less comprehensive than this ticket, but related. if i can partner with any of you, i'm willing to (try to) implement the updates needed for this one / to fix doctests.