Open 633146c6-d9c3-4bcd-9486-77c47b650840 opened 9 years ago
Description changed:
---
+++
@@ -1 +1 @@
-As part of general cleanup in #17983, implement new abstract classes for ``(Skew)(Semi)(Standard)Tableau(x)``.
+As part of general cleanup in #17983, implement new abstract classes for `(Skew)(Semi)(Standard)Tableau(x)`.
Commit: f4888c8
Branch: public/18013
New commits:
f4888c8 | First steps |
Branch pushed to git repo; I updated commit sha1. New commits:
0f22277 | Merge remote-tracking branch 'origin/develop' into AbstractTableaux |
c16bfe6 | Python 3 tweaks, more work |
5d95436 | Further tweaks; moved parent classes to their own file |
6b43530 | First stab at sorting out input validation vs. initialization between parents and elements |
FWIW, the current version is quite rough and continues to be basically untested. When some parent classes are in a decent state, testing and lots of bug fixes will need to be done. I plan on continuing to work on this tableaux meta project off and on over the next few months.
Author: Josh Swanson
Branch pushed to git repo; I updated commit sha1. New commits:
0840b6c | (forgot to save last few lines) |
Branch pushed to git repo; I updated commit sha1. New commits:
46efb65 | some further steps |
Changed keywords from days64, sage-combinat, tableaux, refactoring to days64, sage-combinat, tableaux, refactoring, days64.25
I'm continuing work on this at days64.25. bb5c9f3 was included since I needed the fix to get Sage to compile on my machine.
The general outline of this change is clearer in the most recent commit. The overall abstraction and design decisions (file names, division of labor between Parents and Elements, etc.) could probably use a second opinion at this point.
Thank you! I'm particularly glad you got rid of the six.with_metaclass
construct; it never worked for me.
I made a separate ticket about the metaclass issue, #18503. It's probably some issue with the ClasscallMetaclass
implementation, but I don't want to get distracted by it now. I'll try getting a decent subset of the skew tableaux code moved over today so some testing can be done. My current plan is to focus on tableau.py and skew_tableau.py, which are the oldest and most in need of refactoring anyway.
Branch pushed to git repo; I updated commit sha1. New commits:
4b48f64 | First pass at migrating existing skew_tableau.py to new format, lots of fixing needed |
days64.25 is essentially done, so I'm not sure when I'll get to work on this ticket again--hopefully soon. In any case, stuff that still needs to be done:
from_expr
and friends into underscore methods which bypass the coercion framework and optionally bypass validation.__classcall_private__
into dispatcher methods.SkewTableaux
location.I also made two changes that might be controversial since they affect how user's code works.
SkewTableau
.restrict
and to_chain
) from SkewTableau
to the newly created SemistandardSkewTableau
.My hope is that people are mostly constructing these tableaux by iterating over, say, StandardSkewTableaux(...)
, in which case these are not an issue.
Finally, I think this ticket should be finished when the skew tableaux code has been migrated over and polished. Doing the same to the (straight) tableaux code will be another large project, given that there's around twice as many lines.
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
3727968 | General docstring cleanup, in particular finished skew_tableau.py |
3c1d800 | Fixed abstract to_word_by_column |
80161d3 | More bad/abstract docstring tweaks |
9d68deb | Docstring tweaks, fixed iterator bugs |
229a750 | Added many _element_constructor_'s, docstring cleanup |
6b589ff | Broke out tableaux_options.py; minor fixups |
a1a2192 | Fixed doctest failures, removed old unused files |
cd2a2af | coverage testing fixups |
272a8c1 | html doc tweaks |
fff8dbf | Integrated new skew tableau into ribbons; added deprecation warnings; more tweaks; all sage doctests pass |
Changed keywords from days64, sage-combinat, tableaux, refactoring, days64.25 to days64, sage-combinat, tableaux, refactoring, days64.25, days68
I worked on this quite a bit more at days68 and it's now in a usable state, ready for others' comments and improvements. Some optimization should probably still be done, especially for the iterators. I'd also like to move tableaux stuff into their own namespace in the interpreter namespace, like graphs currently doest, but that's for another ticket.
The next targets for tableaux cleanup should probably be ribbon tableaux (short but horrible) and straight shape tableaux (less short, less horrible).
Oh, a couple of potentially controversial things I did:
Removed __classcall_private__
(and the corresponding metaclass) in favor of factory functions for both parents and elements. The main benefit of the change is that people no longer have to have any idea about (InheritComparison)ClasscallMetaclass
, __classcall_private__
, etc. to work with this code, and it was just simpler; Element classes also don't have to know about their Parents in any way, so the abstraction seems cleaner. The main drawback is that import statements need to be updated depending on if they want the class or the factory. The vast majority of uses want the factory functions, which is what the deprecation warnings in the old skew_tableau.py
point to. Syntax in the interpreter is unchanged.
Moved several methods from SkewTableau
to SemistandardSkewTableau
or StandardSkewTabeau
, namely bender_knuth_involution
, rectify
, slide
, and to_permutation
. Perhaps deprecation warnings should be added.
Thank you for your work on this; this is something that needs to be done (and Darij will be very happy once it is). However there are currently many things that need to be addressed before this gets into Sage.
Do not remove the __classcall_private__
for the parent; we want to tightly couple that code with that class.
For the elements, they do need to know about their parents from a code point of view, so it breaks some things that need to be done wrt the categories. I also don't see the use of having a separate function which we have to maintain as the creation of a tableau should be tightly coupled with the tableau class.
In short, IMO the __classcall_private__
(and __classcall__
) idioms are very useful for allowing easy public interface with the classes and couples the (user) creation with the class. I strongly recommend keeping/using them.
Next, from a quick look-over of the changes:
_element_constructor_
's in ribbon_shaped_tableau.py
).list_method
, it is not used in list(T)
, but instead T.list()
, which should be given by the category of EnumeratedSets
. If something isn't working without that, then you (we) need to do some debugging.__iter__
, you should not do self(...)
, but instead self.element_class(self, ...)
as that way you don't invoke the coercion framework (which should have some speed impact)._element_constructor_
; make each one have it's own and you can add a SEEALSO::
block to link it if you feel it is necessary._coerce_map_from_
is completely wrong (and poorly coded). Kill with fire and holy water. It should not be a @classmethod
. You're not doing any arithmetic on tableaux, so I'm guessing you want coercions for equality checks? Do a proper _coerce_map_from_(self, P)
and check isinstance(P, Foo)
. For reference, then Foo_with_category
is a subclass of Foo
.AbstractTableau
correctly. You don't need an __init__
method because you want to inherit the one from Element
. I think you should remove the dependency on six
in this class and make more pythonic iterations. I'm also not completely convinced of how you're not taking a dict
as input, but that can come later after the other issues are worked out._set_parent
call, ever. This is a strong indication you're doing something wrong.There will likely be more changes that need to be done, but this will be a good start.
Thanks for the feedback Travis!
About __classcall_private__
: I had asked Andrew Ohana for his opinion about this at Sage Days, and he strongly agreed with removing it; Darij also expressed happiness that I had removed it in an email; and I personally think it obfuscates the logic without any real benefit. So, we seem to fundamentally disagree here--not sure what else to say. Perhaps you could elaborate on why "we want to tightly couple that code with that class", and how you think my current layout does not achieve that goal. The __classcall_private__
idiom caused quite a bit of confusion at days64 when we first started this project, and it also fosters confusion over where to put validation and normalization code--in an element's __classcall_private__
, __init__
, or maybe somewhere on the parent?
About elements knowing their parents: I meant they need to know they have a parent (i.e. el.parent()
), but they don't need to know what their parent is. More precisely, __classcall_private__
is the only place the name of the parent is required in the element classes, which strikes me as a red flag. To be clear, I'm far more interested in practical issues like those above than in philosophical musings about OOP.
About multiple files: having worked with this stuff a fair amount, breaking up elements and parents is very useful if only because it's much easier to find things. Also, there's lots of precedent for this in Sage: Parent and Element themselves are in two files; rational_field.py
and rational.pyx
are broken up, as are many other rings. About the options file: I again disagree; it's awkward to have those options at the top of the file since they serve a fundamentally different purpose compared to the rest of the file. Doesn't seem like a separate file is really hurting anything.
About __iter__
and self.element_class(...)
instead of self(...)
: I agree optimization needs to be done, but the change you suggested won't typically work since input normalization is done by the parent class, and skipping it leads to subtle bugs. Figuring out the right optimizations is one of the main things that still needs to be done IMO. I totally agree that the current method is quite bloated.
About _coerce_map_from_
: hah, I dislike it too, but what you suggested won't work (it's what I initially did). The trouble is that hardcoding the class name (SkewTableau
for me) causes all child classes to have coercions between each other. For instance, SemistandardSkewTableau
would have a coercion to StandardSkewTableau
since they're both SkewTableau
. The horrible hack at the start was what Andrew came up with to get around the apparent inability to get at the non-dynamically-generated class. This seems like an issue with the category framework to me, but maybe there's a good solution.
About AbstractTableau.__init__
: ah whoops, thanks, I've at least changed it to super(AbstractTableau, self).__init__(*args, **kwds)
. I'd like to just delete the method, except for the comment up top explaining that normalization and validation should be done in the parent class. Given the amount of confusion over where to do that in the existing codebase, it seems worthwhile, but I'm on the fence--opinions?
About six
: I don't know what you mean. The use of six
here is very slightly different from the 2.x syntax and is only included with an eye towards 2->3 conversion. For instance, dict.iteritems
doesn't exist in Python 3.
Things I'll do now: add doctests to _element_constructor_
's in ribbon_shaped_tableau.py
, add deprecations to move the functions I mentioned, fix up the _element_constructor_
documentation, move _dct
from BadShapeTableau
to AbstractTableau
, remove the silly _set_parent
call, and remove list_method
. Those changes shouldn't take long.
Branch pushed to git repo; I updated commit sha1. New commits:
37ace34 | Implemented Travis' suggestions; all combinat doctests pass |
Lingering things from my perspective:
__classcall_private__
vs factories debate_coerce_map_from_
hack if possible_richcmp_
(probably just use #19108)though there are probably other things that need tweaking. I've got other things to do for a few days, but will try to do iteration optimization within the next week.
These "philosophical musings of OOP" that you are disregarding are there to for good programming practices and to help ease the burden of code maintenance. So they are real practical issues later on that will make our lives easier when we need to change things yet again. So please don't trivialize them. (I'm trying not to sound like a rant here, but I fear I've failed.)
With that being said, in an ideal world, we would just allow the user directly to the __init__
, which could take a variety of different inputs and initialize the class correctly as necessary. Python does not have polymorphism, and this at least allows us to work around that for the user entry point of *Tableau
and still provide a good construction data from the parent. The __classcall_private__
is also there so you don't have a separate function with a different name (or at least in a separate location which can lead to confusion as well). Even still, you are going to either have to parse that same input in that separate function.
This is an implementation detail that, yes, does raise the barrier of entry (which better documentation can alleviate), but the ease of code maintenance because you know it's there with the class and behaves like a constructor far outweighs the cost IMO. (BTW, if we ultimately decide to have the creation functions, they should have python_function_names
and not LookLikeClasses
.) The biggest problem is in a separate file you want to create a, e.g., SkewTableau
and want to check if something is an instance of a SkewTableau
, you have 2 different things for what really should be 1 (although I would advocate creating the correct parent and feeding the data to that parent, but I have seen this pattern a lot in code in the wild). This is the "tightly coupling" I was talking about and is the biggest way how your pattern breaks it.
The red flag that's popping up is more of a mathematical one in that we are trying to provide an object for when the user does not care about the set that object lies in. This idiom has made my life a lot easier in both using the code and maintaining other combinatorics code that uses it, and I strongly recommend keeping it.
In regards to the checking/duplication, what you should do is think about how you want to code to flow through, what needs to be parsed, and at which point. I'm not necessarily sure I like how you do your containment checking because you are creating an element which you immediately discard. This makes it a lot slower than it needs to be and you can instead implement the necessary checks in various __contains__
methods (or common shortcuts in there, and have that call a _contain_check
method which you override in subclasses), and then have the _element_constructor_
just create an element and then call the __contains__
. I do understand why you did things this way, but there is a price that is and will have to be paid.
Normalization should not be done in the parent; that is why you are doing so much work. Let the element in the __init__
do the normalization, put it in a @staticmethod
of the element and call that when needed, or put it in one place that all code paths for element creation from the user must go through. I think this will greatly simplify your code.
One other thing with the _element_constructor_
, everywhere you basically are checking if check
is in the keywords. Please make check
an explicit named argument.
For splitting the parent/element classes up, a lot of those rings, look at the file extensions. We want fast arithmetic and the like, which are done on the elements, and so we use cython. However, cython has limitations and don't (or at least didn't when those rings were created) behave well with respect to categories and other pythonic things, so they were written in python. (Again, most code in the rings folder is quite old.) Separate files usually means separate concerns, but I feel their topic is close enough together that they should be in the same file. Plus it helps keep down the possibility of import loops (a major maintenance headache) and startup imports/time.
The self(x)
causes Sage to set the coercions for self
from the parent of x
, which has led to some very subtle bugs (e.g., #15309). The speed bonus comes for free. Do not use unless you have a good reason to, and in this case, I don't see a good one. As for the __iter__
, this I will insist must be changed before this gets into Sage, there are too many subtleties and speed to be gained.
For the _coerce_map_from_
, you will need to implement multiple such methods in the appropriate parents or check more data if you want finer control. This is a horrible hack and completely breaks the assumptions of the coercion framework. It really smells more like you want simple conversions with a good __eq__
implementation, which will be faster at the tradeoff of a little more code/work.
The answer to confusion in code is more/better documentation/comments.
For getting around six
, you do things like [d[i] for i in d]
or just put items
and deal with the slowdown until we (finally) convert to python3. However I feel that your current design should implement a custom __getitem__
for the abstract tableaux and force all subclasses to use that same internal data structure (that is part of the reason why they are subclasses). If you want different data structures but to inherit methods, python is nice with multiple inheritance. Create separate ABC's for each internal type with the necessary methods for the actual concrete classes you'd want. Otherwise you're going against well-established "philosophical musings of OOP" and have to work harder than you need to for the things you want.
Sorry this is more of a treatise than useful comments, but I think there is a lot of work that will need to be done on this before this can be merged into Sage. I understand that other things take precedence over this, but I appreciate your work on this.
Thanks again for your thoughts!
I don't think I quite communicated my intent about "philosophical musings". You say they're "to help ease the burden of code maintenance", which I'd say is a perfectly reasonable "practical issue"--it's one of the most important when writing code. I just find an argument like "X is a red flag" or "Y should be tied closely to Z" to be generally much less convincing than "X is hard to maintain" or "Y is hard to find if it's not near Z". I'd call the former "philosophical musings" and the latter "practical issues".
I'll think further on your comments about __classcall_private__
. Ease of code maintenance is one of the biggest reasons I removed it, but nonetheless I'll weigh the options again. I know it's a common idiom in combinat, though grepping for it shows that it doesn't seem to have caught on much elsewhere in the Sage codebase. I would imagine a number of our disagreements stem from combinat doing something one way and me not being familiar with/not wanting to use that way, which isn't necessarily a bad thing, especially when it's not a clear-cut case of "house style".
I'll also rethink where validation and normalization "should" or "should be expected" to be done. I don't see much potential for "greatly simplified code"--perhaps you could be more specific about what simplifications you envision. If you mean that less work could be done on element creation in specific instances like making an SSYT from an SYT, then sure.
Again we agree that iteration optimization is essentially required. It's probably the single most important use of the SSYT and SYT code, and there's tons of nested, pointless resolutions, normalizations, and copies going on currently.
At the end you rather suddenly suggested a significant refactoring, where as far as I can tell you want SkewTableau
to inherit from a second base class which is built on the _st
representation of tableaux, so SkewTableau
would use both _st
and _dct
. The specifics and benefits are unclear to me--I'd have to reimplement many of the routines currently on AbstractTableau
in the new context using the _st
representation, whereas the whole point of this refactoring was to cut down on code duplication. (Or, perhaps you wanted the _st
ABC to be very minimalistic? If so, what's the point? Do you expect both ABC's to have to_word
, for instance, and if so why?) I tend to think of the _st
representation used in _dct
as an optimization since it's how existing routines and classes want to deal with or generate these tableaux, rather than as a fundamentally different design choice, which makes me much more okay with the current abstraction. At a minimum, this idea should be documented somewhere though!
Oh, about the _coerce_map_from_
hack: I can certainly reimplement _coerce_map_from_
on each child class and hard-code the name of the subclass there. However, this seems to be bad OOP practice--I explicitly do not want child classes to inherit that method, since it'll introduce a subtle bug that we both missed at first glance. What I want, and what seems reasonable, is either for the dynamically generated class to preserve isinstance
or to get at the non-dynamically generated class, either of which seems to be an issue with the category framework. But I'm not at all familiar with the category framework, so I'll defer to experts. Could you expand on how the current hack "completely breaks the assumptions of the coercion framework"?
Thanks again for your input!
The reason it doesn't appear elsewhere is that either it doesn't use categories/parent/elements (e.g. graphs) or creation of elements are done only through the parent. There are only a few other examples where I know of outside of combinat
that uses parents/elements that uses a function as a way of creating elements not through the parent. They are integer
and rational
, and they are low-level, optimized, and cython (which until very recently did not support metaclasses). So it ends up being more of a "house style" because of what our users expect, but it is a slight abuse IMO (how many papers say, let X
be the set of all partitions, and let \lambda \in X
?).
For the normalization, I see many places in the code where you seem to be fighting against doing normalization/validation, most notably: it was part of your reasoning for the factory functions IIRC, the check
argument checks everywhere, the method which iterates over the dict
of cells and having that be an abstract method. The last one is the reason I suggested the refactoring and the ABCs (which can be very complicated, but only require one or two methods to be implemented in the concrete class; depends on the situation) would be more minimal. I'm starting to read deeper into the code and think about the structuring now. (I saw a talk at last year's ~PyCon which gave a good argument why we should work harder now rather than take the "technical debt" that we'd have to pay later.)
I'm sorry, I must have misunderstood what you said about the iteration.
For the _coerce_map_from_
, if you do not want the child classes to inherit it, then they should not be child classes. If Y
is a subclass of X
, then instances of Y
should behave like instances of X
(Nicolas has referred to this as the 'is a' test). I'm okay with an ABC and/or mix-in class not conforming to this, but not for user accessible concrete classes.
I will take a full detailed look at this next week and do my reviewer pass over the code (which might involve the refactoring I mentioned above). Thank you again for your work on this.
(Oh look a short-ish response by me :P)
Reviewer: Travis Scrimshaw
Nicolas, perhaps you could also add your wisdom to this discussion too?
As part of general cleanup in #17983, implement new abstract classes for
(Skew)(Semi)(Standard)Tableau(x)
.CC: @sagetrac-jkeitel @darijgr @MariaMonks @opechenik @nthiery
Component: combinatorics
Keywords: days64, sage-combinat, tableaux, refactoring, days64.25, days68
Author: Josh Swanson
Branch/Commit: public/18013 @
37ace34
Reviewer: Travis Scrimshaw
Issue created by migration from https://trac.sagemath.org/ticket/18013