sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.36k stars 462 forks source link

Implement abstract tableau refactoring #18013

Open 633146c6-d9c3-4bcd-9486-77c47b650840 opened 9 years ago

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago

As part of general cleanup in #17983, implement new abstract classes for (Skew)(Semi)(Standard)Tableau(x).

CC: @sagetrac-jkeitel @darijgr @MariaMonks @opechenik @nthiery

Component: combinatorics

Keywords: days64, sage-combinat, tableaux, refactoring, days64.25, days68

Author: Josh Swanson

Branch/Commit: public/18013 @ 37ace34

Reviewer: Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/18013

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago

Description changed:

--- 
+++ 
@@ -1 +1 @@
-As part of general cleanup in #17983, implement new abstract classes for ``(Skew)(Semi)(Standard)Tableau(x)``.
+As part of general cleanup in #17983, implement new abstract classes for `(Skew)(Semi)(Standard)Tableau(x)`.
633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago

Commit: f4888c8

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago

Branch: public/18013

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago

New commits:

f4888c8First steps
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

0283f6eInitial commit
29a811dMore work on rows and cols
6f28a68Readded abstract_tableau.py
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from f4888c8 to 6f28a68

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 6f28a68 to 77ac126

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

3f3734bConstructor for SkewTableau
16b4e16Let elements know about their parents
77ac126More work on parents
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

0f22277Merge remote-tracking branch 'origin/develop' into AbstractTableaux
c16bfe6Python 3 tweaks, more work
5d95436Further tweaks; moved parent classes to their own file
6b43530First stab at sorting out input validation vs. initialization between parents and elements
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 77ac126 to 6b43530

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago
comment:6

FWIW, the current version is quite rough and continues to be basically untested. When some parent classes are in a decent state, testing and lots of bug fixes will need to be done. I plan on continuing to work on this tableaux meta project off and on over the next few months.

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago

Author: Josh Swanson

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 6b43530 to 0840b6c

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

0840b6c(forgot to save last few lines)
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 0840b6c to 46efb65

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

46efb65some further steps
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 46efb65 to cb3e4db

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

bb5c9f3Do not let R use global or user-local Makevars
1b3bf13Merge branch 'r_makevars' into AbstractTableaux
cb3e4dbBroke things into smaller files, cleaned up BadShapeTableau(x)
633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago

Changed keywords from days64, sage-combinat, tableaux, refactoring to days64, sage-combinat, tableaux, refactoring, days64.25

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago
comment:10

I'm continuing work on this at days64.25. bb5c9f3 was included since I needed the fix to get Sage to compile on my machine.

The general outline of this change is clearer in the most recent commit. The overall abstraction and design decisions (file names, division of labor between Parents and Elements, etc.) could probably use a second opinion at this point.

darijgr commented 9 years ago
comment:11

Thank you! I'm particularly glad you got rid of the six.with_metaclass construct; it never worked for me.

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago
comment:12

I made a separate ticket about the metaclass issue, #18503. It's probably some issue with the ClasscallMetaclass implementation, but I don't want to get distracted by it now. I'll try getting a decent subset of the skew tableaux code moved over today so some testing can be done. My current plan is to focus on tableau.py and skew_tableau.py, which are the oldest and most in need of refactoring anyway.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

22a8b8cFixed `__classcall__` stuff; made it compile
8085221Updating comments
82cf374More work on SkewTableau; testing tomorrow?
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from cb3e4db to 82cf374

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 82cf374 to 4b48f64

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

4b48f64First pass at migrating existing skew_tableau.py to new format, lots of fixing needed
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 4b48f64 to d88663e

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

e8194e9new skew_tableau.py doctests now all pass
d88663eFixed most new skew_tableaux.py doctests
633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago
comment:16

days64.25 is essentially done, so I'm not sure when I'll get to work on this ticket again--hopefully soon. In any case, stuff that still needs to be done:

I also made two changes that might be controversial since they affect how user's code works.

  1. I added (optional; default: enabled) validation to SkewTableau.
  2. I moved some methods (e.g. restrict and to_chain) from SkewTableau to the newly created SemistandardSkewTableau.

My hope is that people are mostly constructing these tableaux by iterating over, say, StandardSkewTableaux(...), in which case these are not an issue.

Finally, I think this ticket should be finished when the skew tableaux code has been migrated over and polished. Doing the same to the (straight) tableaux code will be another large project, given that there's around twice as many lines.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

3727968General docstring cleanup, in particular finished skew_tableau.py
3c1d800Fixed abstract to_word_by_column
80161d3More bad/abstract docstring tweaks
9d68debDocstring tweaks, fixed iterator bugs
229a750Added many _element_constructor_'s, docstring cleanup
6b589ffBroke out tableaux_options.py; minor fixups
a1a2192Fixed doctest failures, removed old unused files
cd2a2afcoverage testing fixups
272a8c1html doc tweaks
fff8dbfIntegrated new skew tableau into ribbons; added deprecation warnings; more tweaks; all sage doctests pass
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from d88663e to fff8dbf

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago

Changed keywords from days64, sage-combinat, tableaux, refactoring, days64.25 to days64, sage-combinat, tableaux, refactoring, days64.25, days68

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago
comment:18

I worked on this quite a bit more at days68 and it's now in a usable state, ready for others' comments and improvements. Some optimization should probably still be done, especially for the iterators. I'd also like to move tableaux stuff into their own namespace in the interpreter namespace, like graphs currently doest, but that's for another ticket.

The next targets for tableaux cleanup should probably be ribbon tableaux (short but horrible) and straight shape tableaux (less short, less horrible).

Oh, a couple of potentially controversial things I did:

tscrim commented 9 years ago
comment:19

Thank you for your work on this; this is something that needs to be done (and Darij will be very happy once it is). However there are currently many things that need to be addressed before this gets into Sage.

Do not remove the __classcall_private__ for the parent; we want to tightly couple that code with that class.

For the elements, they do need to know about their parents from a code point of view, so it breaks some things that need to be done wrt the categories. I also don't see the use of having a separate function which we have to maintain as the creation of a tableau should be tightly coupled with the tableau class.

In short, IMO the __classcall_private__ (and __classcall__) idioms are very useful for allowing easy public interface with the classes and couples the (user) creation with the class. I strongly recommend keeping/using them.

Next, from a quick look-over of the changes:

There will likely be more changes that need to be done, but this will be a good start.

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago
comment:20

Thanks for the feedback Travis!

Things I'll do now: add doctests to _element_constructor_'s in ribbon_shaped_tableau.py, add deprecations to move the functions I mentioned, fix up the _element_constructor_ documentation, move _dct from BadShapeTableau to AbstractTableau, remove the silly _set_parent call, and remove list_method. Those changes shouldn't take long.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

37ace34Implemented Travis' suggestions; all combinat doctests pass
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from fff8dbf to 37ace34

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago
comment:22

Lingering things from my perspective:

though there are probably other things that need tweaking. I've got other things to do for a few days, but will try to do iteration optimization within the next week.

tscrim commented 9 years ago
comment:23

These "philosophical musings of OOP" that you are disregarding are there to for good programming practices and to help ease the burden of code maintenance. So they are real practical issues later on that will make our lives easier when we need to change things yet again. So please don't trivialize them. (I'm trying not to sound like a rant here, but I fear I've failed.)

With that being said, in an ideal world, we would just allow the user directly to the __init__, which could take a variety of different inputs and initialize the class correctly as necessary. Python does not have polymorphism, and this at least allows us to work around that for the user entry point of *Tableau and still provide a good construction data from the parent. The __classcall_private__ is also there so you don't have a separate function with a different name (or at least in a separate location which can lead to confusion as well). Even still, you are going to either have to parse that same input in that separate function.

This is an implementation detail that, yes, does raise the barrier of entry (which better documentation can alleviate), but the ease of code maintenance because you know it's there with the class and behaves like a constructor far outweighs the cost IMO. (BTW, if we ultimately decide to have the creation functions, they should have python_function_names and not LookLikeClasses.) The biggest problem is in a separate file you want to create a, e.g., SkewTableau and want to check if something is an instance of a SkewTableau, you have 2 different things for what really should be 1 (although I would advocate creating the correct parent and feeding the data to that parent, but I have seen this pattern a lot in code in the wild). This is the "tightly coupling" I was talking about and is the biggest way how your pattern breaks it.

The red flag that's popping up is more of a mathematical one in that we are trying to provide an object for when the user does not care about the set that object lies in. This idiom has made my life a lot easier in both using the code and maintaining other combinatorics code that uses it, and I strongly recommend keeping it.

In regards to the checking/duplication, what you should do is think about how you want to code to flow through, what needs to be parsed, and at which point. I'm not necessarily sure I like how you do your containment checking because you are creating an element which you immediately discard. This makes it a lot slower than it needs to be and you can instead implement the necessary checks in various __contains__ methods (or common shortcuts in there, and have that call a _contain_check method which you override in subclasses), and then have the _element_constructor_ just create an element and then call the __contains__. I do understand why you did things this way, but there is a price that is and will have to be paid.

Normalization should not be done in the parent; that is why you are doing so much work. Let the element in the __init__ do the normalization, put it in a @staticmethod of the element and call that when needed, or put it in one place that all code paths for element creation from the user must go through. I think this will greatly simplify your code.

One other thing with the _element_constructor_, everywhere you basically are checking if check is in the keywords. Please make check an explicit named argument.

For splitting the parent/element classes up, a lot of those rings, look at the file extensions. We want fast arithmetic and the like, which are done on the elements, and so we use cython. However, cython has limitations and don't (or at least didn't when those rings were created) behave well with respect to categories and other pythonic things, so they were written in python. (Again, most code in the rings folder is quite old.) Separate files usually means separate concerns, but I feel their topic is close enough together that they should be in the same file. Plus it helps keep down the possibility of import loops (a major maintenance headache) and startup imports/time.

The self(x) causes Sage to set the coercions for self from the parent of x, which has led to some very subtle bugs (e.g., #15309). The speed bonus comes for free. Do not use unless you have a good reason to, and in this case, I don't see a good one. As for the __iter__, this I will insist must be changed before this gets into Sage, there are too many subtleties and speed to be gained.

For the _coerce_map_from_, you will need to implement multiple such methods in the appropriate parents or check more data if you want finer control. This is a horrible hack and completely breaks the assumptions of the coercion framework. It really smells more like you want simple conversions with a good __eq__ implementation, which will be faster at the tradeoff of a little more code/work.

The answer to confusion in code is more/better documentation/comments.

For getting around six, you do things like [d[i] for i in d] or just put items and deal with the slowdown until we (finally) convert to python3. However I feel that your current design should implement a custom __getitem__ for the abstract tableaux and force all subclasses to use that same internal data structure (that is part of the reason why they are subclasses). If you want different data structures but to inherit methods, python is nice with multiple inheritance. Create separate ABC's for each internal type with the necessary methods for the actual concrete classes you'd want. Otherwise you're going against well-established "philosophical musings of OOP" and have to work harder than you need to for the things you want.

Sorry this is more of a treatise than useful comments, but I think there is a lot of work that will need to be done on this before this can be merged into Sage. I understand that other things take precedence over this, but I appreciate your work on this.

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago
comment:24

Thanks again for your thoughts!

I don't think I quite communicated my intent about "philosophical musings". You say they're "to help ease the burden of code maintenance", which I'd say is a perfectly reasonable "practical issue"--it's one of the most important when writing code. I just find an argument like "X is a red flag" or "Y should be tied closely to Z" to be generally much less convincing than "X is hard to maintain" or "Y is hard to find if it's not near Z". I'd call the former "philosophical musings" and the latter "practical issues".

I'll think further on your comments about __classcall_private__. Ease of code maintenance is one of the biggest reasons I removed it, but nonetheless I'll weigh the options again. I know it's a common idiom in combinat, though grepping for it shows that it doesn't seem to have caught on much elsewhere in the Sage codebase. I would imagine a number of our disagreements stem from combinat doing something one way and me not being familiar with/not wanting to use that way, which isn't necessarily a bad thing, especially when it's not a clear-cut case of "house style".

I'll also rethink where validation and normalization "should" or "should be expected" to be done. I don't see much potential for "greatly simplified code"--perhaps you could be more specific about what simplifications you envision. If you mean that less work could be done on element creation in specific instances like making an SSYT from an SYT, then sure.

Again we agree that iteration optimization is essentially required. It's probably the single most important use of the SSYT and SYT code, and there's tons of nested, pointless resolutions, normalizations, and copies going on currently.

At the end you rather suddenly suggested a significant refactoring, where as far as I can tell you want SkewTableau to inherit from a second base class which is built on the _st representation of tableaux, so SkewTableau would use both _st and _dct. The specifics and benefits are unclear to me--I'd have to reimplement many of the routines currently on AbstractTableau in the new context using the _st representation, whereas the whole point of this refactoring was to cut down on code duplication. (Or, perhaps you wanted the _st ABC to be very minimalistic? If so, what's the point? Do you expect both ABC's to have to_word, for instance, and if so why?) I tend to think of the _st representation used in _dct as an optimization since it's how existing routines and classes want to deal with or generate these tableaux, rather than as a fundamentally different design choice, which makes me much more okay with the current abstraction. At a minimum, this idea should be documented somewhere though!

Oh, about the _coerce_map_from_ hack: I can certainly reimplement _coerce_map_from_ on each child class and hard-code the name of the subclass there. However, this seems to be bad OOP practice--I explicitly do not want child classes to inherit that method, since it'll introduce a subtle bug that we both missed at first glance. What I want, and what seems reasonable, is either for the dynamically generated class to preserve isinstance or to get at the non-dynamically generated class, either of which seems to be an issue with the category framework. But I'm not at all familiar with the category framework, so I'll defer to experts. Could you expand on how the current hack "completely breaks the assumptions of the coercion framework"?

Thanks again for your input!

tscrim commented 9 years ago
comment:25

The reason it doesn't appear elsewhere is that either it doesn't use categories/parent/elements (e.g. graphs) or creation of elements are done only through the parent. There are only a few other examples where I know of outside of combinat that uses parents/elements that uses a function as a way of creating elements not through the parent. They are integer and rational, and they are low-level, optimized, and cython (which until very recently did not support metaclasses). So it ends up being more of a "house style" because of what our users expect, but it is a slight abuse IMO (how many papers say, let X be the set of all partitions, and let \lambda \in X?).

For the normalization, I see many places in the code where you seem to be fighting against doing normalization/validation, most notably: it was part of your reasoning for the factory functions IIRC, the check argument checks everywhere, the method which iterates over the dict of cells and having that be an abstract method. The last one is the reason I suggested the refactoring and the ABCs (which can be very complicated, but only require one or two methods to be implemented in the concrete class; depends on the situation) would be more minimal. I'm starting to read deeper into the code and think about the structuring now. (I saw a talk at last year's ~PyCon which gave a good argument why we should work harder now rather than take the "technical debt" that we'd have to pay later.)

I'm sorry, I must have misunderstood what you said about the iteration.

For the _coerce_map_from_, if you do not want the child classes to inherit it, then they should not be child classes. If Y is a subclass of X, then instances of Y should behave like instances of X (Nicolas has referred to this as the 'is a' test). I'm okay with an ABC and/or mix-in class not conforming to this, but not for user accessible concrete classes.

I will take a full detailed look at this next week and do my reviewer pass over the code (which might involve the refactoring I mentioned above). Thank you again for your work on this.

(Oh look a short-ish response by me :P)

tscrim commented 9 years ago

Reviewer: Travis Scrimshaw

tscrim commented 9 years ago
comment:26

Nicolas, perhaps you could also add your wisdom to this discussion too?