sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.3k stars 447 forks source link

Improving word construction #7520

Closed seblabbe closed 14 years ago

seblabbe commented 14 years ago

Improve the creation of a word from a word when the parent changes :

BEFORE:

sage: w = Word('abab')
sage: P = WordPaths('abcd')
sage: P(w)
word: abab
sage: type(w)
<class 'sage.combinat.words.word.FiniteWord_str'>
sage: type(P(w))
<class 'sage.combinat.words.word.FiniteWord_str'>

AFTER:

sage: w = Word('abab')
sage: P = WordPaths('abcd')
sage: P(w)
Path: abab
sage: type(w)
<class 'sage.combinat.words.word.FiniteWord_str'>
sage: type(P(w))
<class 'sage.combinat.words.paths.FiniteWordPath_square_grid_str'>

The following construction gets also faster with the patch applied :

BEFORE:

sage: w = Word([0,1]*10000)
sage: %timeit z = Words([2,0,1])(w)
1000 loops, best of 3: 586 µs per loop

AFTER:

sage: w = Word([0,1]*10000)
sage: %timeit z = Words([2,0,1])(w)
1000 loops, best of 3: 343 µs per loop

CC: @saliola

Component: combinatorics

Author: Sébastien Labbé

Reviewer: Franco Saliola

Merged: sage-4.3.4.alpha0

Issue created by migration from https://trac.sagemath.org/ticket/7520

seblabbe commented 14 years ago
comment:1

Will post a patch soon...

seblabbe commented 14 years ago

Description changed:

--- 
+++ 
@@ -1,4 +1,4 @@
-The `_check` function of the Combinatorial class of all words (checking that the 40 first letters of the word are in the parent) is called for each word created by the user ....and by any other function. It would be good to add a check parameter (True or False) whether to do the checking. For example, for internal function, it could be turned off. Here is a example of what can be gained from this modification when generating all words of a given length :
+(1) The `_check` function of the Combinatorial class of all words (checking that the 40 first letters of the word are in the parent) is called for each word created by the user ....and by any other function. It would be good to add a check parameter (True or False) whether to do the checking. For example, for internal function, it could be turned off. Here is a example of what can be gained from this modification when generating all words of a given length :

 BEFORE:

@@ -19,7 +19,7 @@

-Creation of a word from a word when the parent changes doesn't work well : +(2) Creation of a word from a word when the parent changes doesn't work well :

 sage: w = Word('abab')
@@ -32,12 +32,3 @@
 <class 'sage.combinat.words.word.FiniteWord_str'>

-Creation of a word represented by list from a word represented as str doesn't work well and could work easily:

- -sage: w = Word('aababababab') -sage: Word(w, datatype='list') -word: aababababab -sage: type(_) -<class 'sage.combinat.words.word.FiniteWord_str'> -

seblabbe commented 14 years ago
comment:3

My patch needs work:

sage: P = WordPaths([0,1,2,3])
sage: P
Word Paths on the square grid
sage: w = words.ChristoffelWord(5,8)
sage: P(w)
Traceback (most recent call last):
...
AttributeError: 'LowerChristoffelWord' object has no attribute '_class_str'

I must admit my patch was not that clean and it now shows its limits. The problem concerns the creation of a word from a word especially when the parent changes. Should we try to use some sort of shorcut (like simply change some important attributes like __class__) or not? I have an idea of something better which doesn't use any shortcut. For example, if the input word is a FiniteWord_list, we could simply restore the list and pass through the usual steps for when the input is a list. I think this should be more safe. I just have to think for a solution what we do when the word is defined as an iterator. A new patch (with cleaner code) to be posted soon.

Ideas are welcome.

seblabbe commented 14 years ago
comment:4

In the new patch, I made the corrections to words.py which handles in a some better way construction of words from words. There was one single strange doctest failing in word.py involving word morphisms. So, I improved again the __call__ method of WordMorphism which was doing to much useless stuff.

seblabbe commented 14 years ago
comment:6

I am changing the status to "needs work" because I thought of a cleaner way of solving this.

I will fold the two actual patches, improve them and post a new patch soon...

seblabbe commented 14 years ago

Forget about this patch.

seblabbe commented 14 years ago

Description changed:

--- 
+++ 
@@ -19,7 +19,9 @@

-(2) Creation of a word from a word when the parent changes doesn't work well : +(2) Improvement in the creation of a word from a word when the parent changes : + +BEFORE:

 sage: w = Word('abab')
@@ -32,3 +34,52 @@
 <class 'sage.combinat.words.word.FiniteWord_str'>

+AFTER: + + +sage: w = Word('abab') +sage: P = WordPaths('abcd') +sage: P(w) +Path: abab +sage: type(w) +<class 'sage.combinat.words.word.FiniteWord_str'> +sage: type(P(w)) +<class 'sage.combinat.words.paths.FiniteWordPath_square_grid_str'> + + +The following construction gets also faster with the patch applied : + +BEFORE: + + +sage: w = Word([0,1]*10000) +sage: %timeit z = Words([2,0,1])(w) +1000 loops, best of 3: 586 µs per loop + + +AFTER: + + +sage: w = Word([0,1]*10000) +sage: %timeit z = Words([2,0,1])(w) +1000 loops, best of 3: 343 µs per loop + + + +(3) WordMorphism.__call__ is doing a conversion of the input into the domain which is not necessary. Basicly, all what is needed is that the input be iterable. Here are some timing improvements : + +BEFORE: + + +sage: m = WordMorphism('a->aab,b->ba') +sage: %timeit w = m('a'*100) +1000 loops, best of 3: 343 µs per loop + + +AFTER: + + +sage: m = WordMorphism('a->aab,b->ba') +sage: %timeit w = m('a'*100) +1000 loops, best of 3: 242 µs per loop +

seblabbe commented 14 years ago

Author: Sebastien Labbe

seblabbe commented 14 years ago
comment:7

Attachment: trac_7520_my_own_review-sl.patch.gz

seblabbe commented 14 years ago
comment:8

I am changing the status to "needs work" because I thought of a cleaner way of solving this.

I still think that the construction of words is not clean, but I will not clean it in this ticket. This ticket solves some problems and I believe it can now get a review and an eventual merge into sage.

Here are some key points I am actually thinking (I am writing them as a reference for future improvements):

I will fold the two actual patches, improve them and post a new patch soon...

So I did folded the two actual patches. I also updated the description of the patch to help the review. The patch now needs review!

seblabbe commented 14 years ago

Attachment: trac_7520_words_construction_improvements-sl.patch.gz

Apply only this one.

seblabbe commented 14 years ago

Description changed:

--- 
+++ 
@@ -1,25 +1,4 @@
-(1) The `_check` function of the Combinatorial class of all words (checking that the 40 first letters of the word are in the parent) is called for each word created by the user ....and by any other function. It would be good to add a check parameter (True or False) whether to do the checking. For example, for internal function, it could be turned off. Here is a example of what can be gained from this modification when generating all words of a given length :
-
-BEFORE:
-
-```
-sage: W = Words([0,1])
-sage: time l = list(W.iterate_by_length(15))
-CPU times: user 2.60 s, sys: 0.09 s, total: 2.69 s
-Wall time: 2.71 s
-```
-
-AFTER:
-
-```
-sage: W = Words([0,1])
-sage: time l = list(W.iterate_by_length(15))
-CPU times: user 1.99 s, sys: 0.06 s, total: 2.05 s
-Wall time: 2.08 s
-```
-
-
-(2) Improvement in the creation of a word from a word when the parent changes :
+Improve the creation of a word from a word when the parent changes :

 BEFORE:

@@ -65,21 +44,3 @@
 1000 loops, best of 3: 343 µs per loop

- -(3) WordMorphism.__call__ is doing a conversion of the input into the domain which is not necessary. Basicly, all what is needed is that the input be iterable. Here are some timing improvements :

-BEFORE:

- -sage: m = WordMorphism('a->aab,b->ba') -sage: %timeit w = m('a'*100) -1000 loops, best of 3: 343 µs per loop -

-AFTER:

- -sage: m = WordMorphism('a->aab,b->ba') -sage: %timeit w = m('a'*100) -1000 loops, best of 3: 242 µs per loop -

seblabbe commented 14 years ago
comment:9

I just separated this ticket into three parts. It will make things easier for the reviewer like that since the three parts were really independant. See #8287 and #8289 for the issues that used to be discussed here.

saliola commented 14 years ago
comment:10

Sebastien, great job with the patch. I'm really glad the efficiency of the code is improving!

The patch applies cleanly to sage-4.3.3 and all doctests pass. Positive review.

saliola commented 14 years ago

Reviewer: Franco Saliola

7c09a680-e216-4024-bb8e-9bfd4aa7f313 commented 14 years ago
comment:11

Merged trac_7520_words_construction_improvements-sl.patch.

7c09a680-e216-4024-bb8e-9bfd4aa7f313 commented 14 years ago

Changed author from Sebastien Labbe to Sébastien Labbé

7c09a680-e216-4024-bb8e-9bfd4aa7f313 commented 14 years ago

Merged: sage-4.3.4.alpha0