Faster creation of words by the parent

seblabbe commented 10 years ago

Move the code of _construct_word into __call__ method.
Create _word_from_list methods which are shortcuts for __call__ when we know the datatype.

Here are some benchmarks:

list:

sage: W = Words()
sage: L = range(1000)
sage: %timeit W.call__old(L)             # before
10000 loops, best of 3: 45.1 µs per loop

sage: %timeit W(L)                       # after
10000 loops, best of 3: 17.1 µs per loop  
sage: %timeit W(L, check=False)     
100000 loops, best of 3: 2.21 µs per loop 
sage: %timeit W._word_from_list(L)  
1000000 loops, best of 3: 1.33 µs per loop

str:

sage: W = Words()
sage: s = 'a'*1000
sage: %timeit W.call__old(s)              # before
10000 loops, best of 3: 45.7 µs per loop   
sage: %timeit W(s)                        # after
10000 loops, best of 3: 18.4 µs per loop  
sage: %timeit W(s, check=False)           
100000 loops, best of 3: 2.62 µs per loop
sage: %timeit W._word_from_str(s)         
1000000 loops, best of 3: 1.23 µs per loop

tuple:

sage: t = tuple(L)
sage: %timeit W.call__old(t)             # before
10000 loops, best of 3: 45.9 µs per loop 
sage: %timeit W(t)                       # after
10000 loops, best of 3: 18.6 µs per loop 
sage: %timeit W(t,check=False)           
100000 loops, best of 3: 3.25 µs per loop
sage: %timeit W._word_from_tuple(t)      
1000000 loops, best of 3: 1.3 µs per loop

char (here creation is longer than str, tuple or list because a copy of data is performed):

sage: W = Words([0,1,2])
sage: L = [0,1,1,2]*1000
sage: type(W(L))
<class 'sage.combinat.words.word.FiniteWord_char'>
sage: %timeit W.call__old(L)           # before
1000 loops, best of 3: 322 µs per loop
sage: %timeit W(L)                     # after
1000 loops, best of 3: 292 µs per loop
sage: %timeit W(L, check=False)       
1000 loops, best of 3: 248 µs per loop
sage: %timeit W._word_from_char(L)    
1000 loops, best of 3: 245 µs per loop

BEFORE:

sage: W = Words(range(5))
sage: %time L = list(W.iterate_by_length(7))
CPU times: user 4.6 s, sys: 54.7 ms, total: 4.66 s
Wall time: 4.68 s

AFTER (about 20 times faster):

sage: W = Words(range(5))
sage: %time L = list(W.iterate_by_length(7))
CPU times: user 235 ms, sys: 29.1 ms, total: 264 ms
Wall time: 259 ms

CC: @videlec @staroste

Component: combinatorics

Author: Sébastien Labbé

Branch/Commit: 7b5b64c

Reviewer: Vincent Delecroix

Issue created by migration from https://trac.sagemath.org/ticket/17021

seblabbe commented 10 years ago

New commits:

`a25e286`	`trac #17013: new class WordDatatype_char`
`398b33c`	`trac #17013: added is_square in WordDatatype_char`
`a563343`	`trac #17013: doctests improvements + bug fix`
`9008b8d`	`Trac #17021: Faster creation of words by the parent.`

seblabbe commented 10 years ago

Branch: u/slabbe/17021

seblabbe commented 10 years ago

Commit: 9008b8d

seblabbe commented 10 years ago

comment:2

I based my branch on top of #17013 who just got positive reviewed. I hope it is okay.

videlec commented 10 years ago

comment:3

Hello,

1) what is the point of

+        word_classes = self._element_classes
+        cls_str = 'FiniteWord_list'
+        return word_classes[cls_str](parent=self, data=data)

instead of

+        return self._element_classes['FiniteWord_list'](parent, data)

2) Calling directly .finite_word_tuple() or .finite_word_list() from the methods in FiniteWord is bad. Doing that you impose the return type to be FiniteWord_list or FiniteWord_tuple (you might prefer a FiniteWord_char sometimes). Instead of using directly those, I would rather introduce a check argument in __call__ and do not use them in methods of FiniteWord.

Vincent

seblabbe commented 10 years ago

comment:4

1) what is the point of

You are right, I will change that.

2) Calling directly .finite_word_tuple() or .finite_word_list() from the methods in FiniteWord is bad.

Okay, I see and agree.

But still __call__ is much slower because it guesses the datatype. I will check with a check argument whether it is better...

videlec commented 10 years ago

comment:5

Replying to @seblabbe:

1) what is the point of

You are right, I will change that.

2) Calling directly .finite_word_tuple() or .finite_word_list() from the methods in FiniteWord is bad.

Okay, I see and agree.

But still __call__ is much slower because it guesses the datatype. I will check with a check argument whether it is better...

Not if datatype is properly set in input, no? You can assume that with a check=False, if the datatype is provided then there is no need to check that the data fits.

Vincent

seblabbe commented 10 years ago

comment:7

2) Calling directly .finite_word_tuple() or .finite_word_list() from the methods in FiniteWord is bad. Doing that you impose the return type to be FiniteWord_list or FiniteWord_tuple (you might prefer a FiniteWord_char sometimes).

What do you think if we do the following (automatically use char instead of list)?

    @lazy_attribute
    def _element_classes(self):

        [...]

        # test whether or not we can use the class Finiteword_char
        if (self.alphabet().cardinality() <= 256 and
                all(isinstance(i, (int,Integer)) and 0 <= i < 256 for i in self.alphabet())):
            L = self.alphabet().list()
            if (all(L[i] < L[i+1] for i in range(len(L)-1)) and
                    all(self.cmp_letters(L[i],L[i+1]) == -1 for i in range(len(L)-1))):
                classes['FiniteWord_char'] = word.FiniteWord_char
+               classes['FiniteWord_list'] = word.FiniteWord_char

Well, maybe it is bad since we won't be able to compare char with list anymore...

videlec commented 10 years ago

comment:8

Replying to @seblabbe:

2) Calling directly .finite_word_tuple() or .finite_word_list() from the methods in FiniteWord is bad. Doing that you impose the return type to be FiniteWord_list or FiniteWord_tuple (you might prefer a FiniteWord_char sometimes).

What do you think if we do the following (automatically use char instead of list)?

[...]

I already proposed that and you answered: "and what if I really want a Word_list?" If you want to go that way, we can even remove all the classes and always return Word_char whatever the input is. It is definitely the simplest but of course only concerns alphabet included in [0,255[.

Vincent

seblabbe commented 10 years ago

Description changed:

--- 
+++ 
@@ -1,6 +1,6 @@
 1. Move the code of `_construct_word` into `__call__` method.

-2. Create `finite_word_list` methods which are shortcuts for `__call__` when we know the datatype.
+2. Create `_word_from_list` methods which are shortcuts for `__call__` when we know the datatype.

 Here are some benchmarks:

@@ -9,10 +9,15 @@

sage: W = Words() sage: L = range(1000) -sage: %timeit W(L) -10000 loops, best of 3: 45.5 µs per loop -sage: %timeit W.finite_word_list(L) -1000000 loops, best of 3: 1.29 µs per loop +sage: %timeit W.call__old(L) # before +10000 loops, best of 3: 45.1 µs per loop + +sage: %timeit W(L) # after +10000 loops, best of 3: 17.1 µs per loop
+sage: %timeit W(L, check=False)
+100000 loops, best of 3: 2.21 µs per loop +sage: %timeit W._word_from_list(L)
+1000000 loops, best of 3: 1.33 µs per loop


 str:
@@ -20,33 +25,45 @@

sage: W = Words() sage: s = 'a'*1000 -sage: %timeit W(s) -10000 loops, best of 3: 45.2 µs per loop -sage: %timeit W.finite_word_str(s) -1000000 loops, best of 3: 1.27 µs per loop +sage: %timeit W.call__old(s) # before +10000 loops, best of 3: 45.7 µs per loop
+sage: %timeit W(s) # after +10000 loops, best of 3: 18.4 µs per loop
+sage: %timeit W(s, check=False)
+100000 loops, best of 3: 2.62 µs per loop +sage: %timeit W._word_from_str(s)
+1000000 loops, best of 3: 1.23 µs per loop


 tuple:

sage: t = tuple(L) -sage: %timeit W(t) -10000 loops, best of 3: 46 µs per loop -sage: %timeit W.finite_word_tuple(t) -1000000 loops, best of 3: 1.39 µs per loop +sage: %timeit W.call__old(t) # before +10000 loops, best of 3: 45.9 µs per loop +sage: %timeit W(t) # after +10000 loops, best of 3: 18.6 µs per loop +sage: %timeit W(t,check=False)
+100000 loops, best of 3: 3.25 µs per loop +sage: %timeit W._word_from_tuple(t)
+1000000 loops, best of 3: 1.3 µs per loop


-char (why creation is so longer than str, tuple or list?):
+char (here creation is longer than str, tuple or list because a copy of data is performed):

sage: W = Words([0,1,2]) sage: L = [0,1,1,2]*1000 sage: type(W(L)) <class 'sage.combinat.words.word.FiniteWord_char'> -sage: %timeit W(L) -1000 loops, best of 3: 336 µs per loop -sage: %timeit W.finite_word_char(L) -1000 loops, best of 3: 246 µs per loop +sage: %timeit W.call__old(L) # before +1000 loops, best of 3: 322 µs per loop +sage: %timeit W(L) # after +1000 loops, best of 3: 292 µs per loop +sage: %timeit W(L, check=False)
+1000 loops, best of 3: 248 µs per loop +sage: %timeit W._word_from_char(L)
+1000 loops, best of 3: 245 µs per loop


 BEFORE:
@@ -58,12 +75,12 @@
 Wall time: 4.68 s

-AFTER: +AFTER (about 20 times faster):

 sage: W = Words(range(5))
 sage: %time L = list(W.iterate_by_length(7))
-CPU times: user 940 ms, sys: 80.4 ms, total: 1.02 s
-Wall time: 1.02 s
+CPU times: user 235 ms, sys: 29.1 ms, total: 264 ms
+Wall time: 259 ms

videlec commented 10 years ago

comment:10

There is no new commit, is that ok?

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from 9008b8d to 3a26ce4

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

3a26ce4 Trac #17021: answered reviewer comments

videlec commented 10 years ago

comment:13

Hello,

Much cleaner than before! Great!

Are you sure this is worth it

sage: W = Words([0,1,2])
sage: l = [0,1,0]*100
sage: timeit("w = W(l, datatype='list', check=False)", number=2000)
2000 loops, best of 3: 2.17 µs per loop
sage: timeit("w = W._word_from_list(l)", number=2000)
2000 loops, best of 3: 1.55 µs per loop
sage: cls = W._element_classes['FiniteWord_list']
sage: timeit("w = cls(W,l)", number=2000)
2000 loops, best of 3: 717 ns per loop

Having 20% gain with a method that is twice slower than calling the class constructor looks useless. If you want speed, use the classes themselves.

The name _word_from_list_or_char is more than confusing. For me from is about the input not the output. I have nothing against a method _word_from_list which return whatever it wants, but I would expect it to raise an Error if the input is not a list.

Sometimes you use MyWordClass(parent=self, data=data) and sometimes MyWordClass(parent, data). Is there a reason why?

Vincent

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from 3a26ce4 to 383f7e2

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`ea2752d`	`Merge branch 't/17021_6.4.beta3' into t/17021`
`383f7e2`	`Trac #17021: Fixes according to reviewer comment #2`

seblabbe commented 10 years ago

comment:15

I kept _word_from_word, _word_from_iter and _word_from_callable because their code takes more than one line and the latter two are used two or three times in the __call__ method.

videlec commented 10 years ago

Changed branch from u/slabbe/17021 to u/vdelecroix/17021

videlec commented 10 years ago

Changed commit from 383f7e2 to 020d549

videlec commented 10 years ago

comment:16

Hi Sebastien,

I added a commit which removes some trailing whitespaces and allow to use Word_char when the input data is a tuple.

If you are happy with it, set to positive review.

Vincent

New commits:

020d549 trac #17021: trailing whitespaces + more Word_char

seblabbe commented 10 years ago

comment:17

okay!

seblabbe commented 10 years ago

Reviewer: Vincent Delecroix

vbraun commented 10 years ago

comment:18

sage -t --long --warn-long 62.5 src/sage/combinat/permutation.py  # 1 doctest failed
sage -t --long --warn-long 62.5 src/sage/combinat/parking_functions.py  # 62 doctests failed

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

e3b6901 trac #17021: fix `__call__` for CombinatorialObject

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from 020d549 to e3b6901

videlec commented 10 years ago

comment:20

Hi Sebastien,

So CombinatorialObject is still used as input ;-( I added a special case for it in the __call__.

Vincent

seblabbe commented 10 years ago

comment:21

import is_iterator ?

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

df5da57 trac #17021: fix `__call__` for CombinatorialObject

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from e3b6901 to df5da57

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

7b5b64c trac #17021: fix `__call__` for CombinatorialObject

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from df5da57 to 7b5b64c

videlec commented 10 years ago

comment:25

Hey Sebastien,

Are you happy with that version? The tests in combinat pass at home.

Vincent

seblabbe commented 10 years ago

comment:26

Test in combinat pass here too. I was waiting for the buildbot. It says "success", but I can not check if all of tests in the sage library passed. Can you see it?

Sébastien

videlec commented 10 years ago

comment:27

Nope. But at least it pass where it failed before... I set to positive review.

I do not know whether the build bot runs the test or only try to compile and run.

Vincent

seblabbe commented 10 years ago

comment:28

I tried make doc-clean and sage -docbuild reference html and I get the following error:

...
[combinat ] writing output... [ 98%] sage/databases/oeis                                               
[combinat ] writing output... [ 98%] species                                                           
[combinat ] writing output... [ 99%] symmetric_functions                                               
[combinat ] writing output... [ 99%] tableaux                                                          
[combinat ] writing output... [100%] words                                                             
[combinat ] None:38: WARNING: citation not found: Schiffmann                                           
Error building the documentation.                                                                      
Traceback (most recent call last):                                                                     
  File "/Users/slabbe/Applications/sage-git/src/doc/common/builder.py", line 1491, in <module>         
    getattr(get_builder(name), type)()                                                                 
  File "/Users/slabbe/Applications/sage-git/src/doc/common/builder.py", line 503, in _wrapper          
    x.get(99999)                                                                                       
  File "/Users/slabbe/Applications/sage-git/local/lib/python/multiprocessing/pool.py", line 558, in get
    raise self._value                                                                                  
OSError: [combinat ] None:38: WARNING: citation not found: Schiffmann

But I get the same error on a clean version of sage-6.4.beta4. So it is not related to us. Also:

sage: search_src('Schiffman')      
algebras/hall_algebra.py:114:    See section 2.3 in [Schiffmann]_, and sections II.2 and III.3     
algebras/hall_algebra.py:211:    .. [Schiffmann] Oliver Schiffmann. *Lectures on Hall algebras*.   
algebras/hall_algebra.py:465:            corresponding to `\lambda`. See Lemma 2.8 in [Schiffmann]_
combinat/hall_polynomial.py:66:    in [Schiffmann]_):

The part on words did not raised any warning or errors. So still positive review.

vbraun commented 10 years ago

Changed branch from u/vdelecroix/17021 to 7b5b64c

sagemath / sage

Faster creation of words by the parent #17021