rougier / from-python-to-numpy

An open-access book on numpy vectorization techniques, Nicolas P. Rougier, 2017
http://www.labri.fr/perso/nrougier/from-python-to-numpy
Other
2.05k stars 340 forks source link

In Introduction 2.1 Simple Example - most benefit is not from vectorization #40

Closed reidfaiv closed 7 years ago

reidfaiv commented 7 years ago

In chapter "2.1 Simple example" you have example of "Vectorized approach" which leaves impression that most performance benefits come from itertools.accumulate(). This is not true - the main speed gain comes from use of random.choices() instead of random.randint() in previous sample.

>>> import random
>>> from timeit import timeit

>>> # original example
... def random_walk(n):
...     position = 0
...     walk = [position]
...     for i in range(n):
...         position += 2*random.randint(0, 1)-1
...         walk.append(position)
...     return walk
...
>>>
>>> timeit("random_walk(n=10000)", number=100, globals=globals())
1.7758303454734055
>>> # lets use random.choices() instead
... def random_walk_with_choices(n):
...     position = 0
...     steps = random.choices([1,-1], k=n)
...     walk = [position]
...     for step in steps:
...         position += step
...         walk.append(position)
...     return walk
...
>>> timeit("random_walk_with_choices(n=10000)", number=100, globals=globals())
0.39364298073223836
>>>
>>> # original random_walk_faster with accumulate
... def random_walk_faster(n=1000):
...   from itertools import accumulate
...   steps = random.choices([1,-1], k=n)
...   return list(accumulate(steps))
...
>>> timeit("random_walk_faster(n=10000)", number=100, globals=globals())
0.264255993680635

It is clearly visible that accumulate has minor effect on overall speed (which is mainly driven by itertools() native implementation).

As such example is clearly misleading.

Another minor bug with example: random_walk() returns N+1 elements while random_walk_faster() returns N elements.

len(random_walk(1000)) 1001 len(random_walk_faster(1000)) 1000

rougier commented 7 years ago

Thanks for the report. Actually, the explanation is given just after the example when I try to explain the gain come from the generation of all the steps at once. I will modify text to remove the confusion.

Also, I removed the choices function (in favor of sample) because it is only for Python 3.6.