Open tweigel-vaisala opened 7 years ago
Yes, you're right but it does as you noticed it does not make a real difference.
additionally,
in
def random_walk_faster(n=1000): from itertools import accumulate steps = random.sample((-1,+1)*n, k=n) return
[0]+list(accumulate(steps))
`
the random numbers are not drawn from the same distribution as in the other variants. And, in the timing you also measure the import costs of accumulate. However, there is no huge difference in computation time
Yes, it has be noticed in #37 but the proposed solution require Python 3.6. However, from the documentation, the proposed sampling method should do a shuffle.
Sorry, I didn't read the closed threads!
A list comprehension like
steps = [2* random.randint(0,1)-1 for _ in range(n)]
won't require python 3.6. You may even want to use a generator expression, which might be more efficient.
Oh nice, I like it. We need to benchmark it and if it faster or approximately the same, we can include it. Could you make a PR ?
Benchmarks
def random_walk(n):
position = 0
walk = [position]
for i in range(n):
position += 2*random.randint(0, 1)-1
walk.append(position)
return walk
def random_walk_faster(n=1000):
steps = random.sample((-1,+1)*n, k=n)
return [0]+list(accumulate(steps))
def random_walk_faster_2(n=1000):
steps = [2*random.randint(0,1)-1 for _ in range(n)]
return [0]+list(accumulate(steps))
def random_walk_faster_3(n=1000):
steps = (2*random.randint(0,1)-1 for _ in range(n))
return [0]+list(accumulate(steps))
%%timeit
random_walk(100000)
10 loops, best of 3: 141 ms per loop
%%timeit
random_walk_faster(100000)
10 loops, best of 3: 78.1 ms per loop
%%timeit
random_walk_faster_2(100000)
10 loops, best of 3: 135 ms per loop
%%timeit
random_walk_faster_3(100000)
10 loops, best of 3: 139 ms per loop
So it looks like, that using random.sample from a false distribution yields a huge speedup. The naive approach can compete with the list comprehension version.
So the problem here is essentially, that numpy improves two things. Sampling and aggregating. Maybe an example, that used the numpy sampling, but accumulate in aggregating might be a better solution. Unfortunately, the random.choices function only has been introduced in python 3.6. If it is faster than the list comprehension, using choices probably would be the best solution in terms of didactics
Sorry, in the end I see your PR but the problem now is that the proposed solution does not correspond to the text so it is not good. What exactly is wrong with the random.sample
method ? I think documentation says in some circumstances (k=len(x)) it is equivalent to a shuffle, no ?
Hey, that's what I feared. So why might the use of random.sample be a bad choice? Firstly, it is absolutely not clear why using sample will provide the same algorithm.
Secondly, you are reducing the variance/standard deviation by a huge amount. Though you approximate the same mean for large repetions, the distribution you'll use will be too narrow. This solution penalizes solutions that are far out too much.
I think, this is easiest seen for the case n=2. Using your sampling method: you find after the first draw a probability of 2/3 to move in the other direction. With the usual sampling the probability is still 1/2. Hence, your method, will have a probability of 2/3 to end in the center and probability 1/6 to end at two places left or right off the center. The other approach gives probabilities of 1/4, 1/2 and 1/4.
This difference in variance carries over to higher number of repetitions (and it is significant!)
So, one way would be to use np.randint() in all of the functions, maybe the best way would just be to comment on the issues with the variance and keep your solution.
The random_walk function is not a direct equivalent to the RandomWalker class method. A strict equivalent would be this:
It is still not much faster, but it's a more fair comparison.