rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License

More efficient future obs sample method in obs_dict_replay_buffer.py #112

Closed · YangRui2015 closed this 4 years ago

YangRui2015 commented 4 years ago

Hello, I've been reading your excellent code recently and found an interesting comment in obs_dict_replay_buffer.py: # This is generally faster than random.choice. Makes you wonder what random.choice is doing.

I tried it and found that np.random.random() is even faster than np.random.randint() when sampling only one element at a time. My solution can be about 80 times faster:

# For example:
In [4]: import numpy as np 
   ...: import time 
   ...:  
   ...: N = 1000 
   ...: L = np.random.randint(1, N, size=N) 
   ...: random_lis = [np.random.random(L[i]) for i in range(N)]    # generate a list of arrays of different length                                                                                                                               

In [5]: array_len = np.array([len(random_lis[i]) for i in range(N)])                                                                                                                             
# using np.random.random()
In [6]: %timeit random_idx = (np.random.random(N) * array_len).astype(int)
13 µs ± 419 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# using np.random.randint()
In [7]: %timeit random_idx = [np.random.randint(x) for x in array_len]                                                                                                                           
1 ms ± 22.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
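
The trick works because each entry of np.random.random(N) is uniform on [0, 1), so scaling by the array length and truncating with astype(int) yields a uniform integer in [0, len - 1], in one vectorized call instead of N separate randint calls. As a quick sanity check (a minimal sketch, not part of the benchmark above), the two methods agree distributionally:

import numpy as np

# Sanity check: the scaled-random trick and np.random.randint should
# produce the same uniform distribution over [0, high - 1].
n, high = 1_000_000, 7
a = (np.random.random(n) * high).astype(int)
b = np.random.randint(0, high, size=n)
print(np.bincount(a, minlength=high) / n)   # every bucket is roughly 1/7
print(np.bincount(b, minlength=high) / n)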

In addition, I use a list comprehension for better performance. The change is as follows:

The original code:

future_obs_idxs = []
for i in indices[-num_future_goals:]:
    possible_future_obs_idxs = self._idx_to_future_obs_idx[i]
    # This is generally faster than random.choice. Makes you wonder what
    # random.choice is doing 
    num_options = len(possible_future_obs_idxs)
    next_obs_i = int(np.random.randint(0, num_options))
    future_obs_idxs.append(possible_future_obs_idxs[next_obs_i])
future_obs_idxs = np.array(future_obs_idxs)

My solution:

future_indices = indices[-num_future_goals:]
possible_future_obs_lens = np.array([len(self._idx_to_future_obs_idx[i]) for i in future_indices])
next_obs_idxs = (np.random.random(num_future_goals) * possible_future_obs_lens).astype(int)
future_obs_idxs = np.array([self._idx_to_future_obs_idx[ids][next_obs_idxs[i]] for i, ids in enumerate(future_indices)]) 
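For illustration, the same logic can be run standalone on dummy data (hypothetical names, not rlkit's actual buffer). It also checks that every sampled index stays in bounds, which holds because np.random.random() never returns 1.0:

import numpy as np

# Hypothetical stand-in for self._idx_to_future_obs_idx: each buffer index
# maps to an array of candidate future-observation indices.
idx_to_future_obs_idx = {i: np.arange(i + 1, 20) for i in range(19)}
future_indices = np.array([3, 7, 11])
num_future_goals = len(future_indices)

possible_future_obs_lens = np.array(
    [len(idx_to_future_obs_idx[i]) for i in future_indices])
next_obs_idxs = (np.random.random(num_future_goals)
                 * possible_future_obs_lens).astype(int)
future_obs_idxs = np.array([idx_to_future_obs_idx[ids][next_obs_idxs[i]]
                            for i, ids in enumerate(future_indices)])

assert (next_obs_idxs < possible_future_obs_lens).all()  # always in bounds
print(future_obs_idxs)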

Although data collection and training take up most of the runtime, I think this change makes rlkit more elegant and efficient.

Looking forward to your reply.

vitchyr commented 4 years ago

Thanks for this PR! I confirmed that using np.random.random is faster. That's a pretty neat trick! It's also quite clever since it allows you to sample from different uniform distributions in batch.

For reference:

In [34]: values = np.arange(10000)

In [35]: %timeit np.random.choice(values, size=1024, replace=False)
160 µs ± 3.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [36]: %timeit np.random.randint(low=0, high=10000, size=1024)
22.1 µs ± 719 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [37]: %timeit (np.random.random(1024) * 10000).astype(int)
11.2 µs ± 76.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
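
For context, the gap with np.random.choice is expected: with replace=False, the legacy RandomState implementation permutes the entire population and slices off the first size elements, so its cost grows with len(values) rather than with size. A sketch to confirm this in the same session (timing output omitted; it should slow down roughly in proportion to the population size):

In [38]: big_values = np.arange(1000000)

In [39]: %timeit np.random.choice(big_values, size=1024, replace=False)

Since the replay buffer samples with replacement anyway, the scaled-random trick applies directly.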

TODO: match the style to wrap lines at 80 characters (will do after merging).

YangRui2015 commented 4 years ago

> It's also quite clever since it allows you to sample from different uniform distributions in batch.

Yes, it is! :) Should I match the style now, or will you do that later?

vitchyr commented 4 years ago

Done!


YangRui2015 commented 4 years ago

nice 🎉