openai / spinningup

An educational resource to help anyone learn deep reinforcement learning.
https://spinningup.openai.com/

[PyTorch] Is it faster to cast to tensor when inserting into buffer, or retrieving? #214

[Open] langfield opened this issue 4 years ago

langfield commented 4 years ago

I've noticed that across many actor-critic implementations, the Rollout/Buffer/Trajectories object is handled inconsistently: some authors cast the arrays to tensors (and send them to device) at insertion time, while others cast to tensors only inside the get() function, as in your PyTorch vpg.py implementation below:

    def get(self):
        """
        Call this at the end of an epoch to get all of the data from
        the buffer, with advantages appropriately normalized (shifted to have
        mean zero and std one). Also, resets some pointers in the buffer.
        """
        assert self.ptr == self.max_size    # buffer has to be full before you can get
        self.ptr, self.path_start_idx = 0, 0
        # the next two lines implement the advantage normalization trick
        adv_mean, adv_std = mpi_statistics_scalar(self.adv_buf)
        self.adv_buf = (self.adv_buf - adv_mean) / adv_std
        data = dict(obs=self.obs_buf, act=self.act_buf, ret=self.ret_buf,
                    adv=self.adv_buf, logp=self.logp_buf)
        return {k: torch.as_tensor(v, dtype=torch.float32) for k,v in data.items()}

Is there a reason why you chose to do it this way? Is it any faster than the way it is done here: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/a2c_ppo_acktr/storage.py
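
For reference, here is a minimal sketch of the insertion-time style, loosely modeled on the linked storage.py rather than copied from it (the class name and fields are hypothetical):

import torch

class TensorRolloutBuffer:
    """Hypothetical buffer illustrating the insertion-time style:
    storage is preallocated as tensors and filled in place, so get()
    returns tensors with no conversion step."""

    def __init__(self, size, obs_dim, device="cpu"):
        self.obs_buf = torch.zeros(size, obs_dim, dtype=torch.float32, device=device)
        self.rew_buf = torch.zeros(size, dtype=torch.float32, device=device)
        self.ptr, self.max_size = 0, size

    def store(self, obs, rew):
        # as_tensor avoids an extra copy when obs is already a tensor
        self.obs_buf[self.ptr] = torch.as_tensor(obs, dtype=torch.float32)
        self.rew_buf[self.ptr] = rew
        self.ptr += 1

    def get(self):
        assert self.ptr == self.max_size    # buffer must be full
        self.ptr = 0
        return dict(obs=self.obs_buf, rew=self.rew_buf)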

Thanks.

joshua-dean commented 4 years ago

Although it's by no means a rigorous measurement, I wrote the following to get a ballpark figure for the difference between the two approaches (all on CPU):

import torch
import numpy as np
import time
import random

if __name__ == "__main__":
    data_size = 100
    iters = 100
    torch_buf = torch.zeros(data_size, dtype=torch.float32)
    np_buf = np.zeros(data_size, dtype=np.float32)

    # Style 1: write each element directly into a torch tensor.
    torch_start = time.time()
    for _ in range(iters):
        for idx in range(data_size):
            torch_buf[idx] = random.random()
    torch_end = time.time()

    # Style 2: write each element into a numpy array, then do a
    # single array-to-tensor conversion per pass.
    np_start = time.time()
    for _ in range(iters):
        for idx in range(data_size):
            np_buf[idx] = random.random()
        x = torch.tensor(np_buf)
    np_end = time.time()

    print("Torch time: {}s".format(torch_end - torch_start))
    print("NumPy time: {}s".format(np_end - np_start))

On my machine (the timings are rough), this indicates that inserting into a numpy array followed by a single conversion to a tensor is 1-2 orders of magnitude faster than repeated inserts into a tensor. I imagine each tensor write goes through additional operations/layers of abstraction relative to its numpy counterpart, which at least makes sense conceptually.

This is by no means a rigorous way to measure things; I'm hoping to get a chance to measure it properly with some sort of CPU counter. Let me know if there are any functional or conceptual errors in this code.
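
In the meantime, timeit should give a somewhat less noisy comparison than time.time(), since it can repeat each measurement and take the minimum. A sketch along those lines (the fill functions and sizes here are placeholders, not from either repo):

import timeit

import numpy as np
import torch

DATA_SIZE = 100

def fill_torch():
    # element-wise writes directly into a preallocated tensor
    buf = torch.zeros(DATA_SIZE, dtype=torch.float32)
    for idx in range(DATA_SIZE):
        buf[idx] = 0.5

def fill_numpy_then_convert():
    # element-wise writes into a numpy array, then one conversion
    buf = np.zeros(DATA_SIZE, dtype=np.float32)
    for idx in range(DATA_SIZE):
        buf[idx] = 0.5
    torch.as_tensor(buf)

# repeat() runs each timing several times; taking the min is the
# usual low-noise estimate of the best achievable time
print("torch:", min(timeit.repeat(fill_torch, number=100, repeat=5)))
print("numpy:", min(timeit.repeat(fill_numpy_then_convert, number=100, repeat=5)))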

langfield commented 4 years ago

Looks okay to me at a glance. Thanks for looking into it. I'll be curious to see a more accurate measurement if you get around to implementing one, but this looks like a significant enough difference to suggest that the single, lazy conversion is better.