katerakelly / oyster

Implementation of Efficient Off-policy Meta-learning via Probabilistic Context Variables (PEARL)
MIT License

A potential in-place operation bug with PyTorch #30

Open Lagrant opened 2 years ago

Lagrant commented 2 years ago

It seems you used PyTorch 1.0.1 in this project. PyTorch has included stricter in-place operation detection in autograd since version 1.5, which makes the following code
def get_action(self, obs, deterministic=False):
        ''' sample action from the policy, conditioned on the task embedding '''
        z = self.z
        obs = ptu.from_numpy(obs[None])
        in_ = torch.cat([obs, z], dim=1)
        return self.policy.get_action(in_, deterministic=deterministic)

at https://github.com/katerakelly/oyster/blob/master/rlkit/torch/sac/agent.py throw an in-place operation error, because the latent vector z is not detached before it is used for action generation. As a result, the policy loss function can update z multiple times within one optimization step, which may lead to errors.
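
If it helps, something like the following might avoid the error on newer PyTorch versions (this is just a sketch, assuming ptu, torch, and self.policy behave as in the current agent.py; I have not verified it against the full training loop). Detaching z keeps rollout-time action sampling out of the computation graph that the policy loss later backpropagates through:

def get_action(self, obs, deterministic=False):
    ''' sample action from the policy, conditioned on the task embedding '''
    # detach z so that sampling actions during rollouts does not extend
    # the computation graph of the latent context variable
    z = self.z.detach()
    obs = ptu.from_numpy(obs[None])
    in_ = torch.cat([obs, z], dim=1)
    return self.policy.get_action(in_, deterministic=deterministic)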

I think this might be a bug in your code, and it could affect the reliability of your final results.

Could you check on that?