rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License
2.45k stars, 550 forks

Copy vs Deepcopy in SAC #74

Open richardrl opened 5 years ago

richardrl commented 5 years ago

I'm using a previous version of SAC (with separate Q and V value functions) but noticed something strange.

There is this line in twin_sac.py: self.target_vf = vf.copy()

Changing it to a deepcopy results in very different training curves (see curves attached):

        from copy import deepcopy
        self.target_vf = deepcopy(vf)

vf.copy() is defined as such:

    def copy(self):
        copy = Serializable.clone(self)
        ptu.copy_model_params_from_to(self, copy)
        return copy

It looks like it should do essentially the same thing as deepcopy, so what's causing the difference?

[Attached image "deepcopy": training curves]

vitchyr commented 5 years ago

Yeah, that's really odd. Can you check if deepcopy copies the weight values as well?

vitchyr commented 5 years ago

@richardrl Did you ever get around to seeing what's going on? Also, which curve is which?

richardrl commented 4 years ago

I have not figured out why this is happening yet. The .copy() run is the orange curve, the one that actually trains. This is just on the pick-and-place task from the Fetch robotics tasks.

vitchyr commented 4 years ago

Did you check whether deepcopy copies the weights over (by reference or by value)?
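
One way to run that check is to compare the original and the copy directly: first confirm the weight values match, then update the original in place and see whether the copy follows. The sketch below is only an illustration. `TinyNet` and `compare_copies` are hypothetical stand-ins made up for this example, not rlkit or PyTorch code, and the weights are plain Python lists so the snippet runs without any dependencies.

```python
import copy


class TinyNet:
    """Hypothetical stand-in for a network: a dict of named weight lists."""

    def __init__(self):
        self.weights = {"fc.weight": [0.1, -0.2, 0.3], "fc.bias": [0.0]}


def compare_copies(original, candidate):
    """Per weight: do the values match, and is the storage the same object?"""
    report = {}
    for name, w in original.weights.items():
        c = candidate.weights[name]
        report[name] = {"same_values": c == w, "shared_storage": c is w}
    return report


vf = TinyNet()
shallow = copy.copy(vf)   # new object, but it still points at the same weights
deep = copy.deepcopy(vf)  # new object with its own independent weights

shallow_report = compare_copies(vf, shallow)
deep_report = compare_copies(vf, deep)

# Change the original in place, the way an optimizer step would,
# and check which copy sees the change.
vf.weights["fc.weight"][0] = 9.9
shallow_follows = shallow.weights["fc.weight"][0] == 9.9
deep_follows = deep.weights["fc.weight"][0] == 9.9

print("shallow:", shallow_report)
print("deep:", deep_report)
print("shallow follows update:", shallow_follows)
print("deep follows update:", deep_follows)
```

For an actual PyTorch module, the analogous check would iterate over `named_parameters()` on both networks and compare each pair with `torch.equal` for values and `data_ptr()` for shared storage. A target network that shares storage with `vf` would track it exactly instead of lagging behind, which could plausibly change the training curves.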