rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License
2.49k stars 553 forks source link

Data overwritten in multitask_rollout #98

Closed HeinzBenjamin closed 4 years ago

HeinzBenjamin commented 4 years ago

Hello there,

I'm not 100% sure but I might have found some undesirable behavior in your rollout_functions. Specifically when recording HER observations from multitask rollouts as dictionaries, it seems that in the function multitask_rollout new dictionaries are appended by reference rather than making a copy. Thus when a new observation is added it replaces all previous dictionaries instead of really adding new info. I just noticed that after a long training session, all observation dictionaries in my replay buffer contained the exact same info for thousands of samples. Now I altered some things in my copy of rlkit and I'm not entirely sure if the original rlkit version behaves the same. But as the code doesn't use a copy mechanism, it might be worth having a look for you?

In the meantime I'm using this workaround:

next_observations.append(copy.deepcopy(next_obs))
dict_next_obs.append((copy.deepcopy(next_obs))

Best Ben

vitchyr commented 4 years ago

That seems rather worrisome! Are you using a custom environment? This code was mostly tested using gym and multiworld environments, so it assumes that the environment returns new arrays for the observations. The main reason we avoided this deepcopy is that it can sometimes be expensive (e.g. if there are images that are returned in the observation dict, but that aren't used), but perhaps it's better to be safer and always copy.

HeinzBenjamin commented 4 years ago

Thanks for the quick reply. Yes I'm using a custom environment that doesn't perform copies. So this is more of an issue with my env class then. Sorry for the confusion and thanks for the great tool!

vitchyr commented 4 years ago

Thanks for making this issue! I'll close and pin this issue for increased visibility in case other people run into this issue.

anair13 commented 2 years ago

This was a common source of bugs and we ended up doing the deepcopy() solution. Unpinning now