Open johannes-dornheim opened 4 years ago
Hello,
are features for the full observation-dictionary
yes, this was a choice (to makes things simpler). Please tell us if it makes a difference.
after first experiments (w. DQN and my personal Env) it seems like it does not make much difference reg. the convergence speed. I wil come back when i have time to produce quantitative results.
for the experiments i added an extra dict (hash(obs,desired):achievedl) to enable the inversion of 'convert_dict_to_obs' . her/utils.py:
def __init__(self, env):
self.achieved_goals = {}
...
...
def convert_dict_to_obs(self, obs_dict):
"""
:param obs_dict: (dict<np.ndarray>)
:return: (np.ndarray)
"""
# achieved goals are stored in extra dict hash(obs,desired):achieved !
# Assumes (obs,desired)->achieved to be unique !
# --------------------------------------------------------------------------------------------
if isinstance(self.observation_space, spaces.MultiDiscrete):
# Special case for multidiscrete
obs = np.concatenate([[int(obs_dict[key])] for key in ['observation', 'desired_goal']])
else:
obs = np.concatenate([obs_dict[key] for key in ['observation', 'desired_goal']])
self.achieved_goals[hash(obs.data.tobytes())] = obs_dict['achieved_goal']
return obs
def convert_obs_to_dict(self, observations):
"""
Inverse operation of convert_dict_to_obs
:param observations: (np.ndarray)
:return: (OrderedDict<np.ndarray>)
"""
return OrderedDict([
('observation', observations[:self.obs_dim]),
('achieved_goal', self.achieved_goals[hash(observations.data.tobytes())]),
('desired_goal', observations[self.obs_dim:]),
])
Hi,
it seems like the state features that are added to the replay buffer in the HindsightExperienceReplayWrapper are features for the full observation-dictionary (observation, achieved-goal, desired-goal).
https://github.com/hill-a/stable-baselines/blob/cfcdb2fc79260b25aa0c32faf4cf5d551f166f29/stable_baselines/her/replay_buffer.py#L153 https://github.com/hill-a/stable-baselines/blob/cfcdb2fc79260b25aa0c32faf4cf5d551f166f29/stable_baselines/her/replay_buffer.py#L184
the feature-vectors 'obs_t' and 'obs_tp1' are generated by:
from the full (observation, achieved-goal, desired-goal) observation-dictionaries.
Imho, the 'achieved-goal' part is wrong. As you can see in the original publication of HER ([1], Listing at P5) the 'goal-state' is represented by (observation, desired-goal)-features only. In most cases the extra-features should affect the convergence of the underlying model. In my current application (goals are described by ~80 variables) it may lead to convergence problems (experiments are pending)
[1] https://arxiv.org/abs/1707.01495