Markus28 opened this issue 3 years ago (status: Open)
Good catch @Markus28, I can raise a PR for it. Thanks!
I think you are right. We should change it so that we sample independently.
Great :) I suppose that instead of using `np.random.sample`, the following would be more straightforward:

```python
def sample(self):
    return np.random.uniform(low=self.low, high=self.high)
```
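For what it's worth, `np.random.uniform` broadcasts over array-valued bounds, so with per-dimension `low`/`high` arrays each component is drawn independently from its own interval. A quick check (my own illustration, not code from the repo):

```python
import numpy as np

# With array-valued low/high, np.random.uniform draws each component
# independently from its own [low_i, high_i) interval.
low = np.array([-1.0, 0.0, 5.0])
high = np.array([1.0, 2.0, 10.0])

s = np.random.uniform(low=low, high=high)
print(s.shape)                                # (3,)
print(bool(np.all((s >= low) & (s < high))))  # True
```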
I have a second question which is tangentially related: the code in `SubgoalActionSpace` appears to pertain specifically to the Ant environments, is that correct? I am currently trying to adapt your code to work with an entirely different environment and would be grateful for some pointers on whether this is the only part of the code that requires changes. Thanks!
For `SubgoalActionSpace`, yeah, you'd have to change the `dim` and `limits`. (Check out the line below:)
https://github.com/watakandai/hiro_pytorch/blob/master/hiro/hiro_utils.py#L109
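In case it helps, here is a minimal sketch of what a subgoal space parameterised by `dim` and `limits` might look like; the class and attribute names are my own assumptions for illustration, not the repo's actual code (see the linked line for that):

```python
import numpy as np

class SubgoalSpaceSketch:
    """Hypothetical sketch of a subgoal space with per-dimension limits.
    Assumes symmetric bounds [-limits_i, limits_i]; adapt to your env."""

    def __init__(self, dim, limits):
        self.dim = dim
        limits = np.asarray(limits, dtype=float)[:dim]
        self.low = -limits
        self.high = limits

    def sample(self):
        # Independent uniform draw per dimension.
        return np.random.uniform(low=self.low, high=self.high)
```

For a new environment, `dim` would be the dimensionality of the subgoal and `limits` the per-dimension range that makes sense for its state space.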
For the environment, you'll need the `.step()` function to return a tuple of `obs`, `r`, `done`, and `info` -- but `obs` is a dict with the keys `'desired_goal'`, `'observation'`, and `'achieved_goal'`. This might be a little different from some Gym environments.
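A minimal sketch of an environment matching that interface (all names and dynamics here are placeholders I made up, not the repo's or Gym's API):

```python
import numpy as np

class GoalEnvSketch:
    """Toy environment whose step() returns (obs, r, done, info)
    with obs as a dict of the three goal-conditioned keys."""

    def __init__(self):
        self.goal = np.zeros(2)
        self.state = np.zeros(4)

    def _obs(self):
        return {
            'observation': self.state.copy(),
            'achieved_goal': self.state[:2].copy(),  # e.g. x, y position
            'desired_goal': self.goal.copy(),
        }

    def reset(self):
        self.state = np.random.uniform(-1.0, 1.0, size=4)
        return self._obs()

    def step(self, action):
        action = np.asarray(action, dtype=float)
        self.state[:len(action)] += 0.1 * action  # toy dynamics
        obs = self._obs()
        r = -np.linalg.norm(obs['achieved_goal'] - obs['desired_goal'])
        done = bool(r > -0.05)
        info = {}
        return obs, r, done, info
```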
I've raised a small PR to allow a bit more modularity in subgoals: https://github.com/watakandai/hiro_pytorch/pull/5
While going through your code I came across a line that seemed odd to me, in `hiro/hiro_utils.py`:

Although the point of the class isn't entirely clear to me, I would have expected this method to return a sample of a random vector with independent, (non-identically) uniformly distributed random variables.
However, `np.random.sample()` only returns a single float, so the random variables in the vector will not be independent. I would have expected the method to be something like this:
Did I misunderstand the point of `sample`, or is this a mistake?
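For context, a quick sketch of why scaling a single `np.random.sample()` draw yields dependent components (this reconstructs the issue under my assumption that the method did something like `low + (high - low) * np.random.sample()`; I haven't copied the repo's exact line):

```python
import numpy as np

low = np.array([-2.0, 0.0])
high = np.array([2.0, 10.0])

u = np.random.sample()                # a single float in [0, 1)
dependent = low + (high - low) * u    # every component reuses the same u

independent = np.random.uniform(low=low, high=high)  # one draw per component

# Normalising the "dependent" vector recovers the identical u in each slot,
# i.e. the components are perfectly correlated.
print(bool(np.allclose((dependent - low) / (high - low), u)))  # True
```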