rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License

Why is goal_sampling_mode='vae_prior' in Skew-Fit's sawyer_push config? #118

Closed: mseitzer closed this issue 4 years ago

mseitzer commented 4 years ago

Hi,

I am wondering why the goal sampling modes for both the exploration goals and the relabeling goals are set to use the VAE prior instead of the distribution learned by Skew-Fit (i.e., custom_goal_sampler) in the Sawyer Push experiment. Would that not make it almost identical to RIG?

https://github.com/vitchyr/rlkit/blob/20ea0820eb89bddae7c6a5171038a005e472c3d0/examples/skewfit/sawyer_push.py#L67-L69
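(For reference, my reading of the linked lines is roughly the following; the exact key names in the file may differ from this paraphrase.)

```python
# Paraphrase of the linked config lines; key names are my approximation,
# not necessarily the exact ones used in examples/skewfit/sawyer_push.py.
skewfit_variant = dict(
    exploration_goal_sampling_mode='vae_prior',  # goals for exploration rollouts
    training_goal_sampling_mode='vae_prior',     # goals used when relabeling
)
```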

Thank you very much!

vitchyr commented 4 years ago

Skew-Fit modifies the data distribution used to train the VAE. Otherwise, the VAE is still a normal VAE, and so we sample goals by sampling from the VAE prior.
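Roughly, the two modes being contrasted amount to something like this (a simplified sketch; names such as vae.decode, vae.representation_size, and replay_buffer.sample_weighted_goals are placeholders rather than rlkit's exact API):

```python
import torch

def sample_goals_vae_prior(vae, batch_size):
    # goal_sampling_mode='vae_prior': draw latents from the standard-normal
    # prior and decode them into goal images. Same mechanism as RIG; with
    # Skew-Fit the difference is only in which data the VAE was trained on.
    z = torch.randn(batch_size, vae.representation_size)
    return vae.decode(z)

def sample_goals_custom_sampler(replay_buffer, batch_size):
    # custom_goal_sampler: reuse previously visited states, reweighted by the
    # Skew-Fit importance weights, so goals are effectively drawn from the
    # skewed distribution q_phi^G rather than from the prior.
    return replay_buffer.sample_weighted_goals(batch_size)
```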

mseitzer commented 4 years ago

From the paper, my understanding was that, in addition to modifying the data distribution used to train the VAE, an equally important part of Skew-Fit is performing goal-directed exploration using the learned goal distribution q_\phi^G. Hence my question.

But I suppose one can assume that, for pushing, the VAE prior roughly gives you the maximum-entropy distribution, since the goal space there is nicely bounded and axis-aligned?

Just for my understanding: is the use of the VAE prior for this experiment mentioned anywhere in the paper? As far as I can tell, Table 3 states that q_\phi^G is used for sampling goals.

Thanks for clarifying.