rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License

Final Distance in Push #126

Closed MianchuWang closed 3 years ago

MianchuWang commented 3 years ago

Hi, Vitchyr

When I use env.sample_goal() in the Push environment, it returns a dict that includes desired_goal. desired_goal is a 4-D array, where the first 2 numbers are the position of the hand and the last 2 numbers are the position of the puck.

When I use env.step(any_action), the returned state is a dict that includes achieved_goal, which has the same structure as the desired_goal described above.

My question is whether the final distance is computed as follows:

Final_distance = hand_distance + puck_distance = Euclidean(achieved_goal[0:2], desired_goal[0:2]) + Euclidean(achieved_goal[2:4], desired_goal[2:4])

Is the equation correct?

I couldn't find a corresponding snippet in your implementation, so I'm asking here.

Thank you

vitchyr commented 3 years ago

Hi Mianchu,

Which "final_distance" are you referring to? The env returns both the hand_distance and the puck_distance. If you're referring to the metric used in e.g. the Skew-Fit paper, we report just the puck distance, since the hand distance is pretty easy to optimize.

Vitchyr

MianchuWang commented 3 years ago

Hi Vitchyr,

Thanks for your reply. I'm referring to the "Final Distance to Goal" in RIG, i.e. the y-axis in Figure 3.

I'm sorry that I closed the issue by accident.

Mianchu

vitchyr commented 3 years ago

For RIG, I believe we actually just reported Euclidean(achieved_goal, desired_goal) over the full 4-D vector, rather than computing the hand and puck distances separately and summing them. Qualitatively, the results are the same.
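
To make the difference between the metrics discussed in this thread concrete, here is a minimal NumPy sketch. It assumes the goal layout described in the question (first two entries are the hand position, last two are the puck position). The function name goal_distances and the dictionary keys are hypothetical and are not taken from rlkit or the environment code.

```python
import numpy as np


def goal_distances(achieved_goal, desired_goal):
    """Compare the distance metrics discussed in this thread.

    Assumes 4-D goals laid out as [hand_x, hand_y, puck_x, puck_y].
    This helper is illustrative only; it is not part of rlkit.
    """
    achieved = np.asarray(achieved_goal, dtype=float)
    desired = np.asarray(desired_goal, dtype=float)

    # Per-object Euclidean distances.
    hand_distance = np.linalg.norm(achieved[0:2] - desired[0:2])
    puck_distance = np.linalg.norm(achieved[2:4] - desired[2:4])

    return {
        "hand_distance": hand_distance,
        "puck_distance": puck_distance,
        # The sum proposed in the original question.
        "sum_of_distances": hand_distance + puck_distance,
        # Single Euclidean distance over the full 4-D vector.
        "joint_distance": np.linalg.norm(achieved - desired),
    }


if __name__ == "__main__":
    achieved = [0.0, 0.0, 0.0, 0.0]
    desired = [3.0, 4.0, 6.0, 8.0]
    for name, value in goal_distances(achieved, desired).items():
        print(f"{name}: {value:.3f}")
```

The joint distance equals sqrt(hand_distance^2 + puck_distance^2), so it is never larger than the sum of the two distances, and the two coincide when one of the terms is zero.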

MianchuWang commented 3 years ago

Thank you, it solves my problem!