homangab / Track-2-Act

Code for the paper "Predicting Point Tracks from Internet Videos enables Diverse Zero-Shot Manipulation"

How to get the goal image when doing evaluation? #3


hnuzhy commented 1 month ago

Hi, thanks a lot for releasing the source code of your excellent work. After reading the paper, I found that it does not explain how to obtain the goal image for a specific task when running real-robot evaluations. Is the goal image perhaps not required for the final closed-loop deployment, when running inference with the Residual Policy Correction network, so that it can be dropped at inference time? I'm looking forward to your reply.

homangab commented 1 month ago

Thanks for the question about the goal image! In this work, we use the goal image to specify a task. For evaluation, we do need to obtain the goal image by intervening in the scene (e.g. if we want the robot to execute the task of opening a door, we first open the door ourselves and capture the image through the robot's camera). The idea is the same as in any goal-conditioned policy learning, where goals can be images, language, etc.; in this work, the goals are images.
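A minimal sketch of the evaluation order described above, under heavy assumptions: `ToyScene`, `capture_goal_image`, `toy_goal_conditioned_policy` and `evaluate` are hypothetical names, the "camera" is a toy 8-pixel strip, and the policy is a hand-written stand-in. It is not the Track2Act point-track predictor or the residual policy. It only illustrates the protocol: put the scene into the goal state by hand, capture the goal image with the robot's own camera, reset the scene, then roll out a policy conditioned on (current observation, goal image).

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical: a camera frame flattened to a tuple of pixel values.
Frame = Tuple[int, ...]


@dataclass
class ToyScene:
    """Hypothetical scene whose only state is a door angle in degrees (0 = closed)."""
    door_angle: int = 0

    def render(self) -> Frame:
        # Stand-in for the robot's camera: an 8-pixel strip with one lit
        # pixel per 10 degrees of door opening (capped at 8).
        lit = min(self.door_angle // 10, 8)
        return tuple([1] * lit + [0] * (8 - lit))


def capture_goal_image(scene: ToyScene, target_angle: int) -> Frame:
    """Human intervention: set the goal state, photograph it, then reset."""
    start = scene.door_angle
    scene.door_angle = target_angle  # e.g. a person opens the door by hand
    goal = scene.render()            # goal image from the same robot camera
    scene.door_angle = start         # restore the initial scene before the rollout
    return goal


def toy_goal_conditioned_policy(obs: Frame, goal: Frame) -> int:
    """Hypothetical policy pi(action | observation, goal image)."""
    diff = sum(goal) - sum(obs)
    return 10 if diff > 0 else (-10 if diff < 0 else 0)


def evaluate(scene: ToyScene, goal: Frame,
             policy: Callable[[Frame, Frame], int], max_steps: int = 20) -> bool:
    """Closed-loop rollout: act until the observation matches the goal image."""
    for _ in range(max_steps):
        obs = scene.render()
        if obs == goal:
            return True
        scene.door_angle += policy(obs, goal)
    return scene.render() == goal


scene = ToyScene()
goal = capture_goal_image(scene, target_angle=60)
print(evaluate(scene, goal, toy_goal_conditioned_policy))  # True
```

The point of capturing the goal with the robot's own camera is that the goal image then shares the viewpoint and appearance of the observations the policy receives at test time.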