real-stanford / xskill

[CoRL 2023] XSkill: cross embodiment skill discovery
https://xskill.cs.columbia.edu/
MIT License

Skill prior loss and VisualMotionPrior #1

Closed: LostXine closed this issue 9 months ago

LostXine commented 9 months ago

Hello,

Thank you for sharing this amazing project. The organization and clarity of the code are truly commendable.

While reviewing the code, I encountered a point that wasn't entirely clear to me, and I was hoping you might be able to provide some additional insight.

In the file xskill/model/core.py, there is an additional loss term called the skill prior loss. This loss appears to train a separate CNN together with a prototype layer, both encapsulated in a VisualMotionPrior instance.

https://github.com/real-stanford/xskill/blob/b748071daeb031d6b42a8dcb88c38c52297e20af/xskill/model/core.py#L301-L313

https://github.com/real-stanford/xskill/blob/b748071daeb031d6b42a8dcb88c38c52297e20af/xskill/model/core.py#L90-L98

The pretrained VisualMotionPrior is used to extract affordance_emb, as shown in the following file, but that embedding never seems to be used afterwards.

https://github.com/real-stanford/xskill/blob/b748071daeb031d6b42a8dcb88c38c52297e20af/scripts/label_sim_kitchen_dataset.py#L191

I may have overlooked a detailed explanation of this. If it's not too much trouble, could you give me some hints? If this is a deprecated implementation, would you be willing to share any observations you have about this configuration as well?

Thank you once again for your hard work.

Thanks,

mengdaxu commented 9 months ago

Hi,

Thanks for your kind words about this project. This is indeed a deprecated implementation. During training, the skill prior loss does not supervise any component that is used at inference time. The skill prior is not part of the XSkill system and can be removed from the code base entirely. As for the motivation behind it: we originally wanted to use the learned prior to explore skill acquisition through RL, i.e., learn the prior from human data only and let the robot discover skills through RL. I hope this clears up your confusion.

Thanks, Mengda
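
The point that the skill prior loss does not supervise any inference-time component can be illustrated with a toy sketch. It is not the XSkill code: it assumes the prior's parameters are disjoint from the encoder's (as the "separate CNN" description suggests), uses scalar stand-ins for network weights, and every name in it is hypothetical. Under that assumption, adding or dropping the prior term leaves the gradient seen by the encoder unchanged.

```python
# Toy illustration only: scalar "parameters" stand in for network weights.
# All names are hypothetical and do not come from the XSkill code base.

def main_loss(enc_w):
    # Stand-in for the losses that train the inference-time encoder.
    return (enc_w - 2.0) ** 2

def skill_prior_loss(prior_w):
    # Stand-in for the auxiliary loss on the separate prior module.
    return (prior_w + 1.0) ** 2

def total_loss(enc_w, prior_w, use_prior=True):
    # The prior term depends only on prior_w, never on enc_w.
    loss = main_loss(enc_w)
    if use_prior:
        loss += skill_prior_loss(prior_w)
    return loss

def grad_wrt_encoder(enc_w, prior_w, use_prior, eps=1e-6):
    # Central finite difference with respect to the encoder weight only.
    up = total_loss(enc_w + eps, prior_w, use_prior)
    down = total_loss(enc_w - eps, prior_w, use_prior)
    return (up - down) / (2 * eps)

g_with = grad_wrt_encoder(0.5, 0.3, use_prior=True)
g_without = grad_wrt_encoder(0.5, 0.3, use_prior=False)

# The encoder's gradient matches with and without the prior term.
print(abs(g_with - g_without) < 1e-6)
```

If the two modules shared parameters, the gradients would differ and removing the loss would change training, so the disjointness assumption is the part worth checking against the real code.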

LostXine commented 9 months ago

Hi @mengdaxu,

Thank you so much for your detailed explanation. I understand now, and I really appreciate hearing the story behind the scenes.

Best regards,