xukechun / Vision-Language-Grasping

[ICRA 2023] A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter

On the question of language instructions #8

Closed by yang20230315 6 months ago

yang20230315 commented 6 months ago

Does the language instruction part need to be trained? If so, could you point me to the part of the paper that covers it?

xukechun commented 6 months ago

Hi, do you mean the language instruction encoding? We use the pretrained CLIP model to encode the text.
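
For readers who want to see what encoding text with a pretrained CLIP can look like, here is a minimal sketch. It is not taken from this repository. It assumes the OpenAI `clip` package and PyTorch are installed. The `ViT-B/32` variant, the helper names `load_frozen_clip` and `encode_instruction`, the L2 normalization, and the example instruction are all illustrative assumptions. The thread does not say whether the CLIP weights are kept frozen; the sketch freezes them to show the case where the text encoder needs no training.

```python
import torch


def load_frozen_clip(model_name="ViT-B/32", device="cpu"):
    """Load pretrained CLIP and freeze it (weights are downloaded on first use).

    Assumption: "ViT-B/32" is only an example variant, not confirmed by the thread.
    """
    import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

    model, _preprocess = clip.load(model_name, device=device)
    model.eval()
    for param in model.parameters():
        param.requires_grad_(False)  # no gradient updates for the encoder
    return model, clip.tokenize


def encode_instruction(model, tokenize, instruction, device="cpu"):
    """Encode one instruction string into a unit-length text feature vector."""
    tokens = tokenize([instruction]).to(device)
    with torch.no_grad():  # inference only, nothing is trained here
        features = model.encode_text(tokens)
    # L2 normalization is an illustrative choice, common when comparing CLIP features.
    return features / features.norm(dim=-1, keepdim=True)


if __name__ == "__main__":
    model, tokenize = load_frozen_clip()
    text_feature = encode_instruction(model, tokenize, "give me the cup")
    print(text_feature.shape)
```

Passing the model and tokenizer in as arguments keeps `encode_instruction` independent of how the encoder was loaded, so any object with an `encode_text` method can be swapped in.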

yang20230315 commented 6 months ago

Yes, I see. Thanks!