yding25 / GPT-Planner

Paper: Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds
https://cowplanning.github.io/
MIT License

Questions on Your Dataset #2

Open bowen-upenn opened 5 months ago

bowen-upenn commented 5 months ago

Thank you for publishing such amazing work.

We noticed that your COWP dataset only includes textual descriptions of the tasks and possible situations. We are wondering whether the vision system was used only in your robot demonstration to detect situations, and not in any of the benchmarks or the result histograms for the 12 different tasks reported in the paper. Thank you!

yding25 commented 5 months ago

Thank you for appreciating our work.

COWP does not use a vision system; instead, it operates entirely in natural language. We assume a perfect Visual Question Answering (VQA) model capable of accurately describing scenes, and these descriptions serve as the situation context.
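For illustration only, here is a minimal sketch (not the authors' actual code) of what that looks like: a hand-written situation description stands in for the assumed perfect VQA output and is injected into a planning prompt. The function name, prompt wording, and example strings are all hypothetical.

```python
# Minimal illustrative sketch (not COWP's implementation): a textual
# situation description replaces a vision system's output and is
# inserted into the prompt as situation context for the LLM.

def build_situation_prompt(task: str, failed_step: str, situation: str) -> str:
    """Compose a natural-language prompt; names and wording are hypothetical."""
    return (
        f"Task: {task}\n"
        f"The robot attempted the step: {failed_step}\n"
        f"Observed situation (assumed perfect VQA description): {situation}\n"
        "Question: Can the robot still complete the task? "
        "If not, suggest how the plan should be adapted."
    )

if __name__ == "__main__":
    prompt = build_situation_prompt(
        task="serve a cup of coffee",
        failed_step="pick up the cup",
        situation="the cup on the table is broken",
    )
    # This prompt string would then be sent to an LLM for situation handling.
    print(prompt)
```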

We did attempt to develop a vision system. Unfortunately, few robotics simulation platforms support visualizing such scenarios (e.g., coffee spills or broken cups), and building one would also demand significant human effort.