Hi, thanks for this great work. After a long, tough, but exciting journey, we finally reproduced some good results on our own robot and task. Along the way, a few questions came to mind:
When we fine-tune on a single task (pick a cup, 100 demos), the performance is quite good. But when we fine-tune on 4 tasks together (pick a cup, place a cup, wipe a dirty area, insert a block; 100 demos each), the performance is not as good as before. What are the strategy and key points for multi-task fine-tuning (e.g., data ratio, task relationships, etc.)?
As mentioned, there are three fine-tuning methods: head only, head map only, and full. Currently we only use the "full" method. What is each of the three methods best suited for, and what are the advantages of each?
In many cases, the robot data used for fine-tuning is completely different from the data used for pre-training. In this case, what benefits does pre-training bring? Or would it be better to train from scratch using only the new data?
What does the model learn from pre-training on a large dataset? Scene representations, or something else?
As a general manipulation model, can it cope effectively with the differences between robot bodies? What are some ideas for handling this discrepancy?
Thanks for your attention. We look forward to your kind response!