Open SiyuanHuang95 opened 9 months ago
Thanks for your questions!
`language_instruction` field in the dataset describes the full trajectory (which is the case for the data in the Open-X dataset). If you want sub-instructions (e.g. "skills") passed to the model and your dataset has annotations for that, you can add e.g. a one-hot encoding via the proprio input.

Thanks for your update! I appreciate your informative answer! It deserves more stars!
Some new questions:
Hey, thanks for your work! I'll add a question about that appendix too. I've read that you don't do temporal ensembling. Does that mean you execute every action in a chunk without re-planning, or do you still compute actions on each new observation while the previous chunk is executing and execute those instead?
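To make the two options in this question concrete, here is a minimal sketch. It is not taken from the authors' code and does not say which option they use. `policy` and `step_env` are hypothetical stand-ins for a chunk-predicting model and an environment step, and `exec_horizon` is an illustrative parameter name.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces, for illustration only.
Policy = Callable[[float], Sequence[float]]  # observation -> chunk of actions
StepFn = Callable[[float, float], float]     # (observation, action) -> next observation


def run_open_loop(policy: Policy, step_env: StepFn, obs: float, total_steps: int) -> List[float]:
    """Execute every action of a predicted chunk before querying the policy again."""
    executed: List[float] = []
    while len(executed) < total_steps:
        chunk = list(policy(obs))
        if not chunk:
            raise ValueError("policy returned an empty chunk")
        for action in chunk:
            obs = step_env(obs, action)
            executed.append(action)
            if len(executed) == total_steps:
                break
    return executed


def run_receding_horizon(
    policy: Policy, step_env: StepFn, obs: float, total_steps: int, exec_horizon: int = 1
) -> List[float]:
    """Re-plan on the latest observation after executing only the first
    `exec_horizon` actions of each chunk; the rest of the chunk is discarded."""
    if exec_horizon < 1:
        raise ValueError("exec_horizon must be at least 1")
    executed: List[float] = []
    while len(executed) < total_steps:
        chunk = list(policy(obs))[:exec_horizon]
        if not chunk:
            raise ValueError("policy returned an empty chunk")
        for action in chunk:
            obs = step_env(obs, action)
            executed.append(action)
            if len(executed) == total_steps:
                break
    return executed
```

Temporal ensembling would be a third variant: the policy is queried at every step and the overlapping predictions for the current timestep are averaged, rather than one chunk's actions being executed directly.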
Hi, great work, and thanks for sharing!
I have read your paper and found it very inspiring. However, a few things are still unclear to me:
The history frames. You mentioned that a one-frame history is beneficial for pre-training. How do you organize the input data: would it be like ----, e.g. in an interleaved style? In other words, do we need to split all the original trajectories into 2-step chunks? And how do you handle the first frame: do you repeat it?
Shuffle buffer size. Does the buffer mean "sampling frames from different trajectories across datasets"? Please correct me if I have misunderstood.
Heads. It seems that the diffusion policy head is the most robust and efficient one.
Step Modelling. I am curious about how to model the step information. Since one trajectory has a single task instruction but many steps, do we need to differentiate between steps? Also, how do you decide that the robot should stop? Do you use some heuristics?
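For the history-frames question, one common way to build fixed-length history windows is sketched below. This is an assumption for illustration and may not match the authors' data loader: a window slides over each trajectory so that consecutive windows overlap, and the first window is padded by repeating the first frame, with a mask marking which slots hold real history. The function name `make_history_windows` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_history_windows(
    trajectory: Sequence[T], window_size: int = 2
) -> List[Tuple[List[T], List[bool]]]:
    """Return one (frames, mask) pair per timestep.

    frames holds the `window_size` most recent observations, oldest first.
    Slots that would fall before the start of the trajectory are filled by
    repeating the first frame, and the matching mask entry is False.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    windows: List[Tuple[List[T], List[bool]]] = []
    for t in range(len(trajectory)):
        # Offsets run from the oldest slot (window_size - 1) to the current frame (0).
        offsets = range(window_size - 1, -1, -1)
        frames = [trajectory[max(t - k, 0)] for k in offsets]
        mask = [t - k >= 0 for k in offsets]
        windows.append((frames, mask))
    return windows


if __name__ == "__main__":
    for frames, mask in make_history_windows(["o0", "o1", "o2"]):
        print(frames, mask)
```

With this scheme every timestep yields exactly one training sample, so the trajectory is not cut into disjoint 2-step chunks; the windows overlap by one frame.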
Thanks again for sharing!