Thank you for posting such a great piece!
I'm confused about one thing. When constructing the training data, I notice that the output is the entire trajectory. In the inference phase, however, the model predicts only one piece at a time: an action path, a thought, or an action. The training data is not split into steps, with each step as a separate output. So how does the model achieve step-by-step prediction?
Looking forward to your response. Thanks!