During inference we normally provide only the text to guide the generation. Because we add zero padding to the motions during training, the generated motion can also contain zero padding. My question is: how can we remove this predicted padding from the generated motion?
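
To make the question concrete, here is a minimal sketch of one possible post-processing step, not a confirmed solution. It assumes the generated motion is a NumPy array of shape `(num_frames, feature_dim)`, that the padding only appears at the end of the sequence, and that predicted padding frames come out close to (not exactly) zero. The function name `trim_trailing_padding` and the tolerance `eps` are hypothetical.

```python
import numpy as np


def trim_trailing_padding(motion: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Drop trailing frames whose values are all (near) zero.

    motion: array of shape (num_frames, feature_dim).
    eps: hypothetical tolerance, since predicted padding is rarely exactly zero.
    """
    # A frame is treated as padding if its largest absolute value is below eps.
    is_pad = np.abs(motion).max(axis=1) < eps
    # Indices of the frames that carry real motion.
    valid = np.flatnonzero(~is_pad)
    if valid.size == 0:
        # Every frame looks like padding, so return an empty sequence.
        return motion[:0]
    # Keep everything up to and including the last non-padding frame.
    return motion[: valid[-1] + 1]


# Toy example: 5 frames of "motion" followed by 3 zero-padded frames.
motion = np.vstack([np.ones((5, 3)), np.zeros((3, 3))])
trimmed = trim_trailing_padding(motion)
```

Only trailing frames are removed, so a near-zero frame in the middle of the sequence is kept. If the features were normalized before padding, the check would need to run in the same space in which the zeros were added; otherwise the padding may not be near zero at all.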