pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

Add xfolding to tacotron2 infer pipeline #1918

Open mthrok opened 3 years ago

mthrok commented 3 years ago

When vocoding a single example, folding the input into a batch of overlapping chunks lets inference run faster.

https://github.com/pytorch/audio/blob/31dbb7540c78fe5d176948764cf9a20f55ac80dc/examples/pipeline_wavernn/wavernn_inference_wrapper.py#L167-L177
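For readers who do not want to follow the link, the idea can be sketched roughly as below. This is a minimal illustration and not the code at the linked lines: the function names `fold_with_overlap` and `xfade_and_unfold`, the linear crossfade, and the `size`/`overlap` parameters are all assumptions made for this example. It folds along the last (time) dimension and stitches chunks back together on the same time scale; a real vocoder upsamples, so the overlap would have to be scaled by the hop length before unfolding.

```python
import math

import torch
import torch.nn.functional as F


def fold_with_overlap(x: torch.Tensor, size: int, overlap: int) -> torch.Tensor:
    """Split `x` along its last dim into chunks of `size` that share `overlap` steps.

    Hypothetical helper: returns a tensor of shape (num_chunks, ..., size).
    """
    assert 0 < overlap <= size // 2, "sketch assumes 0 < overlap <= size // 2"
    hop = size - overlap
    length = x.shape[-1]
    # Number of chunks needed to cover the whole input.
    n = max(1, math.ceil((length - overlap) / hop))
    padded_len = hop * (n - 1) + size
    # Zero-pad the tail so the last chunk is full.
    x = F.pad(x, (0, padded_len - length))
    chunks = x.unfold(-1, size, hop)  # (..., n, size)
    return chunks.movedim(-2, 0)      # (n, ..., size), i.e. a batch of chunks


def xfade_and_unfold(chunks: torch.Tensor, overlap: int) -> torch.Tensor:
    """Stitch (num_chunks, size) chunks into one 1-D signal with a linear crossfade."""
    n, size = chunks.shape
    assert 0 < overlap <= size // 2, "sketch assumes 0 < overlap <= size // 2"
    hop = size - overlap
    fade_in = torch.linspace(0.0, 1.0, overlap)
    window = torch.ones(size)
    window[:overlap] = fade_in
    window[-overlap:] = 1.0 - fade_in
    weighted = chunks * window
    # The outer edges have no neighbour to blend with, so leave them unfaded.
    weighted[0, :overlap] = chunks[0, :overlap]
    weighted[-1, -overlap:] = chunks[-1, -overlap:]
    out = torch.zeros(hop * (n - 1) + size)
    for i in range(n):
        out[i * hop : i * hop + size] += weighted[i]
    return out


# Folding a (n_mels, time) spectrogram gives a batch the vocoder can process at once.
mel = torch.randn(80, 100)
batch = fold_with_overlap(mel, size=32, overlap=8)

# With an identity stand-in for the vocoder, unfolding recovers the input.
signal = torch.arange(100, dtype=torch.float32)
restored = xfade_and_unfold(fold_with_overlap(signal, size=32, overlap=8), overlap=8)
restored = restored[: signal.shape[-1]]  # drop the zero padding
```

The fade-in and fade-out weights sum to one across each overlap, so an identity "model" reproduces the original signal; with a real vocoder the crossfade only smooths the seams between independently generated chunks.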

I excluded it from the initial tacotron2 pipeline because of https://github.com/pytorch/audio/issues/1742. We can re-implement it while working out the cause of #1742.

https://github.com/pytorch/audio/blob/31dbb7540c78fe5d176948764cf9a20f55ac80dc/examples/pipeline_wavernn/wavernn_inference_wrapper.py#L32-L129

nateanl commented 3 years ago

Is this tacotron2 related? Or is the method only for WaveRNN?

mthrok commented 3 years ago

It's for WaveRNN, but it's implemented in the TTS pipeline, in this class:

https://github.com/pytorch/audio/blob/56f3b92746022cad8bd20f23b7a92023fb5560cc/torchaudio/pipelines/_tts/impl.py#L71-L96