soroushmehr / sampleRNN_ICLR2017

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
https://arxiv.org/abs/1612.07837
MIT License

speed of generating speech samples #19

Open dengyan opened 7 years ago

dengyan commented 7 years ago

I found that SampleRNN needs to be run in parallel to get fast generation. It takes only about 500 seconds to generate 200 utterances, each 8 seconds long. But generating a single utterance is very slow: more than 40 seconds for 1 second of speech. That does not seem any faster than WaveNet. Does anyone have ideas for speeding it up?

Cortexelus commented 6 years ago

Using a p3.16xlarge AWS instance (NVIDIA Tesla V100, CUDA 9).

This appears to run about 10x as fast as dengyan's setup.

It takes us 1000 seconds to generate 4-minute audio files, whether we generate one file or a batch in parallel.

If we generate 100 of these in parallel, that's 24 seconds of generated audio for every 1 second of processing.

If we generate just 1 of these, that's 0.24 seconds of generated audio for every 1 second of processing.
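
For anyone comparing the numbers in this thread, here is a minimal sketch of the arithmetic in Python. The helper name `realtime_factor` is made up for illustration and is not part of the SampleRNN code. It assumes the wall-clock time is the same for a single file as for a full batch, which is what the figures above imply, and it treats dengyan's "more than 40 seconds" as exactly 40, so that rate is an upper bound.

```python
def realtime_factor(audio_seconds_each, n_parallel, wall_seconds):
    """Seconds of audio produced per second of wall-clock time."""
    return audio_seconds_each * n_parallel / wall_seconds


# dengyan: 200 utterances of 8 s each in about 500 s.
dengyan_batch = realtime_factor(8, 200, 500)
# dengyan: one utterance, "more than 40 s" per 1 s of speech (taken as 40).
dengyan_single = realtime_factor(1, 1, 40)

# Cortexelus: 4-minute (240 s) files in 1000 s, batch of 100 or a single file.
v100_batch = realtime_factor(240, 100, 1000)
v100_single = realtime_factor(240, 1, 1000)

print(f"dengyan batch:  {dengyan_batch:.3f}x real time")
print(f"dengyan single: {dengyan_single:.3f}x real time")
print(f"V100 batch:     {v100_batch:.3f}x real time")
print(f"V100 single:    {v100_single:.3f}x real time")
print(f"single-file speedup: {v100_single / dengyan_single:.1f}x")
print(f"batch speedup:       {v100_batch / dengyan_batch:.1f}x")
```

On these figures the single-file comparison is where the roughly 10x comes from; the batched comparison gives a smaller ratio because the two batch sizes and clip lengths differ.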