kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch
https://kan-bayashi.github.io/ParallelWaveGAN/
MIT License

RTF variation #299

Closed sciai-ai closed 3 years ago

sciai-ai commented 3 years ago

Hi @kan-bayashi

I have noticed that the RTF varies even when we use manual seeds for both taco2 and pwg. I am wondering where this randomness comes from that makes the tensor computation slower or faster. #291
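
For context, RTF (real-time factor) here is synthesis wall-clock time divided by the duration of the generated audio. A minimal sketch of the measurement, assuming the ESPnet-style text2speech/vocoder objects used later in this thread (GPU synchronization caveats are discussed below):

import time

import torch

start = time.time()
with torch.no_grad():
    _, c, *_ = text2speech(x)          # text -> mel features
    wav = vocoder.inference(c)         # mel features -> waveform
elapsed = time.time() - start

# RTF = synthesis wall-clock time / duration of the generated audio
# (22050 is the LJSpeech sample rate)
rtf = elapsed / (len(wav) / 22050)
print(f"RTF: {rtf:.3f}")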

kan-bayashi commented 3 years ago

Hmm. I think it depends on your machine usage. Basically, the initial inference tends to be slow due to some CUDA initialization (maybe). How large is the difference?
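
If the first call is indeed paying CUDA initialization costs, a common workaround is a throwaway warmup run before the measured ones. A minimal sketch, assuming a vocoder object and input features c as in the snippets below:

import torch

# Throwaway inference so that CUDA context creation and kernel loading
# are not counted in the measured runs; the result is discarded.
with torch.no_grad():
    _ = vocoder.inference(c)
torch.cuda.synchronize()  # wait until the warmup has actually finished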

r9y9 commented 3 years ago

It seems parallel-wavegan-decode does not force-synchronize CUDA operations. You might want to call torch.cuda.synchronize() before and after the inference. Pseudo code will look like:

import time

import torch

torch.cuda.synchronize()  # wait for pending GPU work before starting the clock
start = time.time()
with torch.no_grad():
    ...  # do inference here
torch.cuda.synchronize()  # block until the inference kernels have finished
elapsed_time = time.time() - start

See https://pytorch.org/docs/stable/notes/cuda.html for details.

sciai-ai commented 3 years ago

It seems the variation is not so large for smaller inputs, but when the input is 200-300 characters the variation can be up to 3 seconds. Using torch.cuda.synchronize() also makes no difference to the variation.

I have noticed another strange timing behaviour when I write the wav array to an output wav file. The code snippet below increases the runtime by 3x compared to not adding the scipy write call. I tested with a 1000-character input. Can you please check? I used the LJSpeech model from the ESPnet TTS notebook, and also tried sf.write with the same results.

import time

import scipy.io.wavfile
import torch

start = time.time()
# synthesis
with torch.no_grad():
    start2 = time.time()
    wav, c, *_ = text2speech(x)
    wav = vocoder.inference(c)
    print(time.time() - start2)
# copy the waveform to the host and write the wav file
wav2 = wav.view(-1).cpu().numpy()
scipy.io.wavfile.write('123.wav', 22050, wav2)
print(time.time() - start)

[screenshot: timing output]

When I just execute the code separately, the runtime is very quick:

[screenshot: timing output]

kan-bayashi commented 3 years ago

Maybe better to check the time without I/O.

sciai-ai commented 3 years ago

Haha, I know why. It's not the I/O, it's the time taken to copy the tensor from the GPU to the CPU 😊
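
That observation matches PyTorch's asynchronous CUDA execution model (see the notes linked above): vocoder.inference returns before the GPU work is done, and the blocking wav.view(-1).cpu() call then absorbs the remaining compute time, which makes the copy (and anything timed together with it) look slow. A sketch that separates the stages, using the same objects as the snippets above:

import time

import scipy.io.wavfile
import torch

torch.cuda.synchronize()
start = time.time()
with torch.no_grad():
    wav, c, *_ = text2speech(x)
    wav = vocoder.inference(c)
torch.cuda.synchronize()  # attribute all pending GPU work to this stage
print(f"inference:  {time.time() - start:.3f}s")

start = time.time()
wav2 = wav.view(-1).cpu().numpy()  # device-to-host copy (blocks regardless)
print(f"GPU->CPU:   {time.time() - start:.3f}s")

start = time.time()
scipy.io.wavfile.write('123.wav', 22050, wav2)
print(f"file write: {time.time() - start:.3f}s")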