Slow inferring of mel on a fast GPU.

I am using tacotron-cli and waveglow-cli on Google Colab, and am experiencing slow mel inference with waveglow-cli. The following is the code I am executing and the resulting log with the inference duration in bold. Given the generation of the mel takes 0 seconds, I am wondering if the mel inference is occurring on the CPU vs GPU and whether this is a bug. If I run a different Colab where I install waveglow directly I do not experience this issue. Any inout would be appreciated.

# Create text containing phonetic transcription of: "The North Wind and the Sun were disputing which was the stronger, when a traveler came along wrapped in a warm cloak."
with open('/content/example/text.txt', 'w') as f:
  f.write('ð|ʌ|SIL0|n|ˈɔ|ɹ|θ|SIL0|w|ˈɪ|n|d|SIL0|ˈæ|n|d|SIL0|ð|ʌ|SIL0|s|ˈʌ|n|SIL0|w|ɝ|SIL0|d|ɪ|s|p|j|ˈu|t|ɪ|ŋ|SIL0|h|w|ˈɪ|t͡ʃ|SIL0|w|ˈɑ|z|SIL0|ð|ʌ|SIL0|s|t|ɹ|ˈɔ|ŋ|ɝ|,|SIL1|h|w|ˈɛ|n|SIL0|ʌ|SIL0|t|ɹ|ˈæ|v|ʌ|l|ɝ|SIL0|k|ˈeɪ|m|SIL0|ʌ|l|ˈɔ|ŋ|SIL0|ɹ|ˈæ|p|t|SIL0|ɪ|n|SIL0|ʌ|SIL0|w|ˈɔ|ɹ|m|SIL0|k|l|ˈoʊ|k|.|SIL2')

# Synthesize text to mel-spectrogram
!tacotron-cli synthesize \
  /content/example/checkpoint-tacotron.pt \
  /content/example/text.txt \
  --sep "|"

# Synthesize mel-spectrogram to wav
!waveglow-cli synthesize \
  /content/example/checkpoint-waveglow.pt \
  /content/example/text -o

# Resulting wav is written to: /content/example/text/1-1.npy.wav

And this is the log.

(DEBUG) Loading checkpoint... (DEBUG) Loading text. Inferring... Checkpoint learning rate was: 1e-05 Using random seed: 7827. (DEBUG) Speaker: Linda Johnson (sdp) Inference: 0%| | 0/1 [00:00<?, ? lines/s]Line 1: Skipped inference because line is already synthesized! Inference: 0%| | 0/1 [00:00<?, ? lines/s] Done. Total spectrogram duration: 0.00s Written output to: '/content/example/text' Everything was successful! Written log to: /tmp/tacotron-cli.log Using random seed: 4663. Loading model '/content/example/checkpoint-waveglow.pt'... Loaded model at iteration 580000. /usr/local/lib/python3.10/dist-packages/waveglow/model.py:36: UserWarning: torch.qr is deprecated in favor of torch.linalg.qr and will be removed in a future PyTorch release. The boolean parameter 'some' has been replaced with a string parameter 'mode'. Q, R = torch.qr(A, some) should be replaced with Q, R = torch.linalg.qr(A, 'reduced' if some else 'complete') (Triggered internally at ../aten/src/ATen/native/BatchLinearAlgebra.cpp:2425.) W = torch.qr(torch.FloatTensor(c, c).normal_())[0] Inferring: 0%| | 0/1 00:00<?, ? mel(s)/s Loading mel from /content/example/text/1-1.npy ... (DEBUG) Inferring mel... (DEBUG) Saving /content/example/text/1-1.npy.wav ... Inferring: 100%|█████████████████████████████████████████████████| 1/1 [01:07<00:00, 67.64s/ mel(s)] Done. Written output to: /content/example/text Everything was successful! Written log to: /tmp/waveglow-cli.log

stefantaubert / waveglow

Slow inferring of mel on a fast GPU. #1