myshell-ai / MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
MIT License

cuFFT error: CUFFT_INTERNAL_ERROR from training script #80

Open johnPertoft opened 8 months ago

johnPertoft commented 8 months ago

I'm trying to run the provided training script, but I'm running into the cuFFT error in the title. It happens in a call to torch.stft(...) inside mel_spectrogram_torch(...) in melo/mel_processing.py. All tensors going into this function appear to be on the correct device. Have you seen this error before, and if so, are there any known fixes?
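One way to narrow this down might be to call torch.stft on its own, outside the training loop. The sketch below is a hypothetical standalone check, not code from the MeloTTS repository: the names `expected_frames` and `stft_smoke_test` are made up for illustration, the sizes are borrowed from the config.json in this post, and it does not reproduce any padding that mel_spectrogram_torch may apply.

```python
def expected_frames(n_samples: int, n_fft: int, hop_length: int) -> int:
    """Number of STFT frames when center=False (no padding)."""
    return 1 + (n_samples - n_fft) // hop_length


def stft_smoke_test(device: str = "cuda", n_samples: int = 16384,
                    n_fft: int = 2048, hop_length: int = 512,
                    win_length: int = 2048) -> tuple:
    """Run torch.stft on random audio and return the output shape.

    Hypothetical helper for isolating the error; not part of MeloTTS.
    """
    # Imported lazily so expected_frames can be used without torch installed.
    import torch

    audio = torch.randn(1, n_samples, device=device)
    window = torch.hann_window(win_length, device=device)
    spec = torch.stft(
        audio,
        n_fft,
        hop_length=hop_length,
        win_length=win_length,
        window=window,
        center=False,
        return_complex=True,
    )
    # Shape is (batch, n_fft // 2 + 1, frames) for a real-valued input.
    return tuple(spec.shape)

# Usage: compare stft_smoke_test("cpu") with stft_smoke_test("cuda").
```

If the "cpu" call succeeds and the "cuda" call raises the same cuFFT error, that would suggest the problem sits in the torch/CUDA installation rather than in the training script or the data.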

System:

Verified that torch can talk to gpu:

python -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.version.cuda)"
1.13.1+cu117
True
11.7
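The one-liner above does not show which GPU is in use, which can matter when comparing setups across machines. A slightly longer sketch that also reports the GPU model and compute capability; `collect_env` is a hypothetical helper name chosen here, and it returns a stub if torch is not installed.

```python
def collect_env() -> dict:
    """Gather torch/CUDA/GPU details that are useful in an error report."""
    try:
        import torch
    except ImportError:
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        # CUDA version torch was built against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["gpu_name"] = torch.cuda.get_device_name(0)
        major, minor = torch.cuda.get_device_capability(0)
        info["compute_capability"] = f"{major}.{minor}"
    return info


for key, value in collect_env().items():
    print(f"{key}: {value}")
```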

config.json:

```json
{
  "train": {
    "log_interval": 200,
    "eval_interval": 1000,
    "seed": 52,
    "epochs": 10,
    "learning_rate": 0.0003,
    "betas": [0.8, 0.99],
    "eps": 1e-09,
    "batch_size": 16,
    "fp16_run": false,
    "lr_decay": 0.999875,
    "segment_size": 16384,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0,
    "skip_optimizer": true
  },
  "data": {
    "training_files": "../../data/June-Restrained/train.list",
    "validation_files": "../../data/June-Restrained/val.list",
    "max_wav_value": 32768.0,
    "sampling_rate": 44100,
    "filter_length": 2048,
    "hop_length": 512,
    "win_length": 2048,
    "n_mel_channels": 128,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 1,
    "cleaned_text": true,
    "spk2id": {
      "June": 0
    }
  },
  "model": {
    "use_spk_conditioned_encoder": true,
    "use_noise_scaled_mas": true,
    "use_mel_posterior_encoder": false,
    "use_duration_discriminator": true,
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "n_layers_trans_flow": 3,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [3, 7, 11],
    "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
    "upsample_rates": [8, 8, 2, 2, 2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [16, 16, 8, 2, 2],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "gin_channels": 256
  },
  "num_languages": 8,
  "num_tones": 16,
  "symbols": [
    "_", "\"", "(", ")", "*", "/", ":", "AA", "E", "EE", "En", "N", "OO", "Q", "V", "[", "\\", "]", "^",
    "a", "a:", "aa", "ae", "ah", "ai", "an", "ang", "ao", "aw", "ay", "b", "by", "c", "ch", "d", "dh", "dy",
    "e", "e:", "eh", "ei", "en", "eng", "er", "ey", "f", "g", "gy", "h", "hh", "hy",
    "i", "i0", "i:", "ia", "ian", "iang", "iao", "ie", "ih", "in", "ing", "iong", "ir", "iu", "iy",
    "j", "jh", "k", "ky", "l", "m", "my", "n", "ng", "ny", "o", "o:", "ong", "ou", "ow", "oy",
    "p", "py", "q", "r", "ry", "s", "sh", "t", "th", "ts", "ty",
    "u", "u:", "ua", "uai", "uan", "uang", "uh", "ui", "un", "uo", "uw", "v",
    "van", "ve", "vn", "w", "x", "y", "z", "zh", "zy", "~",
    "æ", "ç", "ð", "ø", "ŋ", "œ", "ɐ", "ɑ", "ɒ", "ɔ", "ɕ", "ə", "ɛ", "ɜ", "ɡ", "ɣ", "ɥ", "ɦ", "ɪ", "ɫ", "ɬ", "ɭ", "ɯ", "ɲ", "ɵ", "ɸ", "ɹ", "ɾ",
    "ʁ", "ʃ", "ʊ", "ʌ", "ʎ", "ʏ", "ʑ", "ʒ", "ʝ", "ʲ", "ˈ", "ˌ", "ː", "̃", "̩", "β", "θ",
    "ᄀ", "ᄁ", "ᄂ", "ᄃ", "ᄄ", "ᄅ", "ᄆ", "ᄇ", "ᄈ", "ᄉ", "ᄊ", "ᄋ", "ᄌ", "ᄍ", "ᄎ", "ᄏ", "ᄐ", "ᄑ", "ᄒ",
    "ᅡ", "ᅢ", "ᅣ", "ᅤ", "ᅥ", "ᅦ", "ᅧ", "ᅨ", "ᅩ", "ᅪ", "ᅫ", "ᅬ", "ᅭ", "ᅮ", "ᅯ", "ᅰ", "ᅱ", "ᅲ", "ᅳ", "ᅴ", "ᅵ",
    "ᆨ", "ᆫ", "ᆮ", "ᆯ", "ᆷ", "ᆸ", "ᆼ", "ㄸ",
    "!", "?", "…", ",", ".", "'", "-", "¿", "¡", "SP", "UNK"
  ]
}
```

Pip freeze:

```
absl-py==2.1.0
aiofiles==23.2.1
altair==5.2.0
annotated-types==0.6.0
anyascii==0.3.2
anyio==4.3.0
asttokens==2.4.1
attrs==23.2.0
audioread==3.0.1
Babel==2.14.0
boto3==1.34.67
botocore==1.34.67
cached-path==1.6.2
cachetools==5.3.3
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cn2an==0.5.22
colorama==0.4.6
contourpy==1.2.0
cycler==0.12.1
dateparser==1.1.8
decorator==5.1.1
Deprecated==1.2.14
Distance==0.1.3
distro==1.9.0
docopt==0.6.2
eng-to-ipa==0.0.2
exceptiongroup==1.2.0
executing==2.0.1
fastapi==0.110.0
fastcore==1.5.29
ffmpy==0.3.2
filelock==3.13.1
fonttools==4.50.0
fsspec==2024.3.1
fugashi==1.3.0
g2p-en==2.1.0
g2pkk==0.1.2
google-api-core==2.17.1
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
gradio-client==0.13.0
gradio==4.22.0
grpcio==1.62.1
gruut-ipa==0.13.0
gruut-lang-de==2.0.0
gruut-lang-en==2.0.0
gruut-lang-es==2.0.0
gruut-lang-fr==2.0.2
gruut==2.2.3
h11==0.14.0
httpcore==1.0.4
httpx==0.27.0
huggingface-hub==0.21.4
idna==3.6
importlib-metadata==7.1.0
importlib-resources==6.4.0
inflect==7.0.0
ipython==8.18.1
jaconv==0.3.4
jamo==0.4.1
jedi==0.19.1
jieba==0.42.1
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
jsonlines==1.2.0
jsonschema-specifications==2023.12.1
jsonschema==4.21.1
kiwisolver==1.4.5
langid==1.1.6
librosa==0.9.1
llvmlite==0.42.0
loguru==0.7.2
lovely-numpy==0.2.11
lovely-tensors==0.1.15
markdown-it-py==3.0.0
Markdown==3.6
MarkupSafe==2.1.5
matplotlib-inline==0.1.6
matplotlib==3.8.3
mdurl==0.1.2
mecab-python3==1.0.5
melotts @ file:///workspaces/tts-finetuning/MeloTTS
networkx==2.8.8
nltk==3.8.1
num2words==0.5.12
numba==0.59.1
numpy==1.26.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
openai==1.14.2
orjson==3.9.15
packaging==24.0
pandas==2.2.1
parso==0.8.3
pexpect==4.9.0
pillow==10.2.0
pip==24.0
plac==1.4.3
platformdirs==4.2.0
pooch==1.8.1
proces==0.1.7
prompt-toolkit==3.0.43
protobuf==4.25.3
ptyprocess==0.7.0
pure-eval==0.2.2
pyasn1-modules==0.3.0
pyasn1==0.5.1
pycparser==2.21
pydantic-core==2.16.3
pydantic==2.6.4
pydub==0.25.1
Pygments==2.17.2
pykakasi==2.2.1
pyparsing==3.1.2
pypinyin==0.50.0
python-crfsuite==0.9.10
python-dateutil==2.9.0.post0
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
referencing==0.34.0
regex==2023.12.25
requests==2.31.0
resampy==0.4.3
rich==13.7.1
rpds-py==0.18.0
rsa==4.9
ruff==0.3.3
s3transfer==0.10.1
scikit-learn==1.4.1.post1
scipy==1.12.0
semantic-version==2.10.0
setuptools==69.2.0
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
soundfile==0.12.1
stack-data==0.6.3
starlette==0.36.3
tensorboard-data-server==0.7.2
tensorboard==2.16.2
threadpoolctl==3.4.0
tokenizers==0.13.3
tomlkit==0.12.0
toolz==0.12.1
torch==1.13.1
torchaudio==0.13.1
tqdm==4.66.2
traitlets==5.14.2
transformers==4.27.4
txtsplit==1.0.0
typer==0.9.0
typing-extensions==4.10.0
tzdata==2024.1
tzlocal==5.2
Unidecode==1.3.7
unidic-lite==1.0.8
unidic==1.1.0
urllib3==1.26.18
uvicorn==0.29.0
wasabi==0.10.1
wcwidth==0.2.13
websockets==11.0.3
Werkzeug==3.0.1
wheel==0.43.0
wrapt==1.16.0
zipp==3.18.1
```

AngelGuevara7 commented 8 months ago

In my case, the training script works with torch 2.0.1+cu118, NVIDIA driver 530, and an RTX 4090. Have you tried changing the torch version?

johnPertoft commented 8 months ago

I used the version pinned by the authors, but I'll try upgrading. Thanks!

johnPertoft commented 7 months ago

An update on this: it does seem to work fine with the latest PyTorch version. It would still be good if the authors could weigh in on whether the torch<2.0 pin listed in the requirements is intentional, and whether there are any implications of using a 2.x version of torch.

@Zengyi-Qin
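
For anyone wanting to check an installed version against the torch<2.0 pin mentioned in this thread, here is a minimal sketch. The helper names `parse_release` and `satisfies_upper_bound` are hypothetical, and the comparison is a simplification that only looks at the numeric release part: it drops local tags such as +cu117 and ignores pre-release markers, so it is not a full PEP 440 implementation.

```python
import re


def parse_release(version: str) -> tuple:
    """Return the numeric release part, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    public = version.split("+", 1)[0]  # drop a local tag such as '+cu117'
    match = re.match(r"\d+(?:\.\d+)*", public)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies_upper_bound(version: str, upper: str = "2.0") -> bool:
    """True if `version` is strictly below `upper` (simplified comparison)."""
    return parse_release(version) < parse_release(upper)


# Versions reported in this thread:
print(satisfies_upper_bound("1.13.1+cu117"))  # within the torch<2.0 pin
print(satisfies_upper_bound("2.0.1+cu118"))   # outside the torch<2.0 pin
```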