Open johnPertoft opened 8 months ago
In my case, the training script works with torch 2.0.1+cu118, NVIDIA driver 530, and an RTX 4090. Have you tried changing the torch version?
I used the version pinned by the authors but I will try upgrading then. Thanks!
Just an update on this: it does seem to work fine with the latest PyTorch version. It would still be good if the authors could weigh in on whether the `torch<2.0` pin listed in the requirements is intentional, and whether there are any implications of using a 2.x version of torch.
@Zengyi-Qin
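For anyone checking their own environment against the pin discussed above, here is a rough sketch that compares the installed torch version with `torch<2.0`. The helper names (`release_tuple`, `satisfies_torch_pin`, `installed_torch_version`) are made up for illustration, and the comparison only looks at the numeric release part, so it does not follow full PEP 440 rules for pre-releases.

```python
import re
from importlib import metadata


def release_tuple(version):
    """Turn '2.0.1+cu118' into (2, 0, 1); the local '+cu118' suffix is ignored."""
    public = version.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", public)[:3])


def satisfies_torch_pin(version, upper_bound=(2, 0)):
    """Return True if `version` is strictly below `upper_bound` (mirrors torch<2.0)."""
    return release_tuple(version) < upper_bound


def installed_torch_version():
    """Return the installed torch version string, or None if torch is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("torch is not installed in this environment")
    else:
        print(f"torch {found}: satisfies torch<2.0 -> {satisfies_torch_pin(found)}")
```

With the versions mentioned in this thread, `1.13.1` falls inside the pin and `2.0.1+cu118` falls outside it.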
I'm trying to run the provided training script but I'm running into the aforementioned problem. It happens in a call to `torch.stft(...)` in `melo/mel_processing.py:mel_spectrogram_torch(...)`. All tensors going into this function seem to be on the correct device. Have you seen this error before and if so, are there any known fixes?

System:
Verified that torch can talk to gpu:
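The output of that check is not reproduced here. As a rough sketch of what such a check can look like, the script below prints the torch build and GPU details, then runs a standalone `torch.stft` call using the STFT sizes from the config.json in this issue (`filter_length` 2048, `hop_length` 512, `win_length` 2048, one second of audio at 44100 Hz). It is not the repository's `mel_spectrogram_torch`; `expected_stft_shape` and `stft_smoke_test` are hypothetical helpers, and the call relies on the default `center=True`, which is an assumption.

```python
def expected_stft_shape(num_samples, n_fft, hop_length):
    """Shape (freq_bins, frames) of a one-sided STFT computed with center=True."""
    return n_fft // 2 + 1, 1 + num_samples // hop_length


def stft_smoke_test(device=None, num_samples=44100, n_fft=2048,
                    hop_length=512, win_length=2048):
    """Run torch.stft on random audio on `device` and return the output shape."""
    import torch  # imported lazily so the helper above works without torch

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    dev = torch.device(device)

    print("torch", torch.__version__, "| built for CUDA", torch.version.cuda)
    if dev.type == "cuda":
        print("GPU:", torch.cuda.get_device_name(dev),
              "| compute capability", torch.cuda.get_device_capability(dev))

    # Random audio stands in for a real training sample (assumption).
    audio = torch.randn(num_samples, device=dev)
    window = torch.hann_window(win_length, device=dev)
    spec = torch.stft(audio, n_fft, hop_length=hop_length,
                      win_length=win_length, window=window,
                      return_complex=True)
    return tuple(spec.shape)


if __name__ == "__main__":
    shape = stft_smoke_test()
    print("STFT output shape:", shape,
          "| expected:", expected_stft_shape(44100, 2048, 512))
```

If this fails on `cuda` but passes with `device="cpu"`, that would point at the GPU FFT path of the installed torch build rather than at the training code.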
config.json:
```json
{
  "train": {
    "log_interval": 200,
    "eval_interval": 1000,
    "seed": 52,
    "epochs": 10,
    "learning_rate": 0.0003,
    "betas": [0.8, 0.99],
    "eps": 1e-09,
    "batch_size": 16,
    "fp16_run": false,
    "lr_decay": 0.999875,
    "segment_size": 16384,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0,
    "skip_optimizer": true
  },
  "data": {
    "training_files": "../../data/June-Restrained/train.list",
    "validation_files": "../../data/June-Restrained/val.list",
    "max_wav_value": 32768.0,
    "sampling_rate": 44100,
    "filter_length": 2048,
    "hop_length": 512,
    "win_length": 2048,
    "n_mel_channels": 128,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 1,
    "cleaned_text": true,
    "spk2id": { "June": 0 }
  },
  "model": {
    "use_spk_conditioned_encoder": true,
    "use_noise_scaled_mas": true,
    "use_mel_posterior_encoder": false,
    "use_duration_discriminator": true,
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "n_layers_trans_flow": 3,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [3, 7, 11],
    "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
    "upsample_rates": [8, 8, 2, 2, 2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [16, 16, 8, 2, 2],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "gin_channels": 256
  },
  "num_languages": 8,
  "num_tones": 16,
  "symbols": [
    "_", "\"", "(", ")", "*", "/", ":", "AA", "E", "EE", "En", "N", "OO", "Q", "V", "[", "\\", "]", "^",
    "a", "a:", "aa", "ae", "ah", "ai", "an", "ang", "ao", "aw", "ay", "b", "by", "c", "ch", "d", "dh", "dy",
    "e", "e:", "eh", "ei", "en", "eng", "er", "ey", "f", "g", "gy", "h", "hh", "hy",
    "i", "i0", "i:", "ia", "ian", "iang", "iao", "ie", "ih", "in", "ing", "iong", "ir", "iu", "iy",
    "j", "jh", "k", "ky", "l", "m", "my", "n", "ng", "ny", "o", "o:", "ong", "ou", "ow", "oy",
    "p", "py", "q", "r", "ry", "s", "sh", "t", "th", "ts", "ty",
    "u", "u:", "ua", "uai", "uan", "uang", "uh", "ui", "un", "uo", "uw", "v", "van",
    "ve", "vn", "w", "x", "y", "z", "zh", "zy", "~",
    "æ", "ç", "ð", "ø", "ŋ", "œ", "ɐ", "ɑ", "ɒ", "ɔ", "ɕ", "ə", "ɛ", "ɜ", "ɡ", "ɣ", "ɥ", "ɦ", "ɪ", "ɫ", "ɬ", "ɭ", "ɯ", "ɲ", "ɵ", "ɸ", "ɹ", "ɾ", "ʁ", "ʃ", "ʊ", "ʌ", "ʎ", "ʏ", "ʑ", "ʒ", "ʝ", "ʲ", "ˈ", "ˌ", "ː", "̃", "̩", "β", "θ",
    "ᄀ", "ᄁ", "ᄂ", "ᄃ", "ᄄ", "ᄅ", "ᄆ", "ᄇ", "ᄈ", "ᄉ", "ᄊ", "ᄋ", "ᄌ", "ᄍ", "ᄎ", "ᄏ", "ᄐ", "ᄑ", "ᄒ",
    "ᅡ", "ᅢ", "ᅣ", "ᅤ", "ᅥ", "ᅦ", "ᅧ", "ᅨ", "ᅩ", "ᅪ", "ᅫ", "ᅬ", "ᅭ", "ᅮ", "ᅯ", "ᅰ", "ᅱ", "ᅲ", "ᅳ", "ᅴ", "ᅵ",
    "ᆨ", "ᆫ", "ᆮ", "ᆯ", "ᆷ", "ᆸ", "ᆼ", "ㄸ",
    "!", "?", "…", ",", ".", "'", "-", "¿", "¡", "SP", "UNK"
  ]
}
```

Pip freeze:
```
absl-py==2.1.0
aiofiles==23.2.1
altair==5.2.0
annotated-types==0.6.0
anyascii==0.3.2
anyio==4.3.0
asttokens==2.4.1
attrs==23.2.0
audioread==3.0.1
Babel==2.14.0
boto3==1.34.67
botocore==1.34.67
cached-path==1.6.2
cachetools==5.3.3
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cn2an==0.5.22
colorama==0.4.6
contourpy==1.2.0
cycler==0.12.1
dateparser==1.1.8
decorator==5.1.1
Deprecated==1.2.14
Distance==0.1.3
distro==1.9.0
docopt==0.6.2
eng-to-ipa==0.0.2
exceptiongroup==1.2.0
executing==2.0.1
fastapi==0.110.0
fastcore==1.5.29
ffmpy==0.3.2
filelock==3.13.1
fonttools==4.50.0
fsspec==2024.3.1
fugashi==1.3.0
g2p-en==2.1.0
g2pkk==0.1.2
google-api-core==2.17.1
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
gradio-client==0.13.0
gradio==4.22.0
grpcio==1.62.1
gruut-ipa==0.13.0
gruut-lang-de==2.0.0
gruut-lang-en==2.0.0
gruut-lang-es==2.0.0
gruut-lang-fr==2.0.2
gruut==2.2.3
h11==0.14.0
httpcore==1.0.4
httpx==0.27.0
huggingface-hub==0.21.4
idna==3.6
importlib-metadata==7.1.0
importlib-resources==6.4.0
inflect==7.0.0
ipython==8.18.1
jaconv==0.3.4
jamo==0.4.1
jedi==0.19.1
jieba==0.42.1
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
jsonlines==1.2.0
jsonschema-specifications==2023.12.1
jsonschema==4.21.1
kiwisolver==1.4.5
langid==1.1.6
librosa==0.9.1
llvmlite==0.42.0
loguru==0.7.2
lovely-numpy==0.2.11
lovely-tensors==0.1.15
markdown-it-py==3.0.0
Markdown==3.6
MarkupSafe==2.1.5
matplotlib-inline==0.1.6
matplotlib==3.8.3
mdurl==0.1.2
mecab-python3==1.0.5
melotts @ file:///workspaces/tts-finetuning/MeloTTS
networkx==2.8.8
nltk==3.8.1
num2words==0.5.12
numba==0.59.1
numpy==1.26.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
openai==1.14.2
orjson==3.9.15
packaging==24.0
pandas==2.2.1
parso==0.8.3
pexpect==4.9.0
pillow==10.2.0
pip==24.0
plac==1.4.3
platformdirs==4.2.0
pooch==1.8.1
proces==0.1.7
prompt-toolkit==3.0.43
protobuf==4.25.3
ptyprocess==0.7.0
pure-eval==0.2.2
pyasn1-modules==0.3.0
pyasn1==0.5.1
pycparser==2.21
pydantic-core==2.16.3
pydantic==2.6.4
pydub==0.25.1
Pygments==2.17.2
pykakasi==2.2.1
pyparsing==3.1.2
pypinyin==0.50.0
python-crfsuite==0.9.10
python-dateutil==2.9.0.post0
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
referencing==0.34.0
regex==2023.12.25
requests==2.31.0
resampy==0.4.3
rich==13.7.1
rpds-py==0.18.0
rsa==4.9
ruff==0.3.3
s3transfer==0.10.1
scikit-learn==1.4.1.post1
scipy==1.12.0
semantic-version==2.10.0
setuptools==69.2.0
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
soundfile==0.12.1
stack-data==0.6.3
starlette==0.36.3
tensorboard-data-server==0.7.2
tensorboard==2.16.2
threadpoolctl==3.4.0
tokenizers==0.13.3
tomlkit==0.12.0
toolz==0.12.1
torch==1.13.1
torchaudio==0.13.1
tqdm==4.66.2
traitlets==5.14.2
transformers==4.27.4
txtsplit==1.0.0
typer==0.9.0
typing-extensions==4.10.0
tzdata==2024.1
tzlocal==5.2
Unidecode==1.3.7
unidic-lite==1.0.8
unidic==1.1.0
urllib3==1.26.18
uvicorn==0.29.0
wasabi==0.10.1
wcwidth==0.2.13
websockets==11.0.3
Werkzeug==3.0.1
wheel==0.43.0
wrapt==1.16.0
zipp==3.18.1
```
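Since the thread turns on which torch and CUDA builds are installed, a small filter over the freeze output can make the relevant lines easier to compare between machines. This is only a sketch: `parse_freeze` and `gpu_stack` are hypothetical helper names, and the sample text is a handful of lines copied from the listing above.

```python
def parse_freeze(text):
    """Map package name -> version for 'name==version' lines; other lines are skipped."""
    packages = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep and name and version:
            packages[name.lower()] = version
    return packages


def gpu_stack(packages):
    """Keep only torch, torchaudio, and the nvidia-* runtime wheels."""
    return {
        name: version
        for name, version in packages.items()
        if name in ("torch", "torchaudio") or name.startswith("nvidia-")
    }


# A few lines taken from the pip freeze output in this issue.
sample = """\
melotts @ file:///workspaces/tts-finetuning/MeloTTS
numpy==1.26.4
nvidia-cuda-runtime-cu11==11.7.99
torch==1.13.1
torchaudio==0.13.1
"""

for name, version in sorted(gpu_stack(parse_freeze(sample)).items()):
    print(f"{name}=={version}")
```

Run against the full listing, this would surface `torch==1.13.1` alongside the `nvidia-*-cu11` 11.7 wheels, which is the combination to compare with the `2.0.1+cu118` setup reported as working earlier in the thread.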