rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

Error running training colab #188

Closed: StoryHack closed this issue 8 months ago

StoryHack commented 9 months ago

I am using the colab training notebook for training/finetuning. I am using a multi-speaker dataset and finetuning starting from the lessac high quality checkpoint. Everything seems to go fine until I hit step 5 of the notebook. When I run it, I get an error; full output below. None of the previous steps generated any kind of error.

DEBUG:piper_train:Namespace(dataset_dir='/content/drive/MyDrive/colab/piper/fourvoice', checkpoint_epochs=5, quality='high', resume_from_single_speaker_checkpoint='/content/pretrained.ckpt', logger=True, enable_checkpointing=True, default_root_dir=None, gradient_clip_val=None, gradient_clip_algorithm=None, num_nodes=1, num_processes=None, devices='1', gpus=None, auto_select_gpus=False, tpu_cores=None, ipus=None, enable_progress_bar=True, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=None, max_epochs=1000, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, limit_train_batches=None, limit_val_batches=None, limit_test_batches=None, limit_predict_batches=None, val_check_interval=None, log_every_n_steps=8515, accelerator='gpu', strategy=None, sync_batchnorm=False, precision=32, enable_model_summary=True, weights_save_path=None, num_sanity_val_steps=2, resume_from_checkpoint=None, profiler=None, benchmark=None, deterministic=None, reload_dataloaders_every_n_epochs=0, auto_lr_find=False, replace_sampler_ddp=True, detect_anomaly=False, auto_scale_batch_size=False, plugins=None, amp_backend='native', amp_level=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', batch_size=12, validation_split=0.01, num_test_examples=2, max_phoneme_ids=None, hidden_channels=192, inter_channels=192, filter_channels=768, n_layers=6, n_heads=2, seed=1234)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
DEBUG:piper_train:Checkpoints will be saved every 5 epoch(s)
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnmz93u78
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnmz93u78/_remote_module_non_sriptable.py
2023-08-28 19:15:03.739315: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-08-28 19:15:05.185396: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.10/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jax._src.xla_bridge:No jax_plugins namespace packages available
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
DEBUG:vits.dataset:Loading dataset: /content/drive/MyDrive/colab/piper/fourvoice/dataset.jsonl
DEBUG:piper_train:Resuming from single-speaker checkpoint: /content/pretrained.ckpt
DEBUG:fsspec.local:open file: /content/pretrained.ckpt
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/piper/src/python/piper_train/__main__.py", line 152, in <module>
    main()
  File "/content/piper/src/python/piper_train/__main__.py", line 107, in main
    model_single = VitsModel.load_from_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/saving.py", line 137, in load_from_checkpoint
    return _load_from_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/saving.py", line 205, in _load_from_checkpoint
    return _load_state(cls, checkpoint, strict=strict, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/saving.py", line 250, in _load_state
    obj = cls(**_cls_kwargs)
  File "/content/piper/src/python/piper_train/vits/lightning.py", line 80, in __init__
    self.writer = SummaryWriter(log_dir=os.path.dirname(str(dataset[0]))+"/lightning_logs")
TypeError: 'NoneType' object is not subscriptable
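
For context on the error: the failing line at lightning.py:80 indexes `dataset`, which evidently arrives as `None` when `load_from_checkpoint` rebuilds the model from the checkpoint's saved hyperparameters alone. A minimal sketch of that failure mode (the `dataset=None` value is an inference from the traceback, not something shown directly in the log):

```python
import os

# Sketch of the failing call chain: load_from_checkpoint reconstructs
# VitsModel from the checkpoint's stored hyperparameters, and if no
# dataset kwarg is supplied, __init__ sees dataset=None.
dataset = None

# The line from lightning.py:80 in the traceback above:
log_dir = os.path.dirname(str(dataset[0])) + "/lightning_logs"
# -> TypeError: 'NoneType' object is not subscriptable
```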
rmcpantoja commented 9 months ago

Hi @StoryHack, to finetune a single-speaker model on a multi-speaker dataset, you need to select the action "convert single speaker to multi speaker model" instead of "finetune".
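
For reference, the Namespace dump in the log above shows both mode flags: resume_from_single_speaker_checkpoint='/content/pretrained.ckpt' and resume_from_checkpoint=None. A hypothetical sketch of how the two modes differ (the real branch lives in piper_train/__main__.py and may differ in detail):

```python
from argparse import Namespace

# Values taken from the Namespace dump in the log above.
args = Namespace(
    resume_from_checkpoint=None,
    resume_from_single_speaker_checkpoint="/content/pretrained.ckpt",
)

if args.resume_from_single_speaker_checkpoint is not None:
    # "Convert single speaker to multi speaker": load the single-speaker
    # weights and copy them into a fresh multi-speaker model. This is the
    # path the traceback goes through (__main__.py:107).
    print("converting:", args.resume_from_single_speaker_checkpoint)
elif args.resume_from_checkpoint is not None:
    # Plain "finetune": Lightning resumes the checkpoint as-is, so the
    # speaker configuration is unchanged.
    print("finetuning:", args.resume_from_checkpoint)
```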

StoryHack commented 9 months ago

That is the one I had selected.

rmcpantoja commented 9 months ago

No, you didn't select that, because the log says:

rmcpantoja commented 9 months ago

DEBUG:piper_train:Resuming from single-speaker checkpoint: /content/pretrained.ckpt

And it should say something like:

DEBUG:piper_train:Converting single speaker to multi-speaker model.

StoryHack commented 9 months ago

I had the correct option selected when I ran step 4. I'm not sure why it didn't stick.

rmcpantoja commented 9 months ago

Maybe it is due to the dataset. Is it in ljspeech format? Is its structure file|speaker_id|text?

StoryHack commented 9 months ago

This is how it looks now. Does the speaker_id need to be numerical?

...
./wavs/Karen_Savage_152.wav|Karen_Savage|Her complexion was quite faultless, much to her mother's satisfaction.
./wavs/Karen_Savage_153.wav|Karen_Savage|I'm so glad to have one daughter who can wear pink, Mrs. Blythe was wont to say jubilantly
./wavs/Karen_Savage_154.wav|Karen_Savage|Diana Blythe, known as Di, was very like her mother, with grey-green eyes that always shone with the peculiar lustre and brilliancy in the dusk, and red hair.
...
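
For what it's worth, non-numeric speaker names in the second column are typically mapped to integer ids during preprocessing; a minimal sketch of such a mapping (this mirrors what piper_train.preprocess is generally expected to do, but the exact behavior is an assumption here):

```python
# Build a speaker-name -> integer-id map from file|speaker|text metadata.
speaker_ids: dict[str, int] = {}

with open("metadata.csv", encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue
        path, speaker, text = line.rstrip("\n").split("|", maxsplit=2)
        if speaker not in speaker_ids:
            speaker_ids[speaker] = len(speaker_ids)

print(speaker_ids)  # e.g. {"Karen_Savage": 0, ...}
```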
rmcpantoja commented 9 months ago

Maybe you should remove the leading "./" from the paths.
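
A one-off cleanup along those lines might look like this (metadata.csv and metadata_clean.csv are hypothetical filenames; adjust to the dataset's actual layout):

```python
# Strip the leading "./" from the audio-path column of each
# file|speaker|text line, writing the result to a new file.
with open("metadata.csv", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f if line.strip()]

with open("metadata_clean.csv", "w", encoding="utf-8") as f:
    for line in lines:
        path, rest = line.split("|", maxsplit=1)
        if path.startswith("./"):
            path = path[2:]  # "./wavs/x.wav" -> "wavs/x.wav"
        f.write(f"{path}|{rest}\n")
```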