r9y9 / deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
https://r9y9.github.io/deepvoice3_pytorch/

Another Assertion error #41

Closed by homink 6 years ago

homink commented 6 years ago

Hi again,

I trained a single-speaker Korean model successfully and am now moving on to multiple Korean speakers. Again, I hit an assertion error, shown below. I tracked it down, and it looks like the self.encoder call in the AttentionSeq2Seq class triggers it. Could you let me know where self.encoder, called as follows, is defined so that I can look into it further? Adjusting max_positions doesn't help this time.

encoder_outputs = self.encoder( text_sequences, lengths=input_lengths, speaker_embed=speaker_embed)

Thanks in advance,

[kwon@ssi-dnn-slave-002 deepvoice3_pytorch2]$ CUDA_VISIBLE_DEVICES=2 python train.py   --data-root=./data/nikl_m/   --hparams="frontend=ko,builder=deepvoice3,preset=deepvoice3_niklm,builder=deepvoice3_multispeaker"   --checkpoint-dir checkpoint_nikl_m
Command line args:
 {'--checkpoint': None,
 '--checkpoint-dir': 'checkpoint_nikl_m',
 '--checkpoint-postnet': None,
 '--checkpoint-seq2seq': None,
 '--data-root': './data/nikl_m/',
 '--help': False,
 '--hparams': 'frontend=ko,builder=deepvoice3,preset=deepvoice3_niklm,builder=deepvoice3_multispeaker',
 '--load-embedding': None,
 '--log-event-path': None,
 '--reset-optimizer': False,
 '--restore-parts': None,
 '--speaker-id': None,
 '--train-postnet-only': False,
 '--train-seq2seq-only': False}
Training whole model
Training seq2seq model
Hyperparameters:
  adam_beta1: 0.5
  adam_beta2: 0.9
  adam_eps: 1e-06
  allow_clipping_in_normalization: False
  batch_size: 16
  binary_divergence_weight: 0.1
  builder: deepvoice3_multispeaker
  checkpoint_interval: 10000
  clip_thresh: 0.1
  converter_channels: 256
  decoder_channels: 256
  downsample_step: 4
  dropout: 0.050000000000000044
  embedding_weight_std: 0.1
  encoder_channels: 256
  eval_interval: 10000
  fft_size: 1024
  fmax: 7600
  fmin: 125
  force_monotonic_attention: True
  freeze_embedding: False
  frontend: ko
  guided_attention_sigma: 0.2
  hop_size: 256
  initial_learning_rate: 0.0005
  kernel_size: 3
  key_position_rate: 1.385
  key_projection: False
  lr_schedule: noam_learning_rate_decay
  lr_schedule_kwargs: {}
  masked_loss_weight: 0.5
  max_positions: 512
  min_level_db: -100
  n_speakers: 1
  name: deepvoice3
  nepochs: 10000
  num_mels: 80
  num_workers: 2
  outputs_per_step: 1
  padding_idx: 0
  pin_memory: True
  power: 1.4
  preemphasis: 0.97
  preset: deepvoice3_niklm
  presets: {'deepvoice3_niklm': {'n_speakers': 119, 'speaker_embed_dim': 16, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'speaker_embedding_weight_std': 0.05, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.4, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 3000, 'query_position_rate': 2.0, 'key_position_rate': 7.6, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'deepvoice3_ljspeech': {'n_speakers': 1, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 600, 'query_position_rate': 1.0, 'key_position_rate': 1.385, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'deepvoice3_vctk': {'n_speakers': 108, 'speaker_embed_dim': 16, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 0.1, 'speaker_embedding_weight_std': 0.05, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 256, 'encoder_channels': 512, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.4, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 512, 'query_position_rate': 2.0, 'key_position_rate': 7.6, 'key_projection': True, 'value_projection': True, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}, 'nyanko_ljspeech': {'n_speakers': 1, 'downsample_step': 4, 'outputs_per_step': 1, 'embedding_weight_std': 
0.01, 'dropout': 0.050000000000000044, 'kernel_size': 3, 'text_embed_dim': 128, 'encoder_channels': 256, 'decoder_channels': 256, 'converter_channels': 256, 'use_guided_attention': True, 'guided_attention_sigma': 0.2, 'binary_divergence_weight': 0.1, 'use_decoder_state_for_postnet_input': True, 'max_positions': 512, 'query_position_rate': 1.0, 'key_position_rate': 1.385, 'key_projection': False, 'value_projection': False, 'clip_thresh': 0.1, 'initial_learning_rate': 0.0005}}
  priority_freq: 3000
  priority_freq_weight: 0.0
  query_position_rate: 1.0
  ref_level_db: 20
  replace_pronunciation_prob: 0.5
  rescaling: False
  rescaling_max: 0.999
  sample_rate: 22050
  save_optimizer_state: True
  speaker_embed_dim: 16
  speaker_embedding_weight_std: 0.01
  text_embed_dim: 256
  trainable_positional_encodings: False
  use_decoder_state_for_postnet_input: True
  use_guided_attention: True
  use_memory_mask: True
  value_projection: False
  weight_decay: 0.0
  window_ahead: 3
  window_backward: 1
Override hyper parameters with preset "deepvoice3_niklm": {
    "n_speakers": 119,
    "speaker_embed_dim": 16,
    "downsample_step": 4,
    "outputs_per_step": 1,
    "embedding_weight_std": 0.1,
    "speaker_embedding_weight_std": 0.05,
    "dropout": 0.050000000000000044,
    "kernel_size": 3,
    "text_embed_dim": 256,
    "encoder_channels": 512,
    "decoder_channels": 256,
    "converter_channels": 256,
    "use_guided_attention": true,
    "guided_attention_sigma": 0.4,
    "binary_divergence_weight": 0.1,
    "use_decoder_state_for_postnet_input": true,
    "max_positions": 3000,
    "query_position_rate": 2.0,
    "key_position_rate": 7.6,
    "key_projection": true,
    "value_projection": true,
    "clip_thresh": 0.1,
    "initial_learning_rate": 0.0005
}

0it [00:00, ?it/s]
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/THCTensorIndex.cu:279: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/generic/THCTensorCopy.c line=70 error=59 : device-side assert triggered

Traceback (most recent call last):
  File "train.py", line 967, in <module>
    train_seq2seq=train_seq2seq, train_postnet=train_postnet)
  File "train.py", line 661, in train
    input_lengths=input_lengths)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kwon/3rdParty/deepvoice3_pytorch2/deepvoice3_pytorch/__init__.py", line 80, in forward
    text_positions, frame_positions, input_lengths)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kwon/3rdParty/deepvoice3_pytorch2/deepvoice3_pytorch/__init__.py", line 117, in forward
    print(text_sequences)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 119, in __repr__
    return 'Variable containing:' + self.data.__repr__()
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 133, in __repr__
    return str(self)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 140, in __str__
    return _tensor_str._str(self)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/_tensor_str.py", line 297, in _str
    strt = _matrix_str(self)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/_tensor_str.py", line 216, in _matrix_str
    min_sz=5 if not print_full_mat else 0)
  File "/home/kwon/anaconda3/lib/python3.6/site-packages/torch/_tensor_str.py", line 79, in _number_format
    tensor = torch.DoubleTensor(tensor.size()).copy_(tensor).abs_().view(tensor.nelement())
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1512957107421/work/torch/lib/THC/generic/THCTensorCopy.c:70

r9y9 commented 6 years ago

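I guess you are indexing out of range in the speaker embedding table. Can you make sure you have 119 speakers in the dataset? The following command (with data/vctk replaced by your data directory) prints the largest speaker ID. Since IDs are zero-based it should give 118, but I'm guessing you will get a value larger than 118.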

cat data/vctk/train.txt | cut -d "|" -f 5 | uniq | awk '{if(m<$1) m=$1} END{print m}'

The next command counts the distinct speaker IDs and should give 119.

cat data/vctk/train.txt | cut -d "|" -f 5 | sort -u | wc -l
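
The same two checks can be sketched in Python. This is a minimal sketch, assuming train.txt has pipe-separated fields with an integer speaker ID in the fifth one, as the `cut -d "|" -f 5` commands imply. The helper names and the sample rows are hypothetical and only illustrate the idea.

```python
def speaker_ids(lines, field=4, sep="|"):
    """Collect the distinct integer speaker IDs from the given column."""
    return {int(line.rstrip("\n").split(sep)[field]) for line in lines if line.strip()}


def ids_fit_table(ids, n_speakers):
    """True if every ID is a valid row index for a table with n_speakers rows."""
    return all(0 <= i < n_speakers for i in ids)


# Hypothetical rows; the first four fields are placeholders.
sample = [
    "spec-0.npy|mel-0.npy|120|text a|0",
    "spec-1.npy|mel-1.npy|98|text b|3",
    "spec-2.npy|mel-2.npy|101|text c|3",
    "spec-3.npy|mel-3.npy|77|text d|250",
]

ids = speaker_ids(sample)
print(max(ids), len(ids))       # 250 3
print(ids_fit_table(ids, 119))  # False: 250 is past the last row (118)
```

To run it on a real file, pass an open file handle instead of `sample`, for example the train.txt under the `--data-root` used for training.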

homink commented 6 years ago

Your guess was right. For training to work, the speaker IDs in train.txt must be consecutive integers starting from 0. Thanks!
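
One way to get such IDs is to renumber the speaker column in order of first appearance. The following is a minimal sketch, not the repository's preprocessing code. It assumes the speaker label is the fifth pipe-separated field of each train.txt line; `remap_speaker_ids` and the sample rows are hypothetical.

```python
def remap_speaker_ids(lines, field=4, sep="|"):
    """Rewrite the speaker column so labels become 0, 1, 2, ...

    Returns (new_lines, mapping), where mapping is original label -> new ID.
    New IDs are assigned in order of first appearance.
    """
    mapping = {}
    new_lines = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = line.rstrip("\n").split(sep)
        new_id = mapping.setdefault(row[field], len(mapping))
        row[field] = str(new_id)
        new_lines.append(sep.join(row))
    return new_lines, mapping


# Hypothetical rows with gaps in the original speaker IDs.
sample = [
    "spec-0.npy|mel-0.npy|120|text a|0",
    "spec-1.npy|mel-1.npy|98|text b|3",
    "spec-2.npy|mel-2.npy|101|text c|3",
    "spec-3.npy|mel-3.npy|77|text d|250",
]

new_lines, mapping = remap_speaker_ids(sample)
print(mapping)        # {'0': 0, '3': 1, '250': 2}
print(new_lines[-1])  # spec-3.npy|mel-3.npy|77|text d|2
```

After renumbering, `len(mapping)` is the speaker count, which should match `n_speakers` in the preset (119 in this thread).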