homink closed this issue 6 years ago.
I was hoping nobody would hit this. The CUDA error message is really hard to interpret (you can see more informative error messages if you disable CUDA). I believe the problem is that the input or decoder target length exceeded the maximum length in the model.
I added a sanity check to give a better error message when users hit this problem.
Traceback (most recent call last):
File "train.py", line 947, in <module>
train_seq2seq=train_seq2seq, train_postnet=train_postnet)
File "train.py", line 605, in train
""".format(max_seq_len, hparams.max_positions))
RuntimeError: max_seq_len (186) >= max_posision (64)
Input text or decoder targget length exceeded the maximum length.
Please set a larger value for ``max_position`` in hyper parameters.
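The sanity check that produces the message above amounts to comparing the longest sequence in the batch against max_positions before it reaches the positional-encoding tables. A minimal sketch (function name is illustrative, not the repository's exact code):

```python
def check_max_positions(max_seq_len, max_positions):
    """Fail fast with a readable message instead of an opaque CUDA assert."""
    if max_seq_len >= max_positions:
        raise RuntimeError(
            "max_seq_len ({}) >= max_positions ({})\n"
            "Input text or decoder target length exceeded the maximum length.\n"
            "Please set a larger value for `max_positions` in hyper parameters."
            .format(max_seq_len, max_positions))

check_max_positions(63, 64)   # fine: longest sequence fits
try:
    check_max_positions(186, 64)  # the case in the traceback above
except RuntimeError as e:
    print(e)
```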
Could you try the latest master and see if it works?
That helps! I will let you know how it goes.
@r9y9, I'm running training with a Brazilian Portuguese dataset I created myself, and I hit the same problem as @homink. I followed your instructions to run on the CPU and increased max_positions to 4096, but the problem persists. @r9y9, do you have any other tips?
The error message when running on the CPU:
CUDA_VISIBLE_DEVICES=, python train.py --preset=./presets/deepvoice3_ljspeech.json --data-root=./datasets/processed_AS+JC+LN+RG/ --checkpoint-dir=./checkpoints-22-05
Command line args:
{'--checkpoint': None,
'--checkpoint-dir': './checkpoints-22-05',
'--checkpoint-postnet': None,
'--checkpoint-seq2seq': None,
'--data-root': './datasets/processed_AS+JC+LN+RG/',
'--help': False,
'--hparams': '',
'--load-embedding': None,
'--log-event-path': None,
'--preset': './presets/deepvoice3_ljspeech.json',
'--reset-optimizer': False,
'--restore-parts': None,
'--speaker-id': None,
'--train-postnet-only': False,
'--train-seq2seq-only': False}
Training whole model
Training seq2seq model
Hyperparameters:
adam_beta1: 0.5
adam_beta2: 0.9
adam_eps: 1e-06
allow_clipping_in_normalization: True
amsgrad: False
batch_size: 16
binary_divergence_weight: 0.1
builder: deepvoice3
checkpoint_interval: 100
clip_thresh: 0.1
converter_channels: 256
decoder_channels: 256
downsample_step: 4
dropout: 0.050000000000000044
embedding_weight_std: 0.1
encoder_channels: 512
eval_interval: 100
fft_size: 1024
fmax: 8000
fmin: 0
force_monotonic_attention: True
freeze_embedding: False
frontend: ptbr
guided_attention_sigma: 0.2
hop_size: 256
ignore_recognition_level: 0
initial_learning_rate: 0.0005
kernel_size: 3
key_position_rate: 1.385
key_projection: True
lr_schedule: noam_learning_rate_decay
lr_schedule_kwargs: {}
masked_loss_weight: 0.5
max_positions: 4096
min_level_db: -100
min_text: 20
n_speakers: 4
name: deepvoice3
nepochs: 2000
num_mels: 80
num_workers: 2
outputs_per_step: 1
padding_idx: 0
pin_memory: True
power: 1.4
preemphasis: 0.97
priority_freq: 3000
priority_freq_weight: 0.0
process_only_htk_aligned: False
query_position_rate: 1.0
ref_level_db: 20
replace_pronunciation_prob: 0.5
rescaling: False
rescaling_max: 0.999
sample_rate: 22050
save_optimizer_state: True
speaker_embed_dim: 16
speaker_embedding_weight_std: 0.01
text_embed_dim: 256
trainable_positional_encodings: False
use_decoder_state_for_postnet_input: True
use_guided_attention: True
use_memory_mask: True
value_projection: True
weight_decay: 0.0
window_ahead: 3
window_backward: 1
Los event path: log/run-test2019-05-22_19:21:05.490590
100it [05:29, 3.22s/it]Save intermediate states at step 100
Saved checkpoint: ./checkpoints-22-05/checkpoint_step000000100.pth
Traceback (most recent call last):
File "train.py", line 981, in <module>
train_seq2seq=train_seq2seq, train_postnet=train_postnet)
File "train.py", line 715, in train
eval_model(global_step, writer, device, model, checkpoint_dir, ismultispeaker)
File "train.py", line 404, in eval_model
model_eval, text, p=0, speaker_id=speaker_id, fast=True)
File "/home/fred/Documentos/deepvoice3_pytorch/synthesis.py", line 62, in tts
sequence, text_positions=text_positions, speaker_ids=speaker_ids)
File "/opt/anaconda3/envs/deepvoice3_pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/fred/Documentos/deepvoice3_pytorch/deepvoice3_pytorch/__init__.py", line 71, in forward
speaker_embed = self.embed_speakers(speaker_ids)
File "/opt/anaconda3/envs/deepvoice3_pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/opt/anaconda3/envs/deepvoice3_pytorch/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 117, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/opt/anaconda3/envs/deepvoice3_pytorch/lib/python3.6/site-packages/torch/nn/functional.py", line 1506, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:193
My dataset has 4 speakers:
cat datasets/processed_AS+JC+LN+RG/train.txt | cut -d "|" -f 5 | uniq | awk '{if(m<$1) m=$1} END{print m}'
3
and:
cat datasets/processed_AS+JC+LN+RG/train.txt | cut -d "|" -f 5 | uniq | wc -l
4
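The same speaker-id check can be done in Python (a sketch; it assumes the train.txt layout used in this thread, with the speaker id in the 5th "|"-separated field):

```python
def speaker_id_stats(train_txt_path):
    """Return (number of distinct speaker ids, max speaker id) from train.txt.

    An embedding lookup requires max_id < n_speakers, so this is the first
    thing to verify when the speaker embedding raises 'index out of range'.
    """
    ids = set()
    with open(train_txt_path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("|")
            if len(fields) >= 5:
                ids.add(int(fields[4]))
    return len(ids), max(ids)

# num_speakers, max_id = speaker_id_stats(
#     "./datasets/processed_AS+JC+LN+RG/train.txt")
```

Unlike the `uniq`-based shell pipeline, the set-based version also works when equal speaker ids are not adjacent in the file.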
deepvoice3_ljspeech.json is not for a multi-speaker dataset. Set n_speakers and builder=deepvoice3_multispeaker for a multi-speaker model. See https://github.com/r9y9/deepvoice3_pytorch/blob/6fb72bf7b7d53414493f1daeb215f2fb178fba78/presets/deepvoice3_vctk.json#L5-L6 for an example.
max_positions doesn't seem to be relevant in your case. It only matters if your dataset contains long sentences.
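A quick way to confirm a preset actually selects the multi-speaker path is to inspect the two fields mentioned above in the preset JSON (a sketch; the field names builder and n_speakers come from the hyperparameter dumps in this thread):

```python
import json

def assert_multispeaker(preset_path, dataset_speakers):
    """Check that a preset selects the multi-speaker builder and declares
    at least as many speakers as the dataset contains."""
    with open(preset_path, encoding="utf-8") as f:
        hp = json.load(f)
    assert hp.get("builder") == "deepvoice3_multispeaker", \
        "builder is {!r}, not deepvoice3_multispeaker".format(hp.get("builder"))
    assert hp.get("n_speakers", 1) >= dataset_speakers, \
        "n_speakers={} < dataset speakers={}".format(
            hp.get("n_speakers"), dataset_speakers)

# assert_multispeaker("./presets/deepvoice3_vctk.json", 4)
```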
@r9y9, following your instructions, unfortunately the same problem happened:
CUDA_VISIBLE_DEVICES=-1, python train.py --preset=./presets/deepvoice3_vctk.json --data-root=./datasets/processed_AS+JC+LN+RG/ --checkpoint-dir=./checkpoints-23-5
Command line args:
{'--checkpoint': None,
'--checkpoint-dir': './checkpoints-23-5',
'--checkpoint-postnet': None,
'--checkpoint-seq2seq': None,
'--data-root': './datasets/processed_AS+JC+LN+RG/',
'--help': False,
'--hparams': '',
'--load-embedding': None,
'--log-event-path': None,
'--preset': './presets/deepvoice3_vctk.json',
'--reset-optimizer': False,
'--restore-parts': None,
'--speaker-id': None,
'--train-postnet-only': False,
'--train-seq2seq-only': False}
Training whole model
Training seq2seq model
Hyperparameters:
adam_beta1: 0.5
adam_beta2: 0.9
adam_eps: 1e-06
allow_clipping_in_normalization: True
amsgrad: False
batch_size: 8
binary_divergence_weight: 0.1
builder: deepvoice3_multispeaker
checkpoint_interval: 10
clip_thresh: 0.1
converter_channels: 256
decoder_channels: 256
downsample_step: 4
dropout: 0.050000000000000044
embedding_weight_std: 0.1
encoder_channels: 512
eval_interval: 10
fft_size: 1024
fmax: 8000
fmin: 0
force_monotonic_attention: True
freeze_embedding: False
frontend: en
guided_attention_sigma: 0.4
hop_size: 256
ignore_recognition_level: 0
initial_learning_rate: 0.0005
kernel_size: 3
key_position_rate: 7.6
key_projection: True
lr_schedule: noam_learning_rate_decay
lr_schedule_kwargs: {}
masked_loss_weight: 0.5
max_positions: 1024
min_level_db: -100
min_text: 20
n_speakers: 4
name: deepvoice3
nepochs: 2000
num_mels: 80
num_workers: 2
outputs_per_step: 1
padding_idx: 0
pin_memory: True
power: 1.4
preemphasis: 0.97
priority_freq: 3000
priority_freq_weight: 0.0
process_only_htk_aligned: False
query_position_rate: 2.0
ref_level_db: 20
replace_pronunciation_prob: 0.5
rescaling: False
rescaling_max: 0.999
sample_rate: 22050
save_optimizer_state: True
speaker_embed_dim: 16
speaker_embedding_weight_std: 0.05
text_embed_dim: 256
trainable_positional_encodings: False
use_decoder_state_for_postnet_input: True
use_guided_attention: True
use_memory_mask: True
value_projection: True
weight_decay: 0.0
window_ahead: 3
window_backward: 1
Los event path: log/run-test2019-05-23_10:03:02.916023
10it [00:17, 1.69s/it]Save intermediate states at step 10
Saved checkpoint: ./checkpoints-23-5/checkpoint_step000000010.pth
Traceback (most recent call last):
File "train.py", line 981, in <module>
train_seq2seq=train_seq2seq, train_postnet=train_postnet)
File "train.py", line 715, in train
eval_model(global_step, writer, device, model, checkpoint_dir, ismultispeaker)
File "train.py", line 404, in eval_model
model_eval, text, p=0, speaker_id=speaker_id, fast=True)
File "/home/fred/Documentos/deepvoice3_pytorch/synthesis.py", line 62, in tts
sequence, text_positions=text_positions, speaker_ids=speaker_ids)
File "/opt/anaconda3/envs/deepvoice3_pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/fred/Documentos/deepvoice3_pytorch/deepvoice3_pytorch/__init__.py", line 71, in forward
speaker_embed = self.embed_speakers(speaker_ids)
File "/opt/anaconda3/envs/deepvoice3_pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/opt/anaconda3/envs/deepvoice3_pytorch/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 117, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/opt/anaconda3/envs/deepvoice3_pytorch/lib/python3.6/site-packages/torch/nn/functional.py", line 1506, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:193
Could it be a problem with the dataset? I created the alignment.json files inside each speaker folder using the scripts from the multi-Speaker-tacotron-tensorflow repository:
python -m datasets.AS.prepare
python -m datasets.JC.prepare
python -m datasets.LN.prepare
python -m datasets.RG.prepare
Then I ran preprocessing:
python preprocess.py json_meta "./datasets/AS/alignment.json,./datasets/JC/alignment.json,./datasets/LN/alignment.json,./datasets/RG/alignment.json" "./datasets/processed_AS+JC+LN+RG" --preset=./presets/deepvoice3_vctk.json
Is there a better way to find out what is causing the error? Thanks a lot for the help!
Try to figure out what exact value is causing the out of range error.
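For example, every index tensor can be validated against its embedding table size on the Python side before the lookup, so the offending value is reported instead of the opaque assert. A generic sketch (in this codebase the failing lookup is self.embed_speakers(speaker_ids); the helper name is illustrative):

```python
def check_index_range(ids, num_embeddings, name="speaker_ids"):
    """Report the first id that would trip PyTorch's 'index out of range'
    assert in an embedding lookup of `num_embeddings` rows."""
    for i in ids:
        if not (0 <= i < num_embeddings):
            raise ValueError(
                "{}: id {} is out of range for an embedding table of size {}"
                .format(name, i, num_embeddings))

check_index_range([0, 1, 2, 3], 4)   # 4 speakers, table of 4 rows: fine
# check_index_range([0, 4], 4)       # would raise: id 4 >= table size 4
```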
Hi again,
I am applying this repository to a Korean speech corpus (http://www.korean.go.kr/front/board/boardStandardView.do?board_id=4&mn_id=17&b_seq=464) and have encountered the following error. Could you have a look at it? I will be happy to open a PR once it is working.
I formatted the Korean corpus into .npy files in the same way as LJSpeech (as a single speaker) and ran training with a single GPU or multiple GPUs, but it shows a series of error messages like
Assertion srcIndex < srcSelectDimSize failed.
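This CUDA assertion is typically the GPU-side form of the same "index out of range" embedding error seen earlier in this thread: some index tensor (text symbol ids, positions, or speaker ids) contains a value greater than or equal to the corresponding table size. One way to narrow it down, especially with a new frontend for a new language, is to scan the encoded inputs before training (a generic sketch; how sequences are produced depends on the frontend in use):

```python
def max_symbol_id(sequences):
    """Largest symbol id the text frontend produced across all sequences.
    The text embedding lookup requires this to be < the vocabulary size."""
    return max(max(seq) for seq in sequences if seq)

# Illustrative sequences, as a frontend's text_to_sequence might emit them:
seqs = [[1, 5, 9], [2, 3], [7]]
assert max_symbol_id(seqs) == 9  # so the embedding needs at least 10 rows
```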