r9y9 / wavenet_vocoder

WaveNet vocoder
https://r9y9.github.io/wavenet_vocoder/

Running out of GPU memory! #18

Closed rraallvv closed 5 years ago

rraallvv commented 6 years ago

The error below is thrown when I try to train wavenet_vocoder with the default parameters in a Jupyter notebook on Google's Colaboratory:

!cd ./wavenet_vocoder && python train.py --data-root=data/ljspeech

/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Command line args:
 {'--checkpoint': None,
 '--checkpoint-dir': 'checkpoints',
 '--data-root': 'data/ljspeech',
 '--help': False,
 '--hparams': '',
 '--log-event-path': None,
 '--reset-optimizer': False,
 '--restore-parts': None,
 '--speaker-id': None}
Hyperparameters:
  adam_beta1: 0.9
  adam_beta2: 0.999
  adam_eps: 1e-08
  allow_clipping_in_normalization: False
  batch_size: 2
  builder: wavenet
  checkpoint_interval: 10000
  cin_channels: 80
  clip_thresh: -1
  dropout: 0.050000000000000044
  ema_decay: 0.9999
  exponential_moving_average: True
  fft_size: 1024
  fmax: 7600
  fmin: 125
  frame_shift_ms: None
  freq_axis_kernel_size: 3
  gate_channels: 512
  gin_channels: -1
  hop_size: 256
  initial_learning_rate: 0.001
  input_type: raw
  kernel_size: 3
  layers: 24
  log_scale_min: -32.23619130191664
  lr_schedule: noam_learning_rate_decay
  lr_schedule_kwargs: {}
  max_time_sec: None
  max_time_steps: 8000
  min_level_db: -100
  n_speakers: 7
  name: wavenet_vocoder
  nepochs: 2000
  num_mels: 80
  num_workers: 2
  out_channels: 30
  pin_memory: True
  preset: 
  presets: {}
  quantize_channels: 65536
  random_state: 1234
  ref_level_db: 20
  rescaling: True
  rescaling_max: 0.999
  residual_channels: 512
  sample_rate: 22050
  save_optimizer_state: True
  silence_threshold: 2
  skip_out_channels: 256
  stacks: 4
  test_eval_epoch_interval: 5
  test_num_samples: None
  test_size: 0.0441
  train_eval_interval: 10000
  upsample_conditional_features: True
  upsample_scales: [4, 4, 4, 4]
  weight_decay: 0.0
  weight_normalization: True
Local conditioning enabled. Shape of a sample: (426, 80).
[train]: length of the dataset is 12522
Local conditioning enabled. Shape of a sample: (539, 80).
[test]: length of the dataset is 578
WaveNet(
  (first_conv): Conv1d (1, 512, kernel_size=(1,), stride=(1,))
  (conv_layers): ModuleList(
    (0): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(2,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (1): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(4,), dilation=(2,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (2): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(8,), dilation=(4,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (3): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(16,), dilation=(8,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (4): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(32,), dilation=(16,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (5): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(64,), dilation=(32,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (6): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(2,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (7): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(4,), dilation=(2,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (8): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(8,), dilation=(4,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (9): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(16,), dilation=(8,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (10): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(32,), dilation=(16,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (11): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(64,), dilation=(32,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (12): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(2,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (13): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(4,), dilation=(2,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (14): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(8,), dilation=(4,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (15): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(16,), dilation=(8,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (16): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(32,), dilation=(16,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (17): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(64,), dilation=(32,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (18): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(2,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (19): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(4,), dilation=(2,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (20): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(8,), dilation=(4,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (21): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(16,), dilation=(8,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (22): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(32,), dilation=(16,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
    (23): ResidualConv1dGLU(
      (conv): Conv1d (512, 512, kernel_size=(3,), stride=(1,), padding=(64,), dilation=(32,))
      (conv1x1c): Conv1d (80, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_out): Conv1d (256, 512, kernel_size=(1,), stride=(1,))
      (conv1x1_skip): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    )
  )
  (last_conv_layers): ModuleList(
    (0): ReLU(inplace)
    (1): Conv1d (256, 256, kernel_size=(1,), stride=(1,))
    (2): ReLU(inplace)
    (3): Conv1d (256, 30, kernel_size=(1,), stride=(1,))
  )
  (upsample_conv): ModuleList(
    (0): ConvTranspose2d (1, 1, kernel_size=(3, 4), stride=(1, 4), padding=(1, 0))
    (1): ReLU(inplace)
    (2): ConvTranspose2d (1, 1, kernel_size=(3, 4), stride=(1, 4), padding=(1, 0))
    (3): ReLU(inplace)
    (4): ConvTranspose2d (1, 1, kernel_size=(3, 4), stride=(1, 4), padding=(1, 0))
    (5): ReLU(inplace)
    (6): ConvTranspose2d (1, 1, kernel_size=(3, 4), stride=(1, 4), padding=(1, 0))
    (7): ReLU(inplace)
  )
)
Receptive field (samples / ms): 505 / 22.90249433106576
Los event path: log/run-test2018-02-17_03:59:27.681235
0it [00:00, ?it/s]THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory

Traceback (most recent call last):
  File "train.py", line 961, in <module>
    train_loop(model, data_loaders, optimizer, writer, checkpoint_dir=checkpoint_dir)
  File "train.py", line 724, in train_loop
    checkpoint_dir, eval_dir, do_eval, ema)
  File "train.py", line 639, in __train_step
    y_hat = model(x, c=c, g=g, softmax=False)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/wavenet_vocoder/wavenet_vocoder/wavenet.py", line 207, in forward
    x, h = f(x, c, g_bct)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/wavenet_vocoder/wavenet_vocoder/modules.py", line 122, in forward
    return self._forward(x, c, g, False)
  File "/content/wavenet_vocoder/wavenet_vocoder/modules.py", line 140, in _forward
    x = F.dropout(x, p=self.dropout, training=self.training)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 526, in dropout
    return _functions.dropout.Dropout.apply(input, p, training, inplace)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/_functions/dropout.py", line 32, in forward
    output = input.clone()
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58
r9y9 commented 6 years ago

Try a smaller batch_size or max_time_steps. With the default settings you will need ~10 GB of GPU memory. I will add a note about this to the README later.
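
For reference, a minimal way to apply both suggestions from the command line, assuming the --hparams flag accepts comma-separated name=value overrides as the argument list above suggests (the values here are only illustrative), might look like:

python train.py --data-root=data/ljspeech --hparams="batch_size=1,max_time_steps=4000"

Reducing either value shrinks the activation memory needed for each training step roughly proportionally.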

jiqizaisikao commented 6 years ago

Yes, just set batch_size=1 and make max_time_steps smaller. I think you then only need about 4 GB of GPU memory; otherwise it will easily run out of memory.

jiqizaisikao commented 6 years ago

Hi @r9y9, I have found that when evaluating the model, it may be better to set requires_grad of the model parameters to False and not backpropagate the loss; otherwise memory use will double and quickly run out.

r9y9 commented 6 years ago

I don't think it matters if you have sufficient GPU memory, but yes, ideally requires_grad should be set to False in eval mode.
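
To make the suggestion concrete, here is a minimal sketch of gradient-free evaluation (not this repository's actual eval code; model, eval_loader, criterion, and the batch layout are placeholder assumptions). Wrapping the pass in torch.no_grad() stops autograd from keeping intermediate activations alive, which is what otherwise roughly doubles memory use during evaluation:

import torch

def evaluate(model, eval_loader, criterion, device):
    model.eval()  # disable dropout etc.
    total_loss, n = 0.0, 0
    with torch.no_grad():  # no graph is built, so activations are freed right away
        for x, c, g, y in eval_loader:  # placeholder batch layout
            x, c, y = x.to(device), c.to(device), y.to(device)
            y_hat = model(x, c=c, g=g, softmax=False)  # same call signature as in the training step above
            total_loss += criterion(y_hat, y).item()
            n += 1
    model.train()  # restore training mode afterwards
    return total_loss / max(n, 1)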

Ola-Vish commented 6 years ago

Hi, sorry to re-open this, but I'm running into the same problem.

I'm trying to train the model on the LJSpeech corpus, conditioned on mel spectrograms. I followed the instructions and didn't change anything in the ljspeech preset JSON. I'm training on an NVIDIA P100 GPU, which has 16 GB of memory. I can see the memory fill up immediately when I start training, and training fails quite quickly with an out-of-memory exception. This is the error I get (the same as the issue opener's):

TensorBoard event log path: log/run-test2018-05-07_11:36:28.215518
0it [00:00, ?it/s]THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory

Traceback (most recent call last):
  File "train.py", line 967, in <module>
    checkpoint_dir=checkpoint_dir)
  File "train.py", line 722, in train_loop
    checkpoint_dir, eval_dir, do_eval, ema)
  File "train.py", line 640, in __train_step
    y_hat = torch.nn.parallel.data_parallel(model, (x, c, g, False))
  File "/anaconda/envs/py35/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 156, in data_parallel
    return module(*inputs[0], **module_kwargs[0])
  File "/anaconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ds72user/wavenet_vocoder/wavenet_vocoder/wavenet.py", line 219, in forward
    x, h = f(x, c, g_bct)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ds72user/wavenet_vocoder/wavenet_vocoder/modules.py", line 132, in forward
    return self._forward(x, c, g, False)
  File "/home/ds72user/wavenet_vocoder/wavenet_vocoder/modules.py", line 182, in _forward
    x = _conv1x1_forward(self.conv1x1_out, x, is_incremental)
  File "/home/ds72user/wavenet_vocoder/wavenet_vocoder/modules.py", line 57, in _conv1x1_forward
    x = conv(x)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 176, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 973, in <module>
    device, model, optimizer, global_step, checkpoint_dir, global_epoch)
  File "train.py", line 751, in save_checkpoint
    }, checkpoint_path)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/torch/serialization.py", line 161, in save
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/anaconda/envs/py35/lib/python3.5/site-packages/torch/serialization.py", line 118, in _with_file_like
    return body(f)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/torch/serialization.py", line 161, in <lambda>
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/anaconda/envs/py35/lib/python3.5/site-packages/torch/serialization.py", line 238, in _save
    serialized_storages[key]._write_file(f, _is_real_file(f))
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/csrc/generic/serialization.cpp:17

It seems strange to me that this happens, given how much memory the GPU has. Could you share which hardware you trained on? I tried reducing batch_size to 1 and making max_time_steps smaller, but it doesn't seem to have any effect on memory consumption. Any guidance on what to do would be greatly appreciated!

azraelkuan commented 6 years ago

@Ola-Vish Because r9y9 has updated the repo for PyTorch 0.4, I haven't tested it, but I guess that changing this line https://github.com/r9y9/wavenet_vocoder/blob/186edcfba993223eefaec6f1a80756a0ab9e8dd3/train.py#L525 to

with torch.no_grad():
     y_hat = model.incremental_forward( 

may help.

Ola-Vish commented 6 years ago

@azraelkuan Thank you for the suggestion, but sadly it didn't help :( Still the same problem.

r9y9 commented 6 years ago

Oops, I just forgot to add torch.no_grad(). Fixed.

@Ola-Vish Could you check whether some of the examples at https://github.com/pytorch/examples work for you? Do you think the problem is specific to wavenet_vocoder?

r9y9 commented 6 years ago

One thing I might be doing wrong is calling module.incremental_forward(x) instead of module(x). I'm not that familiar with PyTorch internals, but according to the Facebook team the way we are doing it now is not recommended. See https://github.com/pytorch/fairseq/commit/50fdf591464ca63940a2c1c5e7057b2f4df034f5#diff-9f76bb3e5dd085949139bba958f8aa3d. Changing this has been on my todo list for a while, but I haven't done it yet since it hasn't caused any problems for me so far...
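
For anyone curious, here is a rough sketch of the pattern that fairseq commit moves to (a hypothetical module, not this repository's code): forward() takes an optional incremental_state argument, so incremental generation still goes through module(...) and therefore through nn.Module.__call__ and any registered hooks, instead of a separate incremental_forward method that bypasses them.

import torch
import torch.nn as nn

class CausalConvBlock(nn.Module):
    """Hypothetical block: one forward() serves both batched and incremental modes."""

    def __init__(self, channels, kernel_size=2):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size)
        self.kernel_size = kernel_size

    def forward(self, x, incremental_state=None):
        if incremental_state is None:
            # Batched training path: x is (B, C, T), left-padded for causality.
            return self.conv(nn.functional.pad(x, (self.kernel_size - 1, 0)))
        # Incremental path: x is the newest frame (B, C, 1); keep a small buffer
        # of past frames in incremental_state instead of re-running the sequence.
        buf = incremental_state.get("buf")
        buf = x if buf is None else torch.cat([buf, x], dim=-1)[:, :, -self.kernel_size:]
        incremental_state["buf"] = buf
        if buf.size(-1) < self.kernel_size:
            buf = nn.functional.pad(buf, (self.kernel_size - buf.size(-1), 0))
        return self.conv(buf)

# Usage: the caller always invokes the module itself, never a side method.
block = CausalConvBlock(4)
state = {}
frame = torch.zeros(1, 4, 1)
out = block(frame, incremental_state=state)  # goes through __call__ and hooks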

azraelkuan commented 6 years ago

@Ola-Vish I have tested the latest version of the code. I found that memory usage grows only slightly during training, so I wonder whether you are already using all of the GPU memory at the start. Can you share some more details, such as batch_size, used memory, and total memory?
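
If it helps, one way to collect those numbers from inside the training loop is PyTorch's own counters. This is only a sketch; note that torch.cuda.memory_allocated reports memory held by live tensors, which is lower than the driver-level figure nvidia-smi shows because of the caching allocator.

import torch

def report_gpu_memory(tag=""):
    # Print bytes currently held by tensors, the peak so far, and the card's total capacity.
    if not torch.cuda.is_available():
        print("CUDA not available")
        return
    device = torch.cuda.current_device()
    gib = 1024 ** 3
    print("[{}] allocated {:.2f} GiB, peak {:.2f} GiB, total {:.2f} GiB".format(
        tag,
        torch.cuda.memory_allocated(device) / gib,
        torch.cuda.max_memory_allocated(device) / gib,
        torch.cuda.get_device_properties(device).total_memory / gib))

# e.g. call report_gpu_memory("after forward") right after the y_hat = model(...) line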

r9y9 commented 6 years ago

For the record, I have been training a model for two days with the latest code (after merging #58) on Ubuntu 16.04 and haven't seen any issues.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.