salu133445 / musegan

An AI for Music Generation
https://salu133445.github.io/musegan/
MIT License

Bus error: totalMemory: 11.17GiB freeMemory: 11.10GiB #46

Closed. aaronrmm closed this issue 5 years ago.

aaronrmm commented 5 years ago

I'm getting a bus error when trying to load train_x_lpd_5_phr.npz, even when I only want to load a pretrained model.

musegan.interpolation INFO     Using parameters:
{'beat_resolution': 12,
 'condition_track_idx': 3,
 'data_shape': [4, 48, 84, 5],
 'is_accompaniment': True,
 'is_conditional': False,
 'latent_dim': 128,
 'nets': {'discriminator': 'default', 'generator': 'accompaniment'},
 'use_binary_neurons': False}
musegan.interpolation INFO     Using configurations:
{'adam': {'beta1': 0.5, 'beta2': 0.9},
 'batch_size': 64,
 'checkpoint_dir': './musegan/exp/accompaniment/bass/model',
 'colormap': [[1.0, 0.0, 0.0],
              [1.0, 0.5, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.5, 1.0]],
 'columns': 5,
 'config': './musegan/exp/accompaniment/bass/config.yaml',
 'data_filename': 'train_x_lpd_5_phr',
 'data_root': None,
 'data_source': 'sa',
 'evaluate_steps': 100,
 'gan_loss_type': 'wasserstein',
 'gpu': '0',
 'initial_learning_rate': 0.001,
 'learning_rate_schedule': {'end': 50000, 'end_value': 0.0, 'start': 45000},
 'log_loss_steps': 100,
 'lower': 0.0,
 'midi': {'is_drums': [1, 0, 0, 0, 0],
          'lowest_pitch': 24,
          'programs': [0, 0, 25, 33, 48],
          'tempo': 100},
 'mode': 'lerp',
 'n_dis_updates_per_gen_update': 5,
 'n_jobs': 20,
 'params': './musegan/exp/accompaniment/bass/params.yaml',
 'result_dir': './musegan/exp/accompaniment/bass/results/interpolation',
 'rows': 5,
 'runs': 10,
 'sample_grid': [8, 8],
 'save_array_samples': True,
 'save_checkpoint_steps': 10000,
 'save_image_samples': True,
 'save_pianoroll_samples': True,
 'save_samples_steps': 100,
 'save_summaries_steps': 0,
 'slope_schedule': {'end': 50000, 'end_value': 5.0, 'start': 10000},
 'steps': 50000,
 'upper': 1.0,
 'use_gradient_penalties': True,
 'use_learning_rate_decay': True,
 'use_random_transpose': False,
 'use_slope_annealing': False,
 'use_train_test_split': False}
musegan.model        INFO     Building model.
musegan.model        INFO     Building training nodes.
musegan.model        INFO     Building losses.
musegan.model        INFO     Building training ops.
musegan.model        INFO     Building summaries.
musegan.model        INFO     Building prediction nodes.
2018-11-05 22:08:23.124707: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-05 22:08:23.125223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-11-05 22:08:23.125265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-05 22:08:23.549934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-05 22:08:23.550032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2018-11-05 22:08:23.550059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2018-11-05 22:08:23.550423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10758 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
musegan.interpolation INFO     Restoring the latest checkpoint.
INFO:tensorflow:Restoring parameters from /content/musegan/exp/accompaniment/bass/model/model.ckpt-300450
tensorflow           INFO     Restoring parameters from /content/musegan/exp/accompaniment/bass/model/model.ckpt-300450
./musegan/scripts/run_interpolation.sh: line 24:   694 Bus error               (core dumped) python3 "$DIR/../src/interpolation.py" --checkpoint_dir "$1/model" --result_dir "$1/results/interpolation" --params "$1/params.yaml" --config "$1/config.yaml" --lower 0.0 --upper 1.0 --runs 10 --gpu "$gpu"
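The log reports `data_shape: [4, 48, 84, 5]` per sample, and the training data is expanded into a dense boolean array in host RAM (not GPU memory, so the 11 GB free on the K80 does not help). The sketch below estimates how much RAM that dense array needs. It assumes the sparse npz layout used by the repository's `load_data_from_npz` (a `shape` array plus a `nonzero` index array). The helper names `dense_bool_nbytes` and `estimate_npz_ram` are hypothetical, not part of MuseGAN.

```python
import numpy as np


def dense_bool_nbytes(shape):
    """Bytes needed for a dense NumPy bool array (1 byte per element)."""
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(np.bool_).itemsize


def estimate_npz_ram(filename):
    """Estimate RAM for the dense array described by a sparse npz file.

    Assumes the file stores a 'shape' array; only that small array is read,
    so the dense data is never materialized.
    """
    with np.load(filename) as f:
        return dense_bool_nbytes(f['shape'])


# Per-sample cost for data_shape [4, 48, 84, 5] from the log above.
per_sample = dense_bool_nbytes([4, 48, 84, 5])
print(per_sample)  # 80640
```

Calling `estimate_npz_ram('train_x_lpd_5_phr.npz')` before training or inference shows whether the full array can fit in the available RAM.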
salu133445 commented 5 years ago

I guess this is the same problem, since the accompaniment model also loads the training data at the inference stage. You need more than 5 GB of RAM (host memory, not GPU memory) to load the entire training data. If you don't have enough RAM, one solution is to load only part of the training data. You can modify the following function to do this by setting the length of the first axis to a smaller number (and keeping only the nonzero indices that fall within that smaller range).

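import numpy as np

def load_data_from_npz(filename):
    """Load and return the training data from a npz file (sparse format)."""
    with np.load(filename) as f:
        # Allocate a dense boolean array of the full stored shape.
        data = np.zeros(f['shape'], np.bool_)
        # 'nonzero' holds one row of indices per axis; index with a tuple.
        data[tuple(f['nonzero'])] = True
    return data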
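A minimal sketch of the partial-loading change described above, assuming `nonzero` has shape `(ndim, nnz)` with the sample index in its first row (as produced by stacking `np.nonzero` output). The function name `load_partial_data_from_npz` and its `n_samples` parameter are hypothetical. Shrinking `shape[0]` alone is not enough: entries whose first index is past the new length must also be dropped, or the assignment raises an `IndexError`.

```python
import numpy as np


def load_partial_data_from_npz(filename, n_samples):
    """Load only the first `n_samples` samples from a sparse npz file.

    Hypothetical variant of load_data_from_npz; assumes the file stores a
    'shape' array and a 'nonzero' index array of shape (ndim, nnz).
    """
    with np.load(filename) as f:
        shape = [int(s) for s in f['shape']]
        nonzero = f['nonzero']
    # Shrink the first (sample) axis, never beyond the stored length.
    n_samples = min(n_samples, shape[0])
    shape[0] = n_samples
    # Keep only the nonzero entries that fall inside the shrunken first axis.
    keep = nonzero[0] < n_samples
    data = np.zeros(shape, np.bool_)
    data[tuple(nonzero[:, keep])] = True
    return data
```

The `nonzero` index array is still read in full, so peak memory is the reduced dense array plus that index array. Assuming each stored sample has the `data_shape` from the log (80,640 bytes as bool), 20,000 samples take about 1.6 GB.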