salu133445 / musegan

An AI for Music Generation
https://salu133445.github.io/musegan/
MIT License

some error in google colab #79

Closed: TatianaZobnina closed this issue 4 years ago

TatianaZobnina commented 5 years ago

Hi, I am trying to run the training process in Google Colab (with a GPU) using the command ./scripts/run_train.sh "./exp/my_experiment/" "0", and I get this error:

musegan.train INFO Using parameters: {'beat_resolution': 12, 'condition_track_idx': None, 'data_shape': [4, 48, 84, 5], 'is_accompaniment': False, 'is_conditional': False, 'latent_dim': 128, 'nets': {'discriminator': 'default', 'generator': 'default'}, 'use_binary_neurons': False}
musegan.train INFO Using configurations: {'adam': {'beta1': 0.5, 'beta2': 0.9}, 'batch_size': 64, 'colormap': [[1.0, 0.0, 0.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.5, 1.0]], 'config': './exp/my_experiment//config.yaml', 'data_filename': 'train_x_lpd_5_phr', 'data_source': 'sa', 'eval_dir': '/content/musegan/exp/my_experiment/eval', 'evaluate_steps': 100, 'exp_dir': '/content/musegan/exp/my_experiment', 'gan_loss_type': 'wasserstein', 'gpu': '0', 'initial_learning_rate': 0.001, 'learning_rate_schedule': {'end': 50000, 'end_value': 0.0, 'start': 45000}, 'log_dir': '/content/musegan/exp/my_experiment/logs/train', 'log_loss_steps': 100, 'midi': {'is_drums': [1, 0, 0, 0, 0], 'lowest_pitch': 24, 'programs': [0, 0, 25, 33, 48], 'tempo': 100}, 'model_dir': '/content/musegan/exp/my_experiment/model', 'n_dis_updates_per_gen_update': 5, 'n_jobs': 20, 'params': './exp/my_experiment//params.yaml', 'sample_dir': '/content/musegan/exp/my_experiment/samples', 'sample_grid': [8, 8], 'save_array_samples': True, 'save_checkpoint_steps': 10000, 'save_image_samples': True, 'save_pianoroll_samples': True, 'save_samples_steps': 100, 'save_summaries_steps': 0, 'slope_schedule': {'end': 50000, 'end_value': 5.0, 'start': 10000}, 'src_dir': '/content/musegan/exp/my_experiment/src', 'steps': 50000, 'use_gradient_penalties': True, 'use_learning_rate_decay': True, 'use_random_transpose': False, 'use_slope_annealing': False, 'use_train_test_split': False}
musegan.train INFO Loading training data.
musegan.train INFO Training data size: 102378
musegan.train INFO Building dataset.
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version. Instructions for updating: tf.py_func is deprecated in TF V2. Instead, use tf.py_function, which takes a python function which manipulates tf eager tensors instead of numpy arrays. It's easy to convert a tf eager tensor to an ndarray (just call tensor.numpy()) but having access to eager tensors means tf.py_functions can use accelerators such as GPUs as well as being differentiable using a gradient tape.

musegan.model INFO Building model.
musegan.model INFO Building training nodes.
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.
tensorflow WARNING From /content/musegan/src/musegan/presets/ops.py:16: conv3d_transpose (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.conv3d_transpose instead.
tensorflow WARNING From /content/musegan/src/musegan/presets/ops.py:21: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.batch_normalization instead.
tensorflow WARNING From /content/musegan/src/musegan/presets/ops.py:12: conv3d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.conv3d instead.
tensorflow WARNING From /content/musegan/src/musegan/presets/ops.py:8: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.dense instead.
musegan.model INFO Building losses.
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead.
musegan.model INFO Building training ops.
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/learning_rate_decay_v2.py:321: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Deprecated in favor of operator or tf.math.divide.
musegan.model INFO Building summaries.
musegan.train INFO Number of trainable parameters in Model: 3,943,968
musegan.train INFO Number of trainable parameters in Generator: 2,578,127
musegan.train INFO Number of trainable parameters in Discriminator: 1,365,841
musegan.train INFO Loading sample_z.
musegan.model INFO Building prediction nodes.
musegan.train INFO Training start.
tensorflow INFO Create CheckpointSaverHook.
tensorflow INFO Graph was finalized.
2019-06-09 19:03:56.671015: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-09 19:03:56.852445: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-09 19:03:56.853229: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x10943340 executing computations on platform CUDA. Devices:
2019-06-09 19:03:56.853277: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-06-09 19:03:56.855500: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-06-09 19:03:56.855683: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x109431e0 executing computations on platform Host. Devices:
2019-06-09 19:03:56.855721: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2019-06-09 19:03:56.856041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59 pciBusID: 0000:00:04.0 totalMemory: 14.73GiB freeMemory: 14.60GiB
2019-06-09 19:03:56.856089: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-09 19:03:56.856787: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-09 19:03:56.856809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-09 19:03:56.856819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-09 19:03:56.857079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14202 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix.
tensorflow INFO Restoring parameters from /content/musegan/exp/my_experiment/model/model.ckpt-0
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file utilities to get mtimes.
tensorflow INFO Running local_init_op.
tensorflow INFO Done running local_init_op.
tensorflow INFO Saving checkpoints for 0 into /content/musegan/exp/my_experiment/model/model.ckpt.
2019-06-09 19:04:11.607825: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
musegan.train INFO step=100, gen_loss=-1.2372E+02, dis_loss=-1.9598E+02
musegan.train INFO Running sampler
musegan.train INFO Running evaluation
./scripts/run_train.sh: line 19: 879 Bus error (core dumped) python3 "$DIR/../src/train.py" --exp_dir "$1" --params "$1/params.yaml" --config "$1/config.yaml" --gpu "$gpu"

salu133445 commented 5 years ago

It seems that the error occurs during evaluation, but I am not sure what is happening given only the information provided. Note that you can disable evaluation by setting evaluate_steps to zero in the configuration file (config.yaml in the experiment directory).
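
A minimal sketch of that change, assuming the configuration file is the YAML file passed via --config in the log above (./exp/my_experiment/config.yaml) and that evaluate_steps is a top-level key written on a single line. The helper name disable_evaluation is hypothetical and not part of the repository; editing the file by hand works just as well.

```python
import re
from pathlib import Path


def disable_evaluation(config_text: str) -> str:
    """Return the config text with the top-level `evaluate_steps` value set to 0."""
    # Match `evaluate_steps: <value>` at the start of a line and replace only
    # the value token, leaving any trailing comment untouched.
    pattern = re.compile(r"^(evaluate_steps[ \t]*:[ \t]*)\S+", flags=re.MULTILINE)
    new_text, n_subs = pattern.subn(r"\g<1>0", config_text)
    if n_subs == 0:
        raise KeyError("evaluate_steps not found in configuration")
    return new_text


if __name__ == "__main__":
    # Path taken from the log above; adjust it to your own experiment directory.
    config_path = Path("./exp/my_experiment/config.yaml")
    if config_path.exists():
        config_path.write_text(disable_evaluation(config_path.read_text()))
```

The function only rewrites text, so it can be checked on a copy of the file before overwriting the real configuration.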

TatianaZobnina commented 5 years ago

Thank you for the reply. I changed 'evaluate_steps' to 0, but I still get the same error. My guess is that it is related to the GPU configuration in Google Colab. I will continue working on it.

salu133445 commented 5 years ago

Have you managed to get the model running on Google Colab? It would be really nice to see that!

TatianaZobnina commented 5 years ago

Not yet, unfortunately, but I am still working on it. I also have a question about your network: could you explain in a few words what the inference part and the interpolation part do?

salu133445 commented 5 years ago

I see. Do let me know if there is anything I can help with.

The inference script generates a batch of samples from random noise inputs drawn from the latent distribution (the same one used during training). In other words, they are just random samples. [image]
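
As a rough illustration of the inference inputs, the sketch below draws one independent latent vector per sample. The batch size (64) and latent dimension (128) come from the log earlier in the thread; the standard normal distribution and the helper name sample_random_latents are assumptions made for illustration, not taken from the repository.

```python
import numpy as np

LATENT_DIM = 128  # 'latent_dim' from the parameters in the log above
BATCH_SIZE = 64   # 'batch_size' from the configuration in the log above


def sample_random_latents(batch_size: int, latent_dim: int, seed: int = 0) -> np.ndarray:
    """Draw one independent latent vector per sample (assumed standard normal)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((batch_size, latent_dim))


z = sample_random_latents(BATCH_SIZE, LATENT_DIM)
print(z.shape)  # (64, 128)
```

Each row of z would be fed to the generator separately, so neighboring samples in the output have no particular relationship to each other.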

The interpolation script generates a batch of samples as well, but the noise inputs are arranged as a grid in the latent space. This way, you can see how the generated samples change gradually as the inputs change. [image]
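
One way to picture such a grid is bilinear interpolation between four corner latent vectors, sketched below. This is an assumption about how a grid could be built, not necessarily what the interpolation script does; the 8 x 8 layout matches 'sample_grid' in the log, while the standard normal corners and the helper name latent_grid are hypothetical.

```python
import numpy as np


def latent_grid(corners: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Bilinearly interpolate between four corner latent vectors.

    `corners` has shape (4, latent_dim), ordered top-left, top-right,
    bottom-left, bottom-right. Returns an array of shape
    (rows * cols, latent_dim) in row-major grid order.
    """
    top_left, top_right, bottom_left, bottom_right = corners
    u = np.linspace(0.0, 1.0, cols)[None, :, None]  # horizontal position in [0, 1]
    v = np.linspace(0.0, 1.0, rows)[:, None, None]  # vertical position in [0, 1]
    top = (1 - u) * top_left + u * top_right           # shape (1, cols, latent_dim)
    bottom = (1 - u) * bottom_left + u * bottom_right  # shape (1, cols, latent_dim)
    grid = (1 - v) * top + v * bottom                  # shape (rows, cols, latent_dim)
    return grid.reshape(rows * cols, -1)


rng = np.random.default_rng(0)
corners = rng.standard_normal((4, 128))        # assumed standard normal latent space
z_grid = latent_grid(corners, rows=8, cols=8)  # 'sample_grid': [8, 8] in the log
print(z_grid.shape)  # (64, 128)
```

Adjacent grid cells differ only by a small step in latent space, which is why the generated samples change gradually across the grid.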

TatianaZobnina commented 5 years ago

Thank you for the explanation! I will continue working with your network; there are not many options for generating multi-track MIDI with a neural network.