some error in google colab

TatianaZobnina commented 5 years ago

Hi, I am trying to run the training process with ./scripts/run_train.sh "./exp/my_experiment/" "0" in Google Colab with a GPU, and I get this error:

musegan.train INFO Using parameters: {'beat_resolution': 12, 'condition_track_idx': None, 'data_shape': [4, 48, 84, 5], 'is_accompaniment': False, 'is_conditional': False, 'latent_dim': 128, 'nets': {'discriminator': 'default', 'generator': 'default'}, 'use_binary_neurons': False} musegan.train INFO Using configurations: {'adam': {'beta1': 0.5, 'beta2': 0.9}, 'batch_size': 64, 'colormap': [[1.0, 0.0, 0.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.5, 1.0]], 'config': './exp/my_experiment//config.yaml', 'data_filename': 'train_x_lpd_5_phr', 'data_source': 'sa', 'eval_dir': '/content/musegan/exp/my_experiment/eval', 'evaluate_steps': 100, 'exp_dir': '/content/musegan/exp/my_experiment', 'gan_loss_type': 'wasserstein', 'gpu': '0', 'initial_learning_rate': 0.001, 'learning_rate_schedule': {'end': 50000, 'end_value': 0.0, 'start': 45000}, 'log_dir': '/content/musegan/exp/my_experiment/logs/train', 'log_loss_steps': 100, 'midi': {'is_drums': [1, 0, 0, 0, 0], 'lowest_pitch': 24, 'programs': [0, 0, 25, 33, 48], 'tempo': 100}, 'model_dir': '/content/musegan/exp/my_experiment/model', 'n_dis_updates_per_gen_update': 5, 'n_jobs': 20, 'params': './exp/my_experiment//params.yaml', 'sample_dir': '/content/musegan/exp/my_experiment/samples', 'sample_grid': [8, 8], 'save_array_samples': True, 'save_checkpoint_steps': 10000, 'save_image_samples': True, 'save_pianoroll_samples': True, 'save_samples_steps': 100, 'save_summaries_steps': 0, 'slope_schedule': {'end': 50000, 'end_value': 5.0, 'start': 10000}, 'src_dir': '/content/musegan/exp/my_experiment/src', 'steps': 50000, 'use_gradient_penalties': True, 'use_learning_rate_decay': True, 'use_random_transpose': False, 'use_slope_annealing': False, 'use_train_test_split': False} musegan.train INFO Loading training data. musegan.train INFO Training data size: 102378 musegan.train INFO Building dataset. 
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version. Instructions for updating: tf.py_func is deprecated in TF V2. Instead, use tf.py_function, which takes a python function which manipulates tf eager tensors instead of numpy arrays. It's easy to convert a tf eager tensor to an ndarray (just call tensor.numpy()) but having access to eager tensors means tf.py_functions can use accelerators such as GPUs as well as being differentiable using a gradient tape.

musegan.model INFO Building model. musegan.model INFO Building training nodes. tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer. tensorflow WARNING From /content/musegan/src/musegan/presets/ops.py:16: conv3d_transpose (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.conv3d_transpose instead. tensorflow WARNING From /content/musegan/src/musegan/presets/ops.py:21: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.batch_normalization instead. tensorflow WARNING From /content/musegan/src/musegan/presets/ops.py:12: conv3d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.conv3d instead. tensorflow WARNING From /content/musegan/src/musegan/presets/ops.py:8: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.dense instead. musegan.model INFO Building losses. tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. musegan.model INFO Building training ops. tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/learning_rate_decay_v2.py:321: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Deprecated in favor of operator or tf.math.divide. 
musegan.model INFO Building summaries. musegan.train INFO Number of trainable parameters in Model: 3,943,968 musegan.train INFO Number of trainable parameters in Generator: 2,578,127 musegan.train INFO Number of trainable parameters in Discriminator: 1,365,841 musegan.train INFO Loading sample_z. musegan.model INFO Building prediction nodes. musegan.train INFO Training start. tensorflow INFO Create CheckpointSaverHook. tensorflow INFO Graph was finalized. 2019-06-09 19:03:56.671015: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-06-09 19:03:56.852445: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2019-06-09 19:03:56.853229: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x10943340 executing computations on platform CUDA. Devices: 2019-06-09 19:03:56.853277: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla T4, Compute Capability 7.5 2019-06-09 19:03:56.855500: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz 2019-06-09 19:03:56.855683: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x109431e0 executing computations on platform Host. 
Devices: 2019-06-09 19:03:56.855721: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): , 2019-06-09 19:03:56.856041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59 pciBusID: 0000:00:04.0 totalMemory: 14.73GiB freeMemory: 14.60GiB 2019-06-09 19:03:56.856089: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-06-09 19:03:56.856787: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-06-09 19:03:56.856809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-06-09 19:03:56.856819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-06-09 19:03:56.857079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14202 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5) tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. tensorflow INFO Restoring parameters from /content/musegan/exp/my_experiment/model/model.ckpt-0 tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file utilities to get mtimes. tensorflow INFO Running local_init_op. tensorflow INFO Done running local_init_op. tensorflow INFO Saving checkpoints for 0 into /content/musegan/exp/my_experiment/model/model.ckpt. 
2019-06-09 19:04:11.607825: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
musegan.train INFO step=100, gen_loss=-1.2372E+02, dis_loss=-1.9598E+02
musegan.train INFO Running sampler
musegan.train INFO Running evaluation
./scripts/run_train.sh: line 19: 879 Bus error (core dumped) python3 "$DIR/../src/train.py" --exp_dir "$1" --params "$1/params.yaml" --config "$1/config.yaml" --gpu "$gpu"

salu133445 commented 5 years ago

It seems that the error occurs in the evaluation process, but I am not sure what is happening given only the provided information. Note that you can disable the evaluation by setting evaluate_steps to zero in the configuration file (config.yaml in the experiment directory).
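
A minimal sketch of that change, assuming the experiment's config.yaml (the file name shown in the log above) contains a top-level line of the form evaluate_steps: 100. The helper name disable_evaluation is hypothetical, and the edit uses a plain regular expression rather than a YAML library to stay dependency-free:

```python
import re


def disable_evaluation(config_text: str) -> str:
    """Return config text with the top-level `evaluate_steps` value set to 0.

    Assumes a line such as `evaluate_steps: 100`; raises if none is found.
    """
    pattern = re.compile(r"^(evaluate_steps\s*:\s*)\d+", flags=re.MULTILINE)
    new_text, n_subs = pattern.subn(r"\g<1>0", config_text)
    if n_subs == 0:
        raise ValueError("no top-level 'evaluate_steps' entry found")
    return new_text


# Illustrative input only; a real config.yaml has many more keys.
example = "batch_size: 64\nevaluate_steps: 100\nsave_samples_steps: 100\n"
print(disable_evaluation(example))
```

To apply it to a real file, read the text with pathlib.Path("./exp/my_experiment/config.yaml").read_text(), pass it through the helper, and write the result back with write_text().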

TatianaZobnina commented 5 years ago

Thank you for the reply. I tried changing 'evaluate_steps' to 0 but still get the same error. My guess is that it is related to the GPU configuration in Google Colab. I will continue working on it.

salu133445 commented 5 years ago

Have you succeeded in running the model on Google Colab? It would be really nice to see that!

TatianaZobnina commented 5 years ago

Not yet, unfortunately, but I am still working on it. I also have a question about your network: could you explain in a few words what the inference part and the interpolation part do?

salu133445 commented 5 years ago

I see. Do let me know if there is anything I can help with.

The inference script generates a batch of samples with random noise inputs drawn from the latent distribution (the one used during the training). In other words, they are just random samples.
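
A minimal sketch of that sampling step, not the repository's actual inference code. It assumes a standard normal latent distribution (the thread does not say which distribution the model uses) and takes latent_dim=128 and batch_size=64 from the logged parameters above. The helper name sample_latent_batch is hypothetical:

```python
import numpy as np


def sample_latent_batch(batch_size: int, latent_dim: int, seed=None) -> np.ndarray:
    """Draw independent random latent vectors, one per generated sample."""
    rng = np.random.default_rng(seed)
    # Assumption: standard normal prior; swap in the training prior if it differs.
    return rng.standard_normal((batch_size, latent_dim))


z = sample_latent_batch(batch_size=64, latent_dim=128, seed=0)
print(z.shape)  # (64, 128)
```

Each row of z would be fed to the trained generator to produce one sample; because the rows are independent draws, neighbouring samples have no particular relationship to each other.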

The interpolation script generates a batch of samples as well, but the noise inputs are drawn as a grid from the latent distribution. In this way, you can see how the generated samples change gradually as the inputs change.
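
One common way to build such a grid, shown as a sketch and not as the repository's actual interpolation code: draw four corner vectors from the latent distribution and blend them bilinearly. The standard normal prior, the 8 x 8 grid (matching sample_grid in the logged configuration), and the helper name latent_grid are assumptions:

```python
import numpy as np


def latent_grid(n_rows: int, n_cols: int, latent_dim: int, seed=None) -> np.ndarray:
    """Bilinearly interpolate between four random corner latent vectors.

    Returns an array of shape (n_rows * n_cols, latent_dim), ordered row by row.
    """
    rng = np.random.default_rng(seed)
    # Corners: top-left, top-right, bottom-left, bottom-right.
    tl, tr, bl, br = rng.standard_normal((4, latent_dim))
    u = np.linspace(0.0, 1.0, n_cols)[None, :, None]  # horizontal weight
    v = np.linspace(0.0, 1.0, n_rows)[:, None, None]  # vertical weight
    top = (1.0 - u) * tl + u * tr
    bottom = (1.0 - u) * bl + u * br
    grid = (1.0 - v) * top + v * bottom  # shape (n_rows, n_cols, latent_dim)
    return grid.reshape(n_rows * n_cols, latent_dim)


z_grid = latent_grid(8, 8, latent_dim=128, seed=0)
print(z_grid.shape)  # (64, 128)
```

Adjacent grid cells differ by a small, constant step in latent space, which is why the generated samples change gradually from one cell to the next.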

TatianaZobnina commented 5 years ago

Thank you for the explanation! I will continue working with your network; there are not many options for generating multi-track MIDI with neural networks.

salu133445 / musegan

some error in google colab #79