Closed TatianaZobnina closed 4 years ago
It seems that the error occurs during evaluation, but I can't tell what is happening from the information provided. Note that you can disable evaluation by setting evaluate_steps to zero in the configuration file (config.yaml in the experiment directory).
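For illustration, here is a minimal sketch of that change. It assumes the configuration is the YAML file that the training log points at (config.yaml in the experiment directory) and that it contains a top-level line of the form `evaluate_steps: <integer>`. The helper name `disable_evaluation` is hypothetical and not part of the repository.

```python
import re


def disable_evaluation(config_text):
    """Return the config text with the top-level `evaluate_steps` value set to 0.

    Assumes a plain `evaluate_steps: <integer>` line at the start of a line,
    as a YAML config would normally have. Raises ValueError if none is found.
    """
    pattern = re.compile(r"^(evaluate_steps\s*:\s*)\d+", flags=re.MULTILINE)
    # \g<1> keeps the original key and spacing; only the number is replaced.
    new_text, count = pattern.subn(r"\g<1>0", config_text)
    if count == 0:
        raise ValueError("no top-level 'evaluate_steps' entry found")
    return new_text


# Possible usage (the path is taken from the command quoted in this thread):
#
#     path = "./exp/my_experiment/config.yaml"
#     with open(path) as f:
#         text = f.read()
#     with open(path, "w") as f:
#         f.write(disable_evaluation(text))
```

Editing the file by hand achieves the same thing; the sketch only makes explicit which key is meant.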
Thank you for the reply. I tried changing 'evaluate_steps' to 0, but I still get the same error. My guess is that it's related to the GPU configuration in Google Colab. I'll keep working on this error.
Have you succeeded in running the model on Google Colab? It would be really nice to see that!
Not yet, unfortunately, but I'm still working on it. I also have a question about your network. Could you explain in a few words what the inference part and the interpolation part do?
I see. Do let me know if there is anything I can help with.
The inference script generates a batch of samples with random noise inputs drawn from the latent distribution (the one used during the training). In other words, they are just random samples.
The interpolation script generates a batch of samples as well, but the noise inputs are drawn as a grid from the latent distribution. In this way, you can see how the generated samples change gradually as the inputs change.
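The difference between the two kinds of noise input can be sketched as follows. This is only an illustration under stated assumptions: the latent distribution is taken to be a standard normal, and the grid is built by bilinear blending between four random corner vectors, which may not be exactly what the repository's scripts do. `LATENT_DIM` and `GRID` come from the logged 'latent_dim' and 'sample_grid' values; the function names are hypothetical.

```python
import random

LATENT_DIM = 128  # 'latent_dim' in the logged parameters
GRID = (8, 8)     # 'sample_grid' in the logged configuration


def random_latents(n, dim=LATENT_DIM, seed=None):
    """Inference-style inputs: n independent draws (standard normal assumed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def lerp(a, b, t):
    """Linear interpolation between vectors a and b, with t in [0, 1]."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def grid_latents(rows, cols, dim=LATENT_DIM, seed=None):
    """Interpolation-style inputs: a rows x cols grid blended from 4 corners.

    Requires rows >= 2 and cols >= 2. Neighbouring cells differ only
    slightly, so the generated samples should change gradually.
    """
    top_left, top_right, bottom_left, bottom_right = random_latents(4, dim, seed)
    grid = []
    for r in range(rows):
        v = r / (rows - 1)
        left = lerp(top_left, bottom_left, v)
        right = lerp(top_right, bottom_right, v)
        grid.append([lerp(left, right, c / (cols - 1)) for c in range(cols)])
    return grid


batch = random_latents(GRID[0] * GRID[1])  # 64 unrelated noise vectors
grid = grid_latents(*GRID)                 # 8 x 8 smoothly varying vectors
```

In the first case every sample is independent of the others; in the second, moving one cell along the grid changes the input by a small, fixed step.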
Thank you for the explanation! I'll keep working with your network; there aren't many options for generating multi-track MIDI with a neural network.
Hi, I am trying to run the training process using this line ./scripts/run_train.sh "./exp/my_experiment/" "0" in Google Colab with a GPU, and I get this error:
musegan.train INFO Using parameters: {'beat_resolution': 12, 'condition_track_idx': None, 'data_shape': [4, 48, 84, 5], 'is_accompaniment': False, 'is_conditional': False, 'latent_dim': 128, 'nets': {'discriminator': 'default', 'generator': 'default'}, 'use_binary_neurons': False}
musegan.train INFO Using configurations: {'adam': {'beta1': 0.5, 'beta2': 0.9}, 'batch_size': 64, 'colormap': [[1.0, 0.0, 0.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.5, 1.0]], 'config': './exp/my_experiment//config.yaml', 'data_filename': 'train_x_lpd_5_phr', 'data_source': 'sa', 'eval_dir': '/content/musegan/exp/my_experiment/eval', 'evaluate_steps': 100, 'exp_dir': '/content/musegan/exp/my_experiment', 'gan_loss_type': 'wasserstein', 'gpu': '0', 'initial_learning_rate': 0.001, 'learning_rate_schedule': {'end': 50000, 'end_value': 0.0, 'start': 45000}, 'log_dir': '/content/musegan/exp/my_experiment/logs/train', 'log_loss_steps': 100, 'midi': {'is_drums': [1, 0, 0, 0, 0], 'lowest_pitch': 24, 'programs': [0, 0, 25, 33, 48], 'tempo': 100}, 'model_dir': '/content/musegan/exp/my_experiment/model', 'n_dis_updates_per_gen_update': 5, 'n_jobs': 20, 'params': './exp/my_experiment//params.yaml', 'sample_dir': '/content/musegan/exp/my_experiment/samples', 'sample_grid': [8, 8], 'save_array_samples': True, 'save_checkpoint_steps': 10000, 'save_image_samples': True, 'save_pianoroll_samples': True, 'save_samples_steps': 100, 'save_summaries_steps': 0, 'slope_schedule': {'end': 50000, 'end_value': 5.0, 'start': 10000}, 'src_dir': '/content/musegan/exp/my_experiment/src', 'steps': 50000, 'use_gradient_penalties': True, 'use_learning_rate_decay': True, 'use_random_transpose': False, 'use_slope_annealing': False, 'use_train_test_split': False}
musegan.train INFO Loading training data.
musegan.train INFO Training data size: 102378
musegan.train INFO Building dataset.
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version. Instructions for updating: tf.py_func is deprecated in TF V2. Instead, use tf.py_function, which takes a python function which manipulates tf eager tensors instead of numpy arrays. It's easy to convert a tf eager tensor to an ndarray (just call tensor.numpy()) but having access to eager tensors means tf.py_functions can use accelerators such as GPUs as well as being differentiable using a gradient tape.
musegan.model INFO Building model.
musegan.model INFO Building training nodes.
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.
tensorflow WARNING From /content/musegan/src/musegan/presets/ops.py:16: conv3d_transpose (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.conv3d_transpose instead.
tensorflow WARNING From /content/musegan/src/musegan/presets/ops.py:21: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.batch_normalization instead.
tensorflow WARNING From /content/musegan/src/musegan/presets/ops.py:12: conv3d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.conv3d instead.
tensorflow WARNING From /content/musegan/src/musegan/presets/ops.py:8: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.dense instead.
musegan.model INFO Building losses.
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead.
musegan.model INFO Building training ops.
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/learning_rate_decay_v2.py:321: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Deprecated in favor of operator or tf.math.divide.
musegan.model INFO Building summaries.
musegan.train INFO Number of trainable parameters in Model: 3,943,968
musegan.train INFO Number of trainable parameters in Generator: 2,578,127
musegan.train INFO Number of trainable parameters in Discriminator: 1,365,841
musegan.train INFO Loading sample_z.
musegan.model INFO Building prediction nodes.
musegan.train INFO Training start.
tensorflow INFO Create CheckpointSaverHook.
tensorflow INFO Graph was finalized.
2019-06-09 19:03:56.671015: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-09 19:03:56.852445: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-09 19:03:56.853229: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x10943340 executing computations on platform CUDA. Devices:
2019-06-09 19:03:56.853277: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-06-09 19:03:56.855500: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-06-09 19:03:56.855683: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x109431e0 executing computations on platform Host. Devices:
2019-06-09 19:03:56.855721: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0):,
2019-06-09 19:03:56.856041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
totalMemory: 14.73GiB freeMemory: 14.60GiB
2019-06-09 19:03:56.856089: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-09 19:03:56.856787: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-09 19:03:56.856809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-09 19:03:56.856819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-09 19:03:56.857079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14202 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
tensorflow INFO Restoring parameters from /content/musegan/exp/my_experiment/model/model.ckpt-0
tensorflow WARNING From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
tensorflow INFO Running local_init_op.
tensorflow INFO Done running local_init_op.
tensorflow INFO Saving checkpoints for 0 into /content/musegan/exp/my_experiment/model/model.ckpt.
2019-06-09 19:04:11.607825: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
musegan.train INFO step=100, gen_loss=-1.2372E+02, dis_loss=-1.9598E+02
musegan.train INFO Running sampler
musegan.train INFO Running evaluation
./scripts/run_train.sh: line 19: 879 Bus error (core dumped) python3 "$DIR/../src/train.py" --exp_dir "$1" --params "$1/params.yaml" --config "$1/config.yaml" --gpu "$gpu"
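The thread does not identify the cause of this bus error. One thing that may be worth ruling out, purely as an assumption and not a confirmed diagnosis, is a shortage of shared memory: a bus error can be raised when a memory-mapped region cannot be backed by its storage, and /dev/shm is often small in container environments. The sketch below estimates the size of the training array from the logged 'Training data size' and 'data_shape' values, assuming one byte per element (the dtype is not shown in the log), and reports the free space under /dev/shm. Both function names are hypothetical.

```python
import shutil


def array_bytes(n_samples, sample_shape, itemsize=1):
    """Estimated size in bytes of an array of n_samples items of sample_shape.

    itemsize=1 assumes a boolean or 8-bit dtype; this is a guess.
    """
    total = n_samples * itemsize
    for dim in sample_shape:
        total *= dim
    return total


def free_bytes(path="/dev/shm"):
    """Free space in bytes on the filesystem holding path, or None if the
    path cannot be inspected (for example, it does not exist)."""
    try:
        return shutil.disk_usage(path).free
    except OSError:
        return None


# Values taken from the log above: 102378 samples of shape [4, 48, 84, 5].
needed = array_bytes(102378, (4, 48, 84, 5))
available = free_bytes()

print(f"training array: about {needed / 2**30:.1f} GiB at 1 byte per element")
if available is None:
    print("/dev/shm could not be inspected on this system")
else:
    print(f"/dev/shm free: {available / 2**30:.1f} GiB")
```

If the free space turns out to be much smaller than the estimate, that would be one lead to follow; if not, the cause lies elsewhere.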