magenta / ddsp-vst

Realtime DDSP Neural Synthesizer and Effect
Apache License 2.0

[BUG] Colab failing/zipping before it begins training #28

Closed: cbmtrx closed this issue 1 year ago

cbmtrx commented 1 year ago

Describe the bug: The Colab encounters an error and then proceeds to zip the results; the training phase never starts. I had this working 2 days ago (with a much shorter audio file), but repeated attempts now fail (12-minute MP3 file, 31 MB).

System Info

To Reproduce: Start the Colab, click Run, authorize Google Drive, select the folder with audio; it chugs for a while, then fails...

Screenshots: See the copied output below (I don't know if that helps).

Additional context

Output:
Installing DDSP...
This should take about 2 minutes...
Copying audio to colab for training...
Copying /content/gdrive/MyDrive/Train_VST/Monochord short.mp3 to audio/Monochord_short.mp3
Preparing new dataset from audio/

Creating dataset...
This usually takes around 2-3 minutes for each minute of audio
(10 minutes of training audio -> 20-30 minutes)

Training...
I0929 00:21:29.349751 140639050278784 ddsp_run.py:179] Restore Dir: /content/gdrive/MyDrive/Train_VST/ddsp-training-2022-09-29-0020
I0929 00:21:29.350117 140639050278784 ddsp_run.py:180] Save Dir: /content/gdrive/MyDrive/Train_VST/ddsp-training-2022-09-29-0020
I0929 00:21:29.350983 140639050278784 resource_reader.py:50] system_path_file_exists:optimization/base.gin
E0929 00:21:29.351246 140639050278784 resource_reader.py:55] Path not found: optimization/base.gin
I0929 00:21:29.352814 140639050278784 resource_reader.py:50] system_path_file_exists:eval/basic.gin
E0929 00:21:29.353032 140639050278784 resource_reader.py:55] Path not found: eval/basic.gin
I0929 00:21:29.355939 140639050278784 ddsp_run.py:152] Operative config not found in /content/gdrive/MyDrive/Train_VST/ddsp-training-2022-09-29-0020
I0929 00:21:29.356218 140639050278784 resource_reader.py:50] system_path_file_exists:models/vst/vst.gin
E0929 00:21:29.356484 140639050278784 resource_reader.py:55] Path not found: models/vst/vst.gin
I0929 00:21:29.362102 140639050278784 resource_reader.py:50] system_path_file_exists:datasets/tfrecord.gin
E0929 00:21:29.362300 140639050278784 resource_reader.py:55] Path not found: datasets/tfrecord.gin
I0929 00:21:29.362647 140639050278784 resource_reader.py:50] system_path_file_exists:datasets/base.gin
E0929 00:21:29.362849 140639050278784 resource_reader.py:55] Path not found: datasets/base.gin
I0929 00:21:29.370896 140639050278784 ddsp_run.py:184] Operative Gin Config:
import ddsp
import ddsp.training as ddsp2

Macros:

==============================================================================

batch_size = 16
evaluators = [@BasicEvaluator]
frame_rate = 50
frame_size = 1024
learning_rate = 0.0003
n_samples = 64320
sample_rate = 16000

Parameters for processors.Add:

==============================================================================

processors.Add.name = 'add'

Parameters for Autoencoder:

==============================================================================

Autoencoder.decoder = @decoders.RnnFcDecoder()
Autoencoder.encoder = None
Autoencoder.losses = [@losses.SpectralLoss()]
Autoencoder.preprocessor = @preprocessing.OnlineF0PowerPreprocessor()
Autoencoder.processor_group = @processors.ProcessorGroup()

Parameters for Crop:

==============================================================================

Crop.crop_location = 'back'
Crop.frame_size = 320

Parameters for evaluate:

==============================================================================

evaluate.batch_size = 32
evaluate.data_provider = @data.TFRecordProvider()
evaluate.evaluator_classes = %evaluators
evaluate.num_batches = 5

Parameters for FilteredNoise:

==============================================================================

FilteredNoise.n_samples = %n_samples
FilteredNoise.name = 'filtered_noise'
FilteredNoise.scale_fn = @core.exp_sigmoid
FilteredNoise.window_size = 0

Parameters for FilteredNoiseReverb:

==============================================================================

FilteredNoiseReverb.n_filter_banks = 32
FilteredNoiseReverb.n_frames = 500
FilteredNoiseReverb.name = 'reverb'
FilteredNoiseReverb.reverb_length = 24000
FilteredNoiseReverb.trainable = True

Parameters for get_model:

==============================================================================

get_model.model = @models.Autoencoder()

Parameters for Harmonic:

==============================================================================

Harmonic.amp_resample_method = 'linear'
Harmonic.n_samples = %n_samples
Harmonic.name = 'harmonic'
Harmonic.normalize_below_nyquist = True
Harmonic.sample_rate = %sample_rate
Harmonic.scale_fn = @core.exp_sigmoid

Parameters for OnlineF0PowerPreprocessor:

==============================================================================

OnlineF0PowerPreprocessor.compute_f0 = False
OnlineF0PowerPreprocessor.compute_power = True
OnlineF0PowerPreprocessor.crepe_saved_model_path = None
OnlineF0PowerPreprocessor.frame_rate = %frame_rate
OnlineF0PowerPreprocessor.frame_size = %frame_size
OnlineF0PowerPreprocessor.padding = 'center'

Parameters for ProcessorGroup:

==============================================================================

ProcessorGroup.dag = \
    [(@synths.Harmonic(), ['amps', 'harmonic_distribution', 'f0_hz']),
     (@synths.FilteredNoise(), ['noise_magnitudes']),
     (@processors.Add(), ['filtered_noise/signal', 'harmonic/signal']),
     (@effects.FilteredNoiseReverb(), ['add/signal']),
     (@processors.Crop(), ['reverb/signal'])]

Parameters for RnnFcDecoder:

==============================================================================

RnnFcDecoder.ch = 256
RnnFcDecoder.input_keys = ('pw_scaled', 'f0_scaled')
RnnFcDecoder.layers_per_stack = 1
RnnFcDecoder.output_splits = \
    (('amps', 1), ('harmonic_distribution', 60), ('noise_magnitudes', 65))
RnnFcDecoder.rnn_channels = 512
RnnFcDecoder.rnn_type = 'gru'

Parameters for sample:

==============================================================================

sample.batch_size = 16
sample.ckpt_delay_secs = 300
sample.data_provider = @data.TFRecordProvider()
sample.evaluator_classes = %evaluators
sample.num_batches = 1

Parameters for SpectralLoss:

==============================================================================

SpectralLoss.logmag_weight = 1.0
SpectralLoss.loss_type = 'L1'
SpectralLoss.mag_weight = 1.0

Parameters for TFRecordProvider:

==============================================================================

TFRecordProvider.centered = True
TFRecordProvider.file_pattern = 'data/train.tfrecord*'
TFRecordProvider.frame_rate = 50

Parameters for train:

==============================================================================

train.batch_size = %batch_size
train.data_provider = @data.TFRecordProvider()
train.num_steps = 30000
train.steps_per_save = 300
train.steps_per_summary = 300

Parameters for Trainer:

==============================================================================

Trainer.checkpoints_to_keep = 3
Trainer.grad_clip_norm = 3.0
Trainer.learning_rate = %learning_rate
Trainer.lr_decay_rate = 0.98
Trainer.lr_decay_steps = 10000

I0929 00:21:29.371028 140639050278784 train_util.py:76] Defaulting to MirroredStrategy
2022-09-29 00:21:30.042095: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0929 00:21:30.047340 140639050278784 mirrored_strategy.py:374] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
Traceback (most recent call last):
  File "/usr/local/bin/ddsp_run", line 8, in <module>
    sys.exit(console_entry_point())
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/ddsp_run.py", line 227, in console_entry_point
    app.run(main)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/ddsp_run.py", line 202, in main
    report_loss_to_hypertune=FLAGS.hypertune)
  File "/usr/local/lib/python3.7/dist-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.7/dist-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/usr/local/lib/python3.7/dist-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/train_util.py", line 242, in train
    dataset = data_provider.get_batch(batch_size, shuffle=True, repeats=-1)
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/data.py", line 74, in get_batch
    dataset = self.get_dataset(shuffle)
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/data.py", line 248, in get_dataset
    filenames = tf.data.Dataset.list_files(self._file_pattern, shuffle=shuffle)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 1349, in list_files
    condition, [message], summarize=1, name="assert_not_empty")
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 161, in Assert
    (condition, "\n".join(data_str)))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected 'tf.Tensor(False, shape=(), dtype=bool)' to be true. Summarized data: b'No files matched pattern: data/train.tfrecord*'
  In call to configurable 'train' (<function train at 0x7fe885577b00>)

Exporting model...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/train_util.py", line 165, in get_latest_operative_config
    restore_dir, prefix='operative_config-', suffix='.gin')
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/train_util.py", line 106, in get_latest_file
    f'No files found matching the pattern \'{search_pattern}\'.')
FileNotFoundError: No files found matching the pattern '/content/gdrive/MyDrive/Train_VST/ddsp-training-2022-09-29-0020/operative_config-*.gin'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/ddsp_export", line 8, in <module>
    sys.exit(console_entry_point())
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/ddsp_export.py", line 364, in console_entry_point
    app.run(main)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/ddsp_export.py", line 333, in main
    export_impulse_response(model_path, save_dir, FLAGS.reverb_sample_rate)
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/ddsp_export.py", line 272, in export_impulse_response
    ddsp.training.inference.parse_operative_config(model_path)
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/inference.py", line 41, in parse_operative_config
    operative_config = train_util.get_latest_operative_config(ckpt_dir)
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/train_util.py", line 168, in get_latest_operative_config
    os.path.dirname(restore_dir), prefix='operative_config-', suffix='.gin')
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/train_util.py", line 106, in get_latest_file
    f'No files found matching the pattern \'{search_pattern}\'.')
FileNotFoundError: No files found matching the pattern '/content/gdrive/MyDrive/Train_VST/operative_config-*.gin'.
Export complete!
Zipping /content/gdrive/MyDrive/Train_VST/ddsp-training-2022-09-29-0020/My_Instrument to /content/gdrive/MyDrive/Train_VST/ddsp-training-2022-09-29-0020/My_Instrument.zip
  adding: My_Instrument/ (stored 0%)
Zipping Complete!
Downloading... My_Instrument.zip
You can also find your model at /content/gdrive/MyDrive/Train_VST/ddsp-training-2022-09-29-0020/My_Instrument

ColtonOsterlund commented 1 year ago

I'm receiving the same bug - I've been able to determine that it occurs because the dataset creation step fails, thus there are no tfrecords to train the model (hence the training seemingly being skipped).
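One way to surface this earlier is to check for the dataset files before starting training. The following is a minimal sketch, not part of the notebook: the helper name `find_tfrecords` is hypothetical, and the default pattern is simply the `TFRecordProvider.file_pattern` value shown in the config above.

```python
import glob


def find_tfrecords(file_pattern="data/train.tfrecord*"):
    """Return the sorted files matching the pattern, or raise if there are none.

    Hypothetical helper: the default pattern is copied from the gin config
    in this thread; adjust it to wherever your dataset is written.
    """
    matches = sorted(glob.glob(file_pattern))
    if not matches:
        raise FileNotFoundError(
            f"No files matched pattern: {file_pattern}. "
            "Dataset creation probably failed; check its log before training."
        )
    return matches
```

Calling something like this between the dataset step and the training step would stop the run with a clear message instead of continuing on to export and zip an empty model folder.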

The error happening during the dataset creation is: "ValueError: Size of each quantile should be size of p: received 1, but expected 360."

To note, this happens both when running training within the notebook and when running it locally. It seems to come from the hmmlearn package: the logs include a warning from hmm.py that reads: "MultinomialHMM has undergone major changes. The previous version was implementing a CategoricalHMM (a special case of MultinomialHMM). This new implementation follows the standard definition for a Multinomial distribution"
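The "received 1, but expected 360" message is consistent with that warning: the log further down shows crepe calling `model.predict(observations.reshape(-1, 1), ...)`, i.e. one column holding a category index per frame, while a multinomial emission is checked against a probability vector with 360 entries. The sketch below only illustrates that shape difference in plain Python; `to_one_hot_counts` is a hypothetical helper, not a suggested patch to crepe or hmmlearn.

```python
def to_one_hot_counts(symbols, n_features):
    """Convert category indices (one per frame) into one-hot count rows."""
    rows = []
    for s in symbols:
        if not 0 <= s < n_features:
            raise ValueError(f"symbol {s} outside [0, {n_features})")
        row = [0] * n_features
        row[s] = 1  # a single trial that landed in bin s
        rows.append(row)
    return rows


# Categorical-style input: one column with the bin index of each frame.
categorical = [[3], [3], [4]]  # shape (3, 1)

# Multinomial-style input: one count per possible bin for each frame.
multinomial = to_one_hot_counts([3, 3, 4], n_features=360)  # shape (3, 360)

print(len(categorical[0]), len(multinomial[0]))  # prints "1 360"
```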

Also to note, the latest release of the hmmlearn package on PyPI was on September 26th, 2022, which lines up with your report that it last worked 2 days ago. I'll test whether downgrading the hmmlearn package fixes things for now.

Logs for the failing dataset creation:

Creating dataset...
This usually takes around 2-3 minutes for each minute of audio
(10 minutes of training audio -> 20-30 minutes)
I0929 05:04:12.706241 139725471348608 environments.py:376] Default Python SDK image for environment is apache/beam_python3.7_sdk:2.41.0
I0929 05:04:12.804227 139725471348608 translations.py:714] ==================== <function annotate_downstream_side_inputs at 0x7f1351b54e60> ====================
I0929 05:04:12.804770 139725471348608 translations.py:714] ==================== <function fix_side_input_pcoll_coders at 0x7f1351b54f80> ====================
I0929 05:04:12.805119 139725471348608 translations.py:714] ==================== <function pack_combiners at 0x7f1351b584d0> ====================
I0929 05:04:12.805743 139725471348608 translations.py:714] ==================== <function lift_combiners at 0x7f1351b58560> ====================
I0929 05:04:12.805921 139725471348608 translations.py:714] ==================== <function expand_sdf at 0x7f1351b58710> ====================
I0929 05:04:12.806161 139725471348608 translations.py:714] ==================== <function expand_gbk at 0x7f1351b587a0> ====================
I0929 05:04:12.806630 139725471348608 translations.py:714] ==================== <function sink_flattens at 0x7f1351b588c0> ====================
I0929 05:04:12.806832 139725471348608 translations.py:714] ==================== <function greedily_fuse at 0x7f1351b58950> ====================
I0929 05:04:12.808167 139725471348608 translations.py:714] ==================== <function read_to_impulse at 0x7f1351b589e0> ====================
I0929 05:04:12.808335 139725471348608 translations.py:714] ==================== <function impulse_to_input at 0x7f1351b58a70> ====================
I0929 05:04:12.808505 139725471348608 translations.py:714] ==================== <function sort_stages at 0x7f1351b58cb0> ====================
I0929 05:04:12.808886 139725471348608 translations.py:714] ==================== <function add_impulse_to_dangling_transforms at 0x7f1351b58dd0> ====================
I0929 05:04:12.809025 139725471348608 translations.py:714] ==================== <function setup_timer_mapping at 0x7f1351b58c20> ====================
I0929 05:04:12.809275 139725471348608 translations.py:714] ==================== <function populate_data_channel_coders at 0x7f1351b58d40> ====================
I0929 05:04:12.812187 139725471348608 statecache.py:172] Creating state cache with size 100
I0929 05:04:12.812885 139725471348608 worker_handlers.py:908] Created Worker handler <apache_beam.runners.portability.fn_api_runner.worker_handlers.EmbeddedWorkerHandler object at 0x7f1351a89750> for environment ref_Environment_default_environment_1 (beam:env:embedded_python:v1, b'')
I0929 05:04:12.851229 139725471348608 prepare_tfrecord_lib.py:58] Loading 'audio/Piano.wav'.
2022-09-29 05:04:22.884120: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
W0929 05:04:44.607767 139725471348608 hmm.py:402] MultinomialHMM has undergone major changes. The previous version was implementing a CategoricalHMM (a special case of MultinomialHMM). This new implementation follows the standard definition for a Multinomial distribution (e.g. as in https://en.wikipedia.org/wiki/Multinomial_distribution). See these issues for details:
https://github.com/hmmlearn/hmmlearn/issues/335
https://github.com/hmmlearn/hmmlearn/issues/340
Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 837, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 983, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.7/dist-packages/apache_beam/transforms/core.py", line 1877, in <lambda>
    wrapper = lambda x, *args, **kwargs: [fn(x, *args, **kwargs)]
  File "/usr/local/lib/python3.7/dist-packages/ddsp/training/data_preparation/prepare_tfrecord_lib.py", line 89, in _add_f0_estimate
    audio, frame_rate, viterbi=viterbi, padding=padding)
  File "/usr/local/lib/python3.7/dist-packages/ddsp/spectral_ops.py", line 357, in compute_f0
    verbose=0)
  File "/usr/local/lib/python3.7/dist-packages/crepe/core.py", line 261, in predict
    cents = to_viterbi_cents(activation)
  File "/usr/local/lib/python3.7/dist-packages/crepe/core.py", line 150, in to_viterbi_cents
    path = model.predict(observations.reshape(-1, 1), [len(observations)])
  File "/usr/local/lib/python3.7/dist-packages/hmmlearn/base.py", line 396, in predict
    _, state_sequence = self.decode(X, lengths)
  File "/usr/local/lib/python3.7/dist-packages/hmmlearn/base.py", line 373, in decode
    sub_log_prob, sub_state_sequence = decoder(sub_X)
  File "/usr/local/lib/python3.7/dist-packages/hmmlearn/base.py", line 318, in _decode_viterbi
    log_frameprob = self._compute_log_likelihood(X)
  File "/usr/local/lib/python3.7/dist-packages/hmmlearn/hmm.py", line 471, in _compute_log_likelihood
    X, n=self.n_trials, p=self.emissionprob_[component, :])
  File "/usr/local/lib/python3.7/dist-packages/scipy/stats/_multivariate.py", line 3074, in logpmf
    x, xcond = self._process_quantiles(x, n, p)
  File "/usr/local/lib/python3.7/dist-packages/scipy/stats/_multivariate.py", line 3032, in _process_quantiles
    (xx.shape[-1], p.shape[-1]))
ValueError: Size of each quantile should be size of p: received 1, but expected 360.
ColtonOsterlund commented 1 year ago

Confirmed that rolling back the hmmlearn package to version 0.2.7 by adding the line:

!pip install --upgrade hmmlearn==0.2.7

after installing ddsp into the notebook fixes the issue for now.
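To decide whether a given environment needs the rollback, the installed version string can be compared against 0.2.7. This is a rough sketch under the assumption, based only on this thread, that every hmmlearn release newer than 0.2.7 is affected; `parse_version` and `needs_rollback` are hypothetical names.

```python
LAST_KNOWN_GOOD = (0, 2, 7)  # version reported as working in this thread


def parse_version(version):
    """Turn '0.2.7' into (0, 2, 7); parsing stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_rollback(installed_version):
    """Assumption: anything newer than 0.2.7 hits the MultinomialHMM change."""
    return parse_version(installed_version) > LAST_KNOWN_GOOD
```

For example, passing the version string that `pip show hmmlearn` reports tells you whether the pin above is still required in that session.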

isaac-art commented 1 year ago

> Confirmed that rolling back the hmmlearn package to version 0.2.7 by adding the line:
>
> !pip install --upgrade hmmlearn==0.2.7
>
> after installing ddsp into the notebook fixes the issue for now.

this works for me too, thanks

cbmtrx commented 1 year ago

> Confirmed that rolling back the hmmlearn package to version 0.2.7 by adding the line:
>
> !pip install --upgrade hmmlearn==0.2.7
>
> after installing ddsp into the notebook fixes the issue for now.

Yes, this worked, thanks.

masseyl commented 1 year ago

> Confirmed that rolling back the hmmlearn package to version 0.2.7 by adding the line:
>
> !pip install --upgrade hmmlearn==0.2.7
>
> after installing ddsp into the notebook fixes the issue for now.

Yes, this worked, thanks.

Is this fix going to be rolled into the web app? I don't have the chops to run locally...

ColtonOsterlund commented 1 year ago

> Confirmed that rolling back the hmmlearn package to version 0.2.7 by adding the line: !pip install --upgrade hmmlearn==0.2.7 after installing ddsp into the notebook fixes the issue for now.
>
> Yes, this worked, thanks.
>
> Is this fix going to be rolled into the web app? I don't have the chops to run locally...

A fix would need to be made in the main ddsp repository, either to work with the new version of the hmmlearn package or to pin a version dependency so that it always pulls a version that doesn't break the project. That new version of ddsp would then need to be built and published to PyPI so it can be installed with pip, and the version of the ddsp package that the ddsp_vst notebook pulls would need to be updated. When I get a chance, I can make a PR to that repository to hopefully fix things up. For now, the easiest workaround is to click "Show Code" at the bottom of the notebook cell and paste in the line mentioned above after the section that reads:

print('Installing DDSP...')
print('This should take about 2 minutes...')
!sudo apt-get install libportaudio2 &> /dev/null
!pip install -U ddsp[data_preparation]==3.4.3 &> /dev/null

Just a note that this will need to be done every time you open the notebook.

masseyl commented 1 year ago

@ColtonOsterlund You are an angel - thanks!