Closed ServusJon closed 1 year ago
I added my source and my target file to "bin/train/outputs/BHPFriedmanBE04" and ran the following:
(nam) jonathanarnold@MBPvonJonathan NAM conda environment % python bin/train/main.py \
bin/train/inputs/config_data.json \
bin/train/inputs/config_model.json \
bin/train/inputs/config_learning.json \
bin/train/outputs/BHPFriedmanBE04
Traceback (most recent call last):
  File "/Users/jonathanarnold/Desktop/_dev/NAM conda environment/bin/train/main.py", line 19, in <module>
    from nam.data import ConcatDataset, ParametricDataset, Split, init_dataset
ModuleNotFoundError: No module named 'nam'
What am I missing?
If you do a "pip install -e ." and re-attempt, does that work?
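A quick way to confirm the editable install took effect (a minimal check, run inside the activated conda environment; the exact path printed depends on where the repo was cloned):

import nam

# If the editable install ("pip install -e .") was picked up, this import succeeds
# and the printed path points into the cloned repository instead of raising
# ModuleNotFoundError as in the traceback above.
print(nam.__file__)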
That helped. But I am stuck here @2-dor

(nam) jonathanarnold@MBPvonJonathan NAM conda environment % python bin/train/main.py \
bin/train/inputs/config_data.json \
bin/train/inputs/config_model.json \
bin/train/inputs/config_learning.json \
bin/train/outputs/BHPFriedmanBE04
Traceback (most recent call last):
  File "/Users/jonathanarnold/Desktop/_dev/NAM conda environment/bin/train/main.py", line 191, in <module>
    main(parser.parse_args())
  File "/Users/jonathanarnold/Desktop/_dev/NAM conda environment/bin/train/main.py", line 125, in main
    with open(args.data_config_path, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'bin/train/inputs/config_data.json'
I don't know, to be honest. You could try running the "jupyter notebooks" as pointed out in the Facebook group:
That helped a lot! But now I am not sure what to do regarding the 4 different audio files. I only have the "output.wav" and the "v1_1_1.wav".
What does your config look like? @2-dor
You can just run the "easy" version - you only need the "v1_1_1.wav" and "output.wav" files for that.
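For reference, and going only by the run() signature that appears in the traceback further down (nam/train/colab.py), the "easy" path boils down to something like the sketch below; where the two WAV files have to live is an assumption (presumably the directory the notebook runs from), not something confirmed here.

# Sketch of the "easy" training entry point, based on the run() call visible in
# the traceback below; this is not the full notebook.
from nam.train.colab import run

# Assumes v1_1_1.wav (the reamp/source signal) and output.wav (the recorded amp
# output) are available where the notebook expects them.
run(
    epochs=100,
    architecture="standard",  # the notebook cell also lists "lite" and "feather"
)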
I don't have the same folder structure as you. :(
Oh - have you cloned the current Git repository?
Had the wrong branch checked out haha:
MisconfigurationException                Traceback (most recent call last)
Cell In[2], line 2
      1 get_ipython().run_line_magic('tensorboard', '--logdir /content/lightning_logs')
----> 2 run(
      3     epochs=100,
      4     architecture="standard"  # standard, lite, feather
      5 )

File ~/opt/anaconda3/envs/nam/lib/python3.10/site-packages/nam/train/colab.py:363, in run(epochs, delay, architecture, lr, lr_decay, seed)
    360 train_dataloader = DataLoader(dataset_train, **learning_config["train_dataloader"])
    361 val_dataloader = DataLoader(dataset_validation, **learning_config["val_dataloader"])
--> 363 trainer = pl.Trainer(
    364     callbacks=[
    365         pl.callbacks.model_checkpoint.ModelCheckpoint(
    366             filename="checkpoint_best_{epoch:04d}_{step}_{ESR:.4f}_{MSE:.3e}",
    367             save_top_k=3,
    368             monitor="val_loss",
    369             every_n_epochs=1,
    370         ),
    371         pl.callbacks.model_checkpoint.ModelCheckpoint(
    372             filename="checkpoint_last_{epoch:04d}_{step}", every_n_epochs=1
    373         ),
    374     ],
    375     **learning_config["trainer"],
    376 )
    377 trainer.fit(model, train_dataloader, val_dataloader)
    379 # Go to best checkpoint

File ~/opt/anaconda3/envs/nam/lib/python3.10/site-packages/pytorch_lightning/utilities/argparse.py:348, in _defaults_from_env_vars.

File ~/opt/anaconda3/envs/nam/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:420, in Trainer.__init__(self, logger, enable_checkpointing, callbacks, default_root_dir, gradient_clip_val, gradient_clip_algorithm, num_nodes, num_processes, devices, gpus, auto_select_gpus, tpu_cores, ipus, enable_progress_bar, overfit_batches, track_grad_norm, check_val_every_n_epoch, fast_dev_run, accumulate_grad_batches, max_epochs, min_epochs, max_steps, min_steps, max_time, limit_train_batches, limit_val_batches, limit_test_batches, limit_predict_batches, val_check_interval, log_every_n_steps, accelerator, strategy, sync_batchnorm, precision, enable_model_summary, num_sanity_val_steps, resume_from_checkpoint, profiler, benchmark, deterministic, reload_dataloaders_every_n_epochs, auto_lr_find, replace_sampler_ddp, detect_anomaly, auto_scale_batch_size, plugins, amp_backend, amp_level, move_metrics_to_cpu, multiple_trainloader_mode, inference_mode)
    417 # init connectors
    418 self._data_connector = DataConnector(self, multiple_trainloader_mode)
--> 420 self._accelerator_connector = AcceleratorConnector(
    421     num_processes=num_processes,
    422     devices=devices,
    423     tpu_cores=tpu_cores,
    424     ipus=ipus,
    425     accelerator=accelerator,
    426     strategy=strategy,
    427     gpus=gpus,
    428     num_nodes=num_nodes,
    429     sync_batchnorm=sync_batchnorm,
    430     benchmark=benchmark,
    431     replace_sampler_ddp=replace_sampler_ddp,
    432     deterministic=deterministic,
    433     auto_select_gpus=auto_select_gpus,
    434     precision=precision,
    435     amp_type=amp_backend,
    436     amp_level=amp_level,
    437     plugins=plugins,
    438 )
    439 self._logger_connector = LoggerConnector(self)
    440 self._callback_connector = CallbackConnector(self)

File ~/opt/anaconda3/envs/nam/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:202, in AcceleratorConnector.__init__(self, devices, num_nodes, accelerator, strategy, plugins, precision, amp_type, amp_level, sync_batchnorm, benchmark, replace_sampler_ddp, deterministic, auto_select_gpus, num_processes, tpu_cores, ipus, gpus)
    200     self._accelerator_flag = self._choose_auto_accelerator()
    201 elif self._accelerator_flag == "gpu":
--> 202     self._accelerator_flag = self._choose_gpu_accelerator_backend()
    204 self._set_parallel_devices_and_init_accelerator()
    206 # 3. Instantiate ClusterEnvironment

File ~/opt/anaconda3/envs/nam/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:537, in AcceleratorConnector._choose_gpu_accelerator_backend()
    534 if CUDAAccelerator.is_available():
    535     return "cuda"
--> 537 raise MisconfigurationException("No supported gpu backend found!")

MisconfigurationException: No supported gpu backend found!
I am on a Mac M1. Not sure how I can change to CPU.
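(A note on switching to CPU: the traceback above shows the Trainer being built from learning_config["trainer"], so whatever produces that dict has to end up requesting the CPU accelerator instead of "gpu" on a machine without CUDA. A minimal illustration with placeholder kwargs, not the project's actual config:)

import pytorch_lightning as pl

# Illustration only: on a machine with no CUDA backend (e.g. an Apple Silicon Mac),
# the kwargs that end up in pl.Trainer(**learning_config["trainer"]) need to ask
# for the CPU accelerator; "gpu" is what triggers "No supported gpu backend found!".
trainer = pl.Trainer(accelerator="cpu", max_epochs=100)  # kwargs here are placeholders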
Try this "conda install pytorch torchvision torchaudio cpuonly -c pytorch" and re-run
I did. Same error :(
Here is how my command looked:
(nam) jonathanarnold@MBPvonJonathan NAM conda environment % conda install pytorch torchvision torchaudio cpuonly -c pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: done
environment location: /Users/jonathanarnold/opt/anaconda3/envs/nam
added / updated specs:
The following packages will be downloaded:
package | build
---------------------------|-----------------
cpuonly-2.0 | 0 2 KB pytorch
ffmpeg-4.3 | h0a44026_0 10.1 MB pytorch
gnutls-3.6.15 | hed9c0bf_0 974 KB
lame-3.100 | h1de35cc_0 316 KB
libtasn1-4.16.0 | h9ed2024_0 53 KB
nettle-3.7.3 | h230ac6f_1 380 KB
openh264-2.1.1 | h8346a28_0 655 KB
pytorch-mutex-1.0 | cpu 3 KB pytorch
torchaudio-0.13.1 | py310_cpu 5.6 MB pytorch
torchvision-0.14.1 | py310_cpu 6.2 MB pytorch
------------------------------------------------------------
Total: 24.2 MB
The following NEW packages will be INSTALLED:
  cpuonly            pytorch/noarch::cpuonly-2.0-0
  ffmpeg             pytorch/osx-64::ffmpeg-4.3-h0a44026_0
  gmp                pkgs/main/osx-64::gmp-6.2.1-he9d5cce_3
  gnutls             pkgs/main/osx-64::gnutls-3.6.15-hed9c0bf_0
  lame               pkgs/main/osx-64::lame-3.100-h1de35cc_0
  libidn2            pkgs/main/osx-64::libidn2-2.3.2-h9ed2024_0
  libtasn1           pkgs/main/osx-64::libtasn1-4.16.0-h9ed2024_0
  libunistring       pkgs/main/osx-64::libunistring-0.9.10-h9ed2024_0
  nettle             pkgs/main/osx-64::nettle-3.7.3-h230ac6f_1
  openh264           pkgs/main/osx-64::openh264-2.1.1-h8346a28_0
  pytorch-mutex      pytorch/noarch::pytorch-mutex-1.0-cpu
  torchaudio         pytorch/osx-64::torchaudio-0.13.1-py310_cpu
  torchvision        pytorch/osx-64::torchvision-0.14.1-py310_cpu
Proceed ([y]/n)? y
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: / WARNING conda.core.path_actions:verify(1094): Unable to create environments file. Path not writable.
environment location: /Users/jonathanarnold/.conda/environments.txt
done
Executing transaction: | WARNING conda.core.envs_manager:register_env(49): Unable to register environment. Path not writable or missing.
environment location: /Users/jonathanarnold/opt/anaconda3/envs/nam
registry file: /Users/jonathanarnold/.conda/environments.txt
done
@2-dor any hints?
Hey - sorry, nothing yet. I'll try to spin up a "greenfields" virtual machine today (no GPU acceleration) to see if I can get CPU training running.
@2-dor any hints?
I haven't been able to get it running, unfortunately. If I do, I'll post instructions. It looks like it takes a bit more shoveling to trigger the training this way, and the "easy" way doesn't seem to take effect.
@sdatkinson
Oh boy...there are a lot of different questions in here!
That helped. But I am stuck here @2-dor
(nam) jonathanarnold@MBPvonJonathan NAM conda environment % python bin/train/main.py bin/train/inputs/config_data.json bin/train/inputs/config_model.json bin/train/inputs/config_learning.json bin/train/outputs/BHPFriedmanBE04
Traceback (most recent call last):
  File "/Users/jonathanarnold/Desktop/_dev/NAM conda environment/bin/train/main.py", line 191, in <module>
    main(parser.parse_args())
  File "/Users/jonathanarnold/Desktop/_dev/NAM conda environment/bin/train/main.py", line 125, in main
    with open(args.data_config_path, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'bin/train/inputs/config_data.json'
This happens because bin/train/inputs/config_data.json is not a file that exists relative to your current working directory when you run the code. Check the path and make adjustments so that it's the path to a file that you have. You probably want e.g. bin/train/inputs/data/single_pair.json.
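A quick way to see whether those command-line paths actually resolve from the directory the command is launched from (a sketch using the filenames from the command above):

from pathlib import Path

# The paths given to bin/train/main.py are resolved relative to the current
# working directory, so run this from the same directory you launch it from.
for p in [
    "bin/train/inputs/config_data.json",
    "bin/train/inputs/config_model.json",
    "bin/train/inputs/config_learning.json",
]:
    print(Path(p).resolve(), "exists" if Path(p).exists() else "MISSING")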
To the issue in @2-dor's most recent comment on this thread, the likely reason for this is that the "start" of the interval being taken from the audio files is after their end, so the slice selects nothing 😅. This is what is supposed to happen with Python's typical array indexing, but it probably points to the user making a mistake in this case. I can add a check and have it give an error to bring it to your attention so that the failure is a little less cryptic. I'll put this in a separate issue because it won't be findable in the depths of this thread 🙂
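For concreteness, this is plain Python/NumPy slicing behaviour: a start index past the end yields an empty selection rather than an error, which is how an interval whose start lies after the end of the audio silently produces no data.

import numpy as np

samples = np.zeros(48_000)   # stand-in for one second of audio at 48 kHz
segment = samples[100_000:]  # "start" lies past the end of the array...
print(len(segment))          # ...so this prints 0: an empty selection, no error raised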
I think that the rest of the thread is addressed in my comments to #94. If there's something else, then you can open another Issue with what's specifically going wrong.