ixobert / birds-generation


Any example scripts? #4

Closed rasanderson closed 8 months ago

rasanderson commented 8 months ago

Thank you for a very interesting paper on the ECOGEN project. I've cloned the repository and installed the libraries in requirements.txt. I'm hitting quite a few problems trying to run any of the model training, including the first command:

python ./src/train_vqvae.py dataset="xeno-canto" mode="train" lr=0.00002 nb_epochs=25000 log_frequency=1 dataset.batch_size=420 dataset.num_workers=8 run_name="ECOGEN Training on Xeno Canto" tags=[vq-vae2,xeno-canto] +gpus=[1] debug=false

which fails on dataset.num_workers and dataset.batch_size configuration. Or is it better to go straight to the Jupyter notebooks and work from those instead?

ixobert commented 8 months ago

Good evening @rasanderson, I'd be glad to assist you.

If it helps, we have provided the weights of a pretrained ECOGEN model here. By using this pretrained model, you can bypass the initial stages of training the ECOGEN model from scratch.

rasanderson commented 8 months ago

Hello @ixobert Thanks for your very quick reply. I was just trying to test out the training script, although to be honest I don't have a dataset to test it with yet (hopefully in Feb will have a small dataset of bird calls I need to augment). I set it up within a conda environment, which may not have been ideal as conda and pip don't always play nicely together. When I ran the command line given in your README document it threw the following error on batch size, which puzzled me given that dataset.batch_size is listed. I'll admit that I'm working my way through the code, so it's likely I've made a silly mistake somewhere. Best wishes Roy

python ./src/train_vqvae.py dataset="xeno-canto" mode="train" lr=0.00002 nb_epochs=25000 log_frequency=1 dataset.batch_size=420 dataset.num_workers=8 run_name="ECOGEN Training on Xeno Canto" tags=[vq-vae2,xeno-canto] +gpus=[1] debug=false

./src/train_vqvae.py:133: UserWarning: The version_base parameter is not specified. Please specify a compatability version level, or None. Will assume defaults for version 1.1
  @hydra.main(config_path="configs", config_name="train_vqvae")
/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'train_vqvae': Defaults list is missing _self_. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
Traceback (most recent call last):
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 390, in _apply_overrides_to_config
    OmegaConf.update(cfg, key, value, merge=True)
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/omegaconf/omegaconf.py", line 741, in update
    root.setattr(last_key, value)
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 337, in setattr
    raise e
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 334, in setattr
    self.__set_impl(key, value)
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 318, in __set_impl
    self._set_item_impl(key, value)
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 549, in _set_item_impl
    self._validate_set(key, value)
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 180, in _validate_set
    target = self._get_node(key) if key is not None else self
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 475, in _get_node
    self._validate_get(key)
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 165, in _validate_get
    key=key, value=value, cause=ConfigAttributeError(msg)
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/omegaconf/base.py", line 237, in _format_and_raise
    type_override=type_override,
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/omegaconf/_utils.py", line 899, in format_and_raise
    _raise(ex, cause)
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/omegaconf/_utils.py", line 797, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set env var OC_CAUSE=1 for full trace
omegaconf.errors.ConfigAttributeError: Key 'batch_size' is not in struct
    full_key: dataset.batch_size
    object_type=dict

haydensflee commented 8 months ago

I had a few similar issues when running the scripts that I had to fix up, including these errors. I think the cause is that certain input arguments are not in the Hydra config files, so they need to be appended when you run the script. To fix this, I used +dataset.batch_size=420 and +dataset.num_workers=8. I also used +gpus=[0] since I only have one GPU. So the command I use is:

python ./src/train_vqvae.py dataset="xeno-canto" mode="train" lr=0.00002 nb_epochs=25000 log_frequency=1 +dataset.batch_size=420 +dataset.num_workers=8 run_name="ECOGEN Training on Xeno Canto" tags=[vq-vae2,xeno-canto] +gpus=[0] debug=true
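To illustrate the `+` prefix: Hydra rejects an override for a key that is not already in the composed config unless the override is prefixed with `+`, which appends the key instead. The sketch below mimics that rule in plain Python on a nested dict. It does not call Hydra or OmegaConf, and the `config` contents and the helper names `key_exists` and `fix_overrides` are hypothetical, chosen only for this example.

```python
from typing import Any, Dict, List


def key_exists(config: Dict[str, Any], dotted_key: str) -> bool:
    """Return True if a dotted key such as 'dataset.batch_size' is in config."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


def fix_overrides(config: Dict[str, Any], overrides: List[str]) -> List[str]:
    """Prefix '+' to overrides whose key is missing from the config."""
    fixed = []
    for item in overrides:
        key = item.split("=", 1)[0]
        if key.startswith("+") or key_exists(config, key):
            fixed.append(item)        # key already present, or already an append
        else:
            fixed.append("+" + item)  # new key: needs '+' to be appended
    return fixed


# Hypothetical stand-in for the composed config; the real keys live in the YAML files.
config = {"lr": 0.0001, "nb_epochs": 100, "dataset": {"name": "xeno-canto"}}
overrides = ["lr=0.00002", "dataset.batch_size=420", "dataset.num_workers=8", "gpus=[0]"]

print(fix_overrides(config, overrides))
```

With this stand-in config, `lr` is left alone because it already exists, while the two `dataset.*` keys and `gpus` gain a `+` prefix.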

I also had to change train_vqvae.py. After running, I got an error similar to the one discussed at https://github.com/Lightning-AI/pytorch-lightning/discussions/7525, so I changed self.hparams=hparams to self.save_hyperparameters(hparams) to fix it.
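As a rough illustration of why the assignment fails, assuming the error is the usual one where `hparams` is exposed as a read-only property in newer PyTorch Lightning versions: the toy class below is a stand-in written for this example, not Lightning's actual implementation, and `ModuleLike`, `OldStyle`, and `NewStyle` are hypothetical names.

```python
class ModuleLike:
    """Toy stand-in for a LightningModule; NOT the real Lightning implementation."""

    def __init__(self):
        self._hparams = {}

    @property
    def hparams(self):
        # Read-only: there is no setter, so `self.hparams = ...` raises AttributeError.
        return self._hparams

    def save_hyperparameters(self, hparams):
        # Hypothetical simplification: store a copy of the given mapping.
        self._hparams = dict(hparams)


class OldStyle(ModuleLike):
    def __init__(self, hparams):
        super().__init__()
        self.hparams = hparams  # fails: the property has no setter


class NewStyle(ModuleLike):
    def __init__(self, hparams):
        super().__init__()
        self.save_hyperparameters(hparams)  # goes through the supported method


try:
    OldStyle({"lr": 2e-5})
except AttributeError as err:
    print("assignment failed:", type(err).__name__)

model = NewStyle({"lr": 2e-5})
print(model.hparams)
```

The point is only that direct assignment to a setter-less property raises `AttributeError`, while a dedicated method can still update the stored values.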

Also, I just tried running sample generation (interpolation) using the model checkpoint. One fix that I had to do was replacing model.net with model in lines 133, 134 and 138 in generate_samples.py. Hope it also works for you.

ixobert commented 8 months ago

@rasanderson, did the fixes work on your end? In any case, I will apply those fixes (kudos @haydensflee) to the repository. Let me know if you still face issues during training and/or generation.

rasanderson commented 8 months ago

Hello @ixobert Thank you, the suggestion by @haydensflee resolved the original problem. I am now hitting an error where it seems to be looking for a non-existent path on my Linux box, under /home/future. I guess, however, that the underlying cause might be that I still haven't set the system up properly with any training data.

For info, here are the command line and errors, but my guess is that as I am not using the model properly with any training data yet it is likely to fail. Once I've some data to work with I'll have another look.


```
./src/train_vqvae.py:133: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path="configs", config_name="train_vqvae")
/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'train_vqvae': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
Traceback (most recent call last):
  File "/home/nras/miniconda3/envs/ECOGEN/lib/python3.7/pathlib.py", line 1268, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/home/future/Documents/runs_articfacts/birds-generation/outputs/2024-01-18/11-07-42'
```

haydensflee commented 8 months ago

I think the fix for this is to change working_dir: '/home/future/Documents/runs_articfacts/birds-generation/outputs' inside /src/configs/train_vqvae.yaml to a directory where you want the outputs to go. For example, I have working_dir: '/home/hayden/projects/birds-generation-master/outputs'
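A quick way to sanity-check a candidate working_dir before launching training is sketched below. It uses only the standard library and a temporary directory. The helper name `can_create` and the directory names are hypothetical, and this is not how the training script itself creates its output folders.

```python
import tempfile
from pathlib import Path


def can_create(output_dir: Path) -> bool:
    """Try to create output_dir (and any missing parents); report success."""
    try:
        output_dir.mkdir(parents=True, exist_ok=True)
        return True
    except OSError:
        # Covers FileNotFoundError and PermissionError, among others.
        return False


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Without parents=True, a missing parent directory raises FileNotFoundError,
    # the same exception type shown in the traceback above.
    nested = base / "missing_parent" / "outputs"
    try:
        nested.mkdir()
        plain_mkdir_failed = False
    except FileNotFoundError:
        plain_mkdir_failed = True

    # A path under a directory you own can be created, parents included.
    created = can_create(base / "birds-generation" / "outputs" / "run-1")

print(plain_mkdir_failed, created)
```

Pointing working_dir at a location under your own home directory, as suggested above, avoids the case where the parent path belongs to another user and cannot be created.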

rasanderson commented 8 months ago

Hi @haydensflee - ah, brilliant. I'd been wondering where it was configured and had mistakenly been looking through the Python scripts. I should have remembered the YAML files. Thank you.