vb000 / Waveformer

A deep neural network architecture for low-latency audio processing
https://arxiv.org/abs/2211.02250
MIT License

Replicate training issue #7

Closed tioans closed 1 year ago

tioans commented 1 year ago

Hi, thank you for sharing the project, I find it very interesting! I was wondering if you could help with an issue I'm encountering while trying to replicate the training procedure from the paper. After following the five steps in the "Training and Evaluation" section of the readme, I ran into the following error when attempting to start training:

RuntimeError: invalid effect options, see SoX docs for details

It seems this is linked to the training samples generated from the mixture.jams files, but I'm unsure what might be causing it. Could you please help?

The full stack trace is shown below:

$ python -W ignore -m src.training.train experiments/dcc_tf_ckpt_E256_10_D128_1 --use_cuda
Imported the model from 'src.training.dcc_tf'.

Loading train dataset: fg_dir=data/FSDSoundScapes/FSDKaggle2018/train bg_dir=data/FSDSoundScapes/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-development
Loaded train dataset at data/FSDSoundScapes containing 50000 elements
Loading val dataset: fg_dir=data/FSDSoundScapes/FSDKaggle2018/val bg_dir=data/FSDSoundScapes/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-development
Loaded test dataset at data/FSDSoundScapes containing 5000 elements
Using CUDA devices: [0]
Initializing optimizer with {'lr': 0.0005, 'weight_decay': 0.0}
Learning rates initialized to: {group 0: params=1.04749M lr=5.0E-04}
Initialized LR scheduler with params: fix_lr_epochs=50 {'mode': 'max', 'factor': 0.1, 'patience': 5, 'min_lr': 5e-06, 'threshold': 0.1, 'threshold_mode': 'abs'}
Epoch 0:
Train:   0%|                                     | 1/3125 [00:22<19:31:41, 22.50s/it, loss=15.99708]
Train:   0%|                                     | 1/3125 [00:22<19:32:17, 22.52s/it, loss=15.99708]
Traceback (most recent call last):
  File "/scratch/IOSZ/waveformer/Waveformer/src/training/train.py", line 195, in train
    curr_train_metrics = train_epoch(model, device, optimizer,
  File "/scratch/IOSZ/waveformer/Waveformer/src/training/train.py", line 47, in train_epoch
    for batch_idx, (mixed, label, gt) in enumerate(train_loader):
  File "/home/iosz/.conda/envs/sen/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/home/iosz/.conda/envs/sen/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 678, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/iosz/.conda/envs/sen/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/iosz/.conda/envs/sen/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/scratch/IOSZ/waveformer/Waveformer/src/training/synthetic_dataset.py", line 95, in __getitem__
    mixture, jams, ann_list, event_audio_list = scaper.generate_from_jams(
  File "/home/iosz/.conda/envs/sen/lib/python3.8/site-packages/scaper/core.py", line 254, in generate_from_jams
    sc._generate_audio(audio_outfile,
  File "/home/iosz/.conda/envs/sen/lib/python3.8/site-packages/scaper/core.py", line 1966, in _generate_audio
    event_audio = tfm.build_array(
  File "/home/iosz/.conda/envs/sen/lib/python3.8/site-packages/soxbindings/transform.py", line 60, in build_array
    output_audio, sample_rate_out = self.build(input_filepath=input_filepath,
  File "/home/iosz/.conda/envs/sen/lib/python3.8/site-packages/soxbindings/transform.py", line 55, in build
    output_audio, sample_rate_out = sox(args, input_array, sample_rate_in)
  File "/home/iosz/.conda/envs/sen/lib/python3.8/site-packages/soxbindings/sox_cli.py", line 225, in sox
    output_audio, rate = build_flow_effects(
  File "/home/iosz/.conda/envs/sen/lib/python3.8/site-packages/soxbindings/effects.py", line 39, in build_flow_effects
    data, sample_rate = _build_flow_effects(
  File "/home/iosz/.conda/envs/sen/lib/python3.8/site-packages/soxbindings/effects.py", line 95, in _build_flow_effects
    sample_rate, num_channels, data = _soxbindings.build_flow_effects(
RuntimeError: invalid effect options, see SoX docs for details
vb000 commented 1 year ago

Hi @tioans, could you verify that audio is present at the locations listed here?

Loading train dataset: fg_dir=data/FSDSoundScapes/FSDKaggle2018/train bg_dir=data/FSDSoundScapes/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-development
Loaded train dataset at data/FSDSoundScapes containing 50000 elements
Loading val dataset: fg_dir=data/FSDSoundScapes/FSDKaggle2018/val bg_dir=data/FSDSoundScapes/TAU-acoustic-sounds/TAU-urban-acoustic-scenes-2019-development
Loaded test dataset at data/FSDSoundScapes containing 5000 elements

Further, you could try making sure that the audio files referenced by each .jams file actually exist.
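A quick sketch of such a check (assuming the scaper-generated .jams files are plain JSON where each event carries a `source_file` path; the dataset root below is a guess based on the log output):

```python
import json
import os
from pathlib import Path

def find_missing_sources(jams_path):
    """Recursively collect every 'source_file' path in a scaper-style
    .jams file (plain JSON) and return the ones that don't exist on disk."""
    with open(jams_path) as f:
        doc = json.load(f)

    missing = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "source_file" and isinstance(value, str):
                    if not os.path.exists(value):
                        missing.append(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(doc)
    return missing

# Scan every .jams file under the dataset root and report broken references
for jams_file in Path("data/FSDSoundScapes").rglob("*.jams"):
    for path in find_missing_sources(jams_file):
        print(f"{jams_file}: missing {path}")
```

If this prints anything, the SoX error is likely scaper failing on a missing or unreadable source file.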

tioans commented 1 year ago

Hi @vb000, thanks for the suggestions. I checked the folders you specified and the data is there. I also tried re-downloading the entire dataset as per the instructions, but there was no change.

Do you have another idea of what might have gone wrong? Could you perhaps check whether the error occurs in a fresh clone of the repo?

tioans commented 1 year ago

Hi again, I found the issue. It seems there was some incompatibility between the required packages in my conda environment. After creating a new environment with Python 3.8 and installing the packages from requirements.txt, the issue disappeared.
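For reference, the fresh environment was along these lines (the env name is arbitrary):

```shell
conda create -n waveformer python=3.8 -y
conda activate waveformer
pip install -r requirements.txt
```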