hugofloresgarcia / vampnet

music generation with masked transformers!
https://hugo-does-things.notion.site/VampNet-Music-Generation-via-Masked-Acoustic-Token-Modeling-e37aabd0d5f1493aa42c5711d0764b33?pvs=4
MIT License

`RuntimeError: Audio file <> with offset 1472.0742892449407 and duration 10.0 is empty!` #21

Open cyrusvahidi opened 12 months ago

cyrusvahidi commented 12 months ago
`RuntimeError: Audio file <> with offset 1472.0742892449407 and duration 10.0 is empty!`

I've been getting this error about 1-2 hours into training, and it happens with a different audio file every time.

I'll look into it. Do you happen to know whether it's a bug or an issue with the audio files themselves?

hugofloresgarcia commented 12 months ago

hmm, might be a problem with audiotools or the encoding of your audio files, since you mention it happening with many different files. what format are your audio files encoded in?

I've gotten this issue before. It's usually been due to a corrupt audio file, though that may not be the case here. Would you mind sharing a full stack trace + audio file?

cyrusvahidi commented 11 months ago

they're all mp3s

RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/data/datasets.py", line 419, in __getitem__
    item[keys[0]] = loader(**loader_kwargs)
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/data/datasets.py", line 103, in __call__
    signal = AudioSignal.salient_excerpt(
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/core/audio_signal.py", line 281, in salient_excerpt
    excerpt = cls.excerpt(audio_path, state=state, **kwargs)
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/core/audio_signal.py", line 221, in excerpt
    signal = cls(audio_path, offset=offset, duration=duration, **kwargs)
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/core/audio_signal.py", line 154, in __init__
    self.load_from_file(
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/core/audio_signal.py", line 510, in load_from_file
    raise RuntimeError(
RuntimeError: Audio file <>.mp3 with offset 1472.0742892449407 and duration 10.0 is empty!

The audio file is 24:32 minutes long at 44.1 kHz. I figure the offset is out of bounds: 1472.07 s is 24.53 minutes, i.e. about 24:32, so the excerpt window starts right at the end of the file.
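A quick back-of-the-envelope check with the numbers from the error message (assuming the file is exactly 24:32, i.e. 1472 s, long) confirms the sampled offset leaves no room for the requested excerpt:

```python
# Values copied from the error message; total_s assumes the file is
# exactly 24 min 32 s long as reported.
offset_s = 1472.0742892449407
duration_s = 10.0
total_s = 24 * 60 + 32  # 1472 s

remaining = total_s - offset_s
print(f"offset = {offset_s / 60:.2f} min, remaining audio = {remaining:.3f} s")
# The offset lands just past the end of the file, so a 10 s window
# has nothing left to read -- hence the "empty" excerpt.
assert offset_s + duration_s > total_s
```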

cyrusvahidi commented 11 months ago

I just managed to complete 100K iterations of the coarse stage. Now, moving to c2f, I get the same error:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /homes/cv300/Documents/vampnet/scripts/exp/train.py:680 in <module>                              │
│                                                                                                  │
│   677 │   │   with Accelerator() as accel:                                                       │
│   678 │   │   │   if accel.local_rank != 0:                                                      │
│   679 │   │   │   │   sys.tracebacklimit = 0                                                     │
│ ❱ 680 │   │   │   train(args, accel)                                                             │
│   681                                                                                            │
│                                                                                                  │
│ /homes/cv300/venvs/sd2/lib/python3.9/site-packages/argbind/argbind.py:159 in cmd_func            │
│                                                                                                  │
│   156 │   │   │   │   else:                                                                      │
│   157 │   │   │   │   │   scope = None                                                           │
│   158 │   │   │   │   print(_format_func_debug(prefix, kwargs, scope))                           │
│ ❱ 159 │   │   │   return func(*cmd_args, **kwargs)                                               │
│   160 │   │                                                                                      │
│   161 │   │   if is_class:                                                                       │
│   162 │   │   │   setattr(object_or_func, "__init__", cmd_func)                                  │
│                                                                                                  │
│ /homes/cv300/Documents/vampnet/scripts/exp/train.py:659 in train                                 │
│                                                                                                  │
│   656 │   │   │   │   save_samples(state, val_idx, writer)                                       │
│   657 │   │   │                                                                                  │
│   658 │   │   │   if tracker.step % val_freq == 0 or last_iter:                                  │
│ ❱ 659 │   │   │   │   validate(state, val_dataloader, accel)                                     │
│   660 │   │   │   │   checkpoint(                                                                │
│   661 │   │   │   │   │   state=state,                                                           │
│   662 │   │   │   │   │   save_iters=save_iters,                                                 │
│                                                                                                  │
│ /homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/ml/decorators.py:375 in decorated  │
│                                                                                                  │
│   372 │   │   def decorator(fn):                                                                 │
│   373 │   │   │   @wraps(fn)                                                                     │
│   374 │   │   │   def decorated(*args, **kwargs):                                                │
│ ❱ 375 │   │   │   │   output = fn(*args, **kwargs)                                               │
│   376 │   │   │   │   if self.rank == 0:                                                         │
│   377 │   │   │   │   │   nonlocal value_type, label                                             │
│   378 │   │   │   │   │   metrics = self.metrics[label][value_type]                              │
│                                                                                                  │
│ /homes/cv300/Documents/vampnet/scripts/exp/train.py:319 in validate                              │
│                                                                                                  │
│   316                                                                                            │
│   317                                                                                            │
│   318 def validate(state, val_dataloader, accel):                                                │
│ ❱ 319 │   for batch in val_dataloader:                                                           │
│   320 │   │   output = val_loop(state, batch, accel)                                             │
│   321 │   # Consolidate state dicts if using ZeroRedundancyOptimizer                             │
│   322 │   if hasattr(state.optimizer, "consolidate_state_dict"):                                 │
│                                                                                                  │
│ /homes/cv300/venvs/sd2/lib/python3.9/site-packages/torch/utils/data/dataloader.py:633 in         │
│ __next__                                                                                         │
│                                                                                                  │
│    630 │   │   │   if self._sampler_iter is None:                                                │
│    631 │   │   │   │   # TODO(https://github.com/pytorch/pytorch/issues/76750)                   │
│    632 │   │   │   │   self._reset()  # type: ignore[call-arg]                                   │
│ ❱  633 │   │   │   data = self._next_data()                                                      │
│    634 │   │   │   self._num_yielded += 1                                                        │
│    635 │   │   │   if self._dataset_kind == _DatasetKind.Iterable and \                          │
│    636 │   │   │   │   │   self._IterableDataset_len_called is not None and \                    │
│                                                                                                  │
│ /homes/cv300/venvs/sd2/lib/python3.9/site-packages/torch/utils/data/dataloader.py:1325 in        │
│ _next_data                                                                                       │
│                                                                                                  │
│   1322 │   │   │   # Check if the next sample has already been generated                         │
│   1323 │   │   │   if len(self._task_info[self._rcvd_idx]) == 2:                                 │
│   1324 │   │   │   │   data = self._task_info.pop(self._rcvd_idx)[1]                             │
│ ❱ 1325 │   │   │   │   return self._process_data(data)                                           │
│   1326 │   │   │                                                                                 │
│   1327 │   │   │   assert not self._shutdown and self._tasks_outstanding > 0                     │
│   1328 │   │   │   idx, data = self._get_data()                                                  │
│                                                                                                  │
│ /homes/cv300/venvs/sd2/lib/python3.9/site-packages/torch/utils/data/dataloader.py:1371 in        │
│ _process_data                                                                                    │
│                                                                                                  │
│   1368 │   │   self._rcvd_idx += 1                                                               │
│   1369 │   │   self._try_put_index()                                                             │
│   1370 │   │   if isinstance(data, ExceptionWrapper):                                            │
│ ❱ 1371 │   │   │   data.reraise()                                                                │
│   1372 │   │   return data                                                                       │
│   1373 │                                                                                         │
│   1374 │   def _mark_worker_as_unavailable(self, worker_id, shutdown=False):                     │
│                                                                                                  │
│ /homes/cv300/venvs/sd2/lib/python3.9/site-packages/torch/_utils.py:644 in reraise                │
│                                                                                                  │
│   641 │   │   │   # If the exception takes multiple arguments, don't try to                      │
│   642 │   │   │   # instantiate since we don't know how to                                       │
│   643 │   │   │   raise RuntimeError(msg) from None                                              │
│ ❱ 644 │   │   raise exception                                                                    │
│   645                                                                                            │
│   646                                                                                            │
│   647 def _get_available_device_type():                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Caught RuntimeError in DataLoader worker process 5.
Original Traceback (most recent call last):
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/data/datasets.py", line 419, in __getitem__
    item[keys[0]] = loader(**loader_kwargs)
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/data/datasets.py", line 103, in __call__
    signal = AudioSignal.salient_excerpt(
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/core/audio_signal.py", line 281, in salient_excerpt
    excerpt = cls.excerpt(audio_path, state=state, **kwargs)
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/core/audio_signal.py", line 221, in excerpt
    signal = cls(audio_path, offset=offset, duration=duration, **kwargs)
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/core/audio_signal.py", line 154, in __init__
    self.load_from_file(
  File "/homes/cv300/venvs/sd2/lib/python3.9/site-packages/audiotools/core/audio_signal.py", line 510, in load_from_file
    raise RuntimeError(
RuntimeError: Audio file /import/c4dm-05/cv/x.mp3 with offset 1335.1792652140298 and duration 3.0 is empty!

I guess this could be an issue with audiotools.

hugofloresgarcia commented 11 months ago

yeah, looks like an issue with audiotools and the way that it gets excerpts from audio files (see `audiotools.util.info` and `AudioSignal.salient_excerpt`). I'm currently taking a break, but I'll try to dig a bit deeper into this this weekend!
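Until the root cause in audiotools is tracked down, one common stopgap is to wrap the training dataset so that a failed load resamples a different index instead of crashing the DataLoader worker. A minimal sketch (a hypothetical wrapper, not part of vampnet or audiotools):

```python
import random

class RetryDataset:
    """Wraps any index-able dataset; on RuntimeError (e.g. the
    "Audio file ... is empty!" error), retries with a random index."""

    def __init__(self, dataset, max_retries: int = 10):
        self.dataset = dataset
        self.max_retries = max_retries

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        for _ in range(self.max_retries):
            try:
                return self.dataset[idx]
            except RuntimeError:
                # corrupt or too-short file: pick a fresh index and retry
                idx = random.randrange(len(self.dataset))
        raise RuntimeError(f"giving up after {self.max_retries} failed loads")
```

This hides the symptom rather than fixing it, so it's worth logging the failing paths inside the `except` block to find the corrupt files.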