microsoft / fadtk

A simple library for Fréchet Audio Distance (FAD) calculation
MIT License

Tensor Error when using vggish #25

Open timohromadka opened 3 months ago

timohromadka commented 3 months ago

I've installed the latest version of the repo using:

pip install git+https://github.com/microsoft/fadtk.git

However, when I run the command:

fadtk vggish dataset/one/ dataset/two/

I get an error. It only seems to happen with vggish, and I'm not sure why. Here's the traceback:

Traceback (most recent call last):
  File "/home/th716/.conda/envs/fadtk_env/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/th716/.conda/envs/fadtk_env/lib/python3.11/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
           ^^^^^^^^^^^^^^^^
  File "/home/th716/.conda/envs/fadtk_env/lib/python3.11/site-packages/fadtk/fad_batch.py", line 22, in _cache_embedding_batch
    fad.cache_embedding_file(f)
  File "/home/th716/.conda/envs/fadtk_env/lib/python3.11/site-packages/fadtk/fad.py", line 196, in cache_embedding_file
    embd = self.ml.get_embedding(wav_data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/th716/.conda/envs/fadtk_env/lib/python3.11/site-packages/fadtk/model_loader.py", line 32, in get_embedding
    embd = self._get_embedding(audio)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/th716/.conda/envs/fadtk_env/lib/python3.11/site-packages/fadtk/model_loader.py", line 80, in _get_embedding
    return self.model.forward(audio, self.sr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/th716/.cache/torch/hub/harritaylor_torchvggish_master/torchvggish/vggish.py", line 174, in forward
    x = VGG.forward(self, x)
        ^^^^^^^^^^^^^^^^^^^^
  File "/home/th716/.cache/torch/hub/harritaylor_torchvggish_master/torchvggish/vggish.py", line 29, in forward
    x = x.view(x.size(0), -1)
        ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/th716/.conda/envs/fadtk_env/bin/fadtk", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/th716/.conda/envs/fadtk_env/lib/python3.11/site-packages/fadtk/__main__.py", line 41, in main
    cache_embedding_files(d, model, workers=args.workers)
  File "/home/th716/.conda/envs/fadtk_env/lib/python3.11/site-packages/fadtk/fad_batch.py", line 48, in cache_embedding_files
    pool.map(_cache_embedding_batch, [(b, ml, kwargs) for b in batches])
  File "/home/th716/.conda/envs/fadtk_env/lib/python3.11/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/th716/.conda/envs/fadtk_env/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

timohromadka commented 3 months ago

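I see now: the error only happens when an audio clip is shorter than the window vggish uses (0.96 s). A clip that short produces no complete window, so the model receives an empty batch (the "tensor of 0 elements" in the traceback) and the reshape fails.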
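
In case it helps anyone else, here is a minimal sketch for finding the offending files before running fadtk. It is not part of fadtk: `wav_duration_seconds` and `find_short_wavs` are hypothetical helper names, it only handles PCM `.wav` files (via the standard-library `wave` module), and the 0.96 s threshold is just the figure mentioned above. The exact minimum may sit slightly above that depending on how the model frames its spectrogram, so passing a larger `min_seconds` as a margin is probably safer.

```python
import wave
from pathlib import Path

# Window length reported in this thread; an assumption, not read from fadtk or torchvggish.
VGGISH_WINDOW_SECONDS = 0.96


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def find_short_wavs(directory, min_seconds=VGGISH_WINDOW_SECONDS):
    """List (file name, duration) for .wav files in `directory` shorter than `min_seconds`."""
    short = []
    for path in sorted(Path(directory).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if duration < min_seconds:
            short.append((path.name, duration))
    return short
```

Calling `find_short_wavs("dataset/one/")` and `find_short_wavs("dataset/two/")` should list the clips that are likely to trigger the error, which can then be removed or padded with silence before running the vggish command again.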