hche11 / VGGSound

VGGSound: A Large-scale Audio-Visual Dataset
http://www.robots.ox.ac.uk/~vgg/data/vggsound/

Couldn't Evaluate the Predictions Generated #3

Closed Shahabaz40 closed 4 years ago

Shahabaz40 commented 4 years ago

Apologies for the long post and my ignorance.

Generating Predictions: I downloaded the audio files using the scripts from the mentioned GitHub directory. After that I generated the predictions using the following command.

python3 --summaries "./Weights/vggsound_avgpool.pth.tar" --pool "avgpool"  --batch_size=1

While generating the predictions I got the following error.

  File "/usr/local/lib/python3.6/dist-packages/scipy/signal/spectral.py", line 1757, in _spectral_helper
    raise ValueError('noverlap must be less than nperseg.')

I solved the error by using the default value (nperseg//8) for noverlap, but then got the following warning.

UserWarning: nperseg = 256 is greater than input length  = 20, using nperseg = 20
  .format(nperseg, input_length))

I also had to make the following change at line 108 of test.py.

//_Original_
aud_o = model(spec.unsqueeze(1).float())
//_Changed to_
aud_o = model(spec.unsqueeze(1).squeeze(-1).float())

Otherwise it gave the following error.

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 1, 7, 7], but got 5-dimensional input of size [1, 1, 160000, 11, 1] instead
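
The trailing dimensions in this error can be reproduced with a little arithmetic. The following is a minimal sketch, assuming the defaults of SciPy's `scipy.signal.spectrogram` (one-sided FFT, FFT length equal to `nperseg`, no padding) and an input length of 20 along the transformed axis, as the warning above reports. `spectrogram_shape` is a hypothetical helper written for illustration; it is not part of the repository.

```python
def spectrogram_shape(n_samples, nperseg, noverlap=None):
    """Return (n_freq_bins, n_segments) for a real 1-D signal."""
    # SciPy shrinks nperseg to the input length (and emits a UserWarning)
    # when the requested window is longer than the signal.
    nperseg = min(nperseg, n_samples)
    if noverlap is None:
        noverlap = nperseg // 8  # default used by scipy.signal.spectrogram
    if noverlap >= nperseg:
        raise ValueError("noverlap must be less than nperseg.")
    step = nperseg - noverlap
    n_freq_bins = nperseg // 2 + 1          # one-sided FFT of a real signal
    n_segments = (n_samples - noverlap) // step
    return n_freq_bins, n_segments


print(spectrogram_shape(20, 256))  # (11, 1)
```

The result matches the final `11, 1` in `[1, 1, 160000, 11, 1]`. That suggests each of the 160000 rows was treated as its own 20-sample signal instead of the array being read as one waveform. If so, `squeeze(-1)` only hides the shape mismatch and the model still receives a malformed spectrogram.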

Evaluating: While evaluating the predictions using the script eval.py it showed the following errors.

RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]
Traceback (most recent call last):
  File "eval.py", line 71, in <module>
    main()
  File "eval.py", line 64, in main
    mAUC = np.mean([stat['auc'] for stat in stats])
  File "eval.py", line 64, in <listcomp>
    mAUC = np.mean([stat['auc'] for stat in stats])
KeyError: 'auc'
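
The `KeyError` means at least one per-class entry in `stats` has no `'auc'` field. One possible cause, not confirmed in this thread, is that a partial test list leaves some classes without any positive example, so the metric for that class is undefined (the `tps[-1]` in the warning would then be zero). Below is a minimal sketch of averaging only over the classes where the metric exists. `mean_metric` and the sample `stats` list are hypothetical and are not taken from eval.py.

```python
import math


def mean_metric(stats, key):
    """Average stats[i][key] over entries that have a finite value for key.

    Returns (mean, n_used); mean is NaN when no entry qualifies.
    """
    values = [s[key] for s in stats
              if key in s and not math.isnan(s[key])]
    if not values:
        return float("nan"), 0
    return sum(values) / len(values), len(values)


# Made-up per-class results: the second class has no usable metrics.
stats = [
    {"AP": 0.5, "auc": 0.75},
    {"AP": float("nan")},
    {"AP": 1.0, "auc": 0.25},
]
print(mean_metric(stats, "auc"))  # (0.5, 2)
print(mean_metric(stats, "AP"))   # (0.75, 2)
```

Skipping classes changes what the average means, so it is worth reporting the number of classes used next to the score.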

I didn't download the whole dataset. I downloaded only part of it (2231 files) and generated my own _mytest.csv from the downloaded files. The audio files were downloaded in .flac format and then converted to .wav.

Would you please tell me what I am doing wrong? I am new to deep learning research, so please pardon my ignorance.

hche11 commented 4 years ago

Hi, thanks for your interest. It seems that you did not convert the audio files correctly. Try using this code to convert them to .wav format, then use the original code to test and evaluate.
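
One way the conversion can go wrong, offered as a guess rather than a confirmed diagnosis: if the .wav files are stereo, a reader such as `soundfile.read` returns an array of shape (n_samples, 2), and repeating it with `np.tile(samples, 10)` widens the last axis to 20, which is the input length in the warning above. The sketch below assumes that loading pattern (it is not quoted from test.py) and uses a hypothetical helper, `to_mono`, to down-mix before tiling.

```python
import numpy as np


def to_mono(samples):
    """Average the channels of a (n_samples, n_channels) array.

    1-D (already mono) input is returned unchanged apart from the dtype.
    """
    samples = np.asarray(samples, dtype=np.float64)
    return samples.mean(axis=1) if samples.ndim == 2 else samples


n = 1000  # short stand-in for the number of samples in a clip
stereo = np.zeros((n, 2))  # shape a stereo file would have after loading

# Tiling a 2-D array with a scalar repeat count repeats the LAST axis.
tiled_stereo = np.tile(stereo, 10)[:n]
tiled_mono = np.tile(to_mono(stereo), 10)[:n]

print(tiled_stereo.shape)  # (1000, 20)
print(tiled_mono.shape)    # (1000,)
```

With a mono waveform the tiled array stays one-dimensional, so the spectrogram is computed along time and the original `model(spec.unsqueeze(1).float())` call receives a 4-dimensional input without the extra `squeeze(-1)`.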