Closed 1213999170 closed 4 years ago
Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce
Thank you for replying. Here is my environment info:
What is the top-level directory of the model you are using
the path: ~/models/research/audioset
Have I written custom code
No
OS Platform and Distribution
Ubuntu 5.4.0-6ubuntu1~16.04.4
TensorFlow installed from
anaconda3
TensorFlow version
tensorflow 1.9.0
Bazel version
N/A
CUDA/cuDNN version
N/A
GPU model and memory
N/A
Exact command to reproduce
python vggish_inference_demo.py --wav_file "./hLb9ujxBUzI.wav" --tfrecord_file "./hLb9ujxBUzI.tfrecord"
Is such an extremely large value (pca_eigen_vector[127][65] = 44.999) normal for the PCA parameters? In my opinion, 44.999 is so large that it makes the last element of the 128-dimensional embedding always 255. In effect we only have 127 usable embedding features, because the 128th carries no information when it is constantly 255. I noticed the same problem is reported in Shai Rozenberg's comment.
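One way to confirm that a single entry dominates the matrix is to scan it for values above a threshold. The following is only a sketch: `find_outliers` is a hypothetical helper, and the 3x3 matrix is a toy stand-in for the real 128x128 eigenvector matrix. Only the 44.999 figure and the "less than 5" bound come from this thread. Loading the released parameter file is left out on purpose.

```python
def find_outliers(matrix, threshold):
    """Return (row, col, value) for every entry whose absolute value exceeds threshold."""
    return [
        (r, c, v)
        for r, row in enumerate(matrix)
        for c, v in enumerate(row)
        if abs(v) > threshold
    ]

# Toy stand-in for the PCA eigenvector matrix (hypothetical values,
# except 44.999, which is the value reported in this issue).
toy_eigen_vectors = [
    [0.5, -1.2, 0.1],
    [2.0, 0.3, -4.9],
    [0.0, 44.999, 1.1],
]

# Every other entry reported in the issue is below 5 in absolute value.
outliers = find_outliers(toy_eigen_vectors, threshold=5.0)
print(outliers)
```

On the real matrix the same scan would be expected to report exactly one position, [127][65], if the observation in this issue holds.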
@plakal Thanks for handling this issue. Have you made any progress on this problem? Did I miscalculate the number? Please let me know if you have found anything.
Hi there, we are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.
I have run the VGGish model to extract features from .wav files, but the 128-dimensional embedding features look quite different from the published AudioSet features.
Eventually I found an extremely large element in the embedding PCA parameters, as follows:
The absolute values of all elements of pca_eigen_vectors are less than 5, except for pca_eigen_vector[127][65], which is 44.999. Meanwhile, pca_means[65] is -0.251, and 44.999 * 0.251 = 11.295, which is far beyond QUANTIZE_MAX_VAL = +2.0.
So the last element of every 128-dimensional embedding extracted by the VGGish model is 255. There must be something wrong with the PCA parameters.
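The arithmetic above can be reproduced with a small sketch. It assumes the postprocessing subtracts the mean, multiplies by the eigenvector entry, clips to [QUANTIZE_MIN_VAL, QUANTIZE_MAX_VAL], and scales linearly to 8 bits. QUANTIZE_MIN_VAL = -2.0 is an assumption (only the +2.0 upper bound is quoted here), and `term_65` and `quantize` are hypothetical helper names, not functions from the VGGish code.

```python
QUANTIZE_MIN_VAL = -2.0  # assumed lower bound, symmetric to the quoted +2.0
QUANTIZE_MAX_VAL = 2.0   # value quoted in this issue

def term_65(raw_value, eigen=44.999, mean=-0.251):
    """Contribution of input dimension 65 to output dimension 127: eigen * (x - mean)."""
    return eigen * (raw_value - mean)

def quantize(value):
    """Clip to the quantization range, then map linearly onto 0..255 (assumed scheme)."""
    clipped = min(max(value, QUANTIZE_MIN_VAL), QUANTIZE_MAX_VAL)
    scaled = (clipped - QUANTIZE_MIN_VAL) * 255.0 / (QUANTIZE_MAX_VAL - QUANTIZE_MIN_VAL)
    return int(scaled)

# Even when the raw embedding value is 0, the mean offset alone adds about 11.295.
offset = term_65(0.0)
print(round(offset, 3), quantize(offset))
```

Under these assumptions the other 127 terms of the dot product would have to sum to less than about -9.3 to pull the result back under the +2.0 clip, which would explain why the last element saturates at 255.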