harritaylor / torchvggish

PyTorch port of Google Research's VGGish model, used for extracting audio features.
Apache License 2.0

Difference between the PyTorch VGGish and TensorFlow VGGish #5

Closed suzhenghang closed 5 years ago

suzhenghang commented 5 years ago

Hi @harritaylor, I have fed piano.wav into the TensorFlow VGGish, but the PCA embedding differs from the torchvggish output. Did you verify the output after the conversion?

def get_vggish_input(self, wav_file):
    try:
        # Convert the wav file into log-mel spectrogram examples.
        examples_batch = vggish_input.wavfile_to_examples(wav_file)
        # Prepare a postprocessor to munge the model embeddings.
        pproc = vggish_postprocess.Postprocessor(pca_params)
        return examples_batch, pproc
    except Exception:
        traceback.print_exc()
    return None, None

def get_features(self, examples_batch, pproc):
    try:
        # Run inference and postprocessing.
        [embedding_batch] = self.sess.run(
            [self.embedding_tensor],
            feed_dict={self.features_tensor: examples_batch})
        postprocessed_batch = pproc.postprocess(embedding_batch)
        # cv2.imwrite("test.bmp", postprocessed_batch)
        return postprocessed_batch
    except Exception:
        traceback.print_exc()
    return None
harritaylor commented 5 years ago

Haven’t verified it yet - that’s next on my TODO list. Thanks for raising this.

I suspect the embeddings will not be 100% identical, so I’ll be writing test cases next week to see how different they are.
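A minimal sketch of such a comparison, assuming the embeddings are available as NumPy arrays (`cosine_distance` is a hypothetical helper, not part of the repo):

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two flattened embedding batches.

    0.0 means the embeddings point in the same direction; values near 1.0
    (or above) indicate very different embeddings.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Example: compare a (simulated) TensorFlow embedding batch against a
# PyTorch one that differs only by tiny numerical noise.
tf_emb = np.random.rand(10, 128)
pt_emb = tf_emb + np.random.normal(scale=1e-5, size=tf_emb.shape)
print(cosine_distance(tf_emb, pt_emb))  # very close to 0
```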

harritaylor commented 5 years ago

@suzhenghang I have just investigated this and the embeddings are significantly different. I don't yet know why, so I will work on figuring it out over the next few weeks. Again, thanks for raising this.

harritaylor commented 5 years ago

Apologies for the number of updates. I have found the problem and will be updating the code shortly. I hadn't accounted for the different dimension orderings of the data between TensorFlow and PyTorch when flattening the convolutional features.

As you can see, the comparison is now identical: Figure_1, with a cosine distance of 0. There are some variations at the 5th decimal place, but the PCA postprocessor ignores these small variations and produces the correct embeddings. I'll close this issue now that the problem has been resolved. Thank you for reporting it.
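The layout mismatch can be illustrated without either framework. TensorFlow lays convolutional features out channels-last (NHWC) while PyTorch uses channels-first (NCHW), so flattening an NCHW tensor directly feeds the fully connected layers a different element order than the original TensorFlow model saw. A sketch of the mismatch (the exact fix in torchvggish may differ in detail):

```python
import numpy as np

# Simulated convolutional feature map in PyTorch's NCHW layout.
feat_nchw = np.arange(2 * 3 * 4 * 4).reshape(2, 3, 4, 4)  # N, C, H, W

# Wrong: flattening NCHW directly scrambles the input to the FC layers
# relative to the converted TensorFlow weights.
wrong = feat_nchw.reshape(2, -1)

# Right: permute to NHWC first, then flatten, matching TensorFlow's order.
right = feat_nchw.transpose(0, 2, 3, 1).reshape(2, -1)

# Same values, different order - the FC layers only work with the latter.
print(np.array_equal(wrong, right))  # False
```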