harritaylor / torchvggish

Pytorch port of Google Research's VGGish model used for extracting audio features.
Apache License 2.0

Pre-activation as output of VGGish #24

Open eatsleepraverepeat opened 3 years ago

eatsleepraverepeat commented 3 years ago

Hello there,

When comparing this code to the one in tensorflow/models, I found that the two implementations use different layers as the output of the VGGish model (if the activation is counted as a separate layer):

yours: https://github.com/harritaylor/torchvggish/blob/46701162fd6b3684b6f6cf3b1afda100073850ae/torchvggish/vggish.py#L19

google's: https://github.com/tensorflow/models/blob/f32dea32e3e9d3de7ed13c9b16dc7a8fea3bd73d/research/audioset/vggish/vggish_slim.py#L104-L106 (activation_fn=None)

This is also mentioned in the upstream README:

Note that the embedding layer does not include a final non-linear activation, so the embedding value is pre-activation

Changing the output layer of VGGish in your implementation to the pre-activation one (i.e. dropping the final ReLU) makes the embeddings from the two implementations (almost) equal, both raw and PCA'ed.
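For reference, here is a minimal sketch of an alternative that reads the pre-activation values without editing the model: register a forward hook on the last Linear layer of the embeddings head. The index below assumes the layer ordering in this repo's vggish.py; treat it as illustrative, not an official API:

import torch

model = torch.hub.load("harritaylor/torchvggish", "vggish")
model.eval()

captured = {}

def save_pre_activation(module, inputs, output):
    # output of the final Linear(4096, 128), i.e. the embedding before the trailing ReLU
    captured["embedding"] = output.detach()

# model.embeddings ends in [..., Linear(4096, 128), ReLU()], so index -2 is that Linear
list(model.embeddings.children())[-2].register_forward_hook(save_pre_activation)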

Thanks for porting it though, great work!

brentspell commented 2 years ago

First, I would like to echo the kudos for publishing this port of VGGish. I am implementing a Fréchet Audio Distance (FAD) library and will definitely make use of it.
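For context, FAD fits a Gaussian to the VGGish embeddings of a reference set and of an evaluation set, then computes the Fréchet distance between the two Gaussians. A minimal numpy/scipy sketch of that computation (my own illustration, not code from any particular FAD library):

import numpy as np
from scipy import linalg

def frechet_audio_distance(emb_a, emb_b):
    # emb_a, emb_b: [n_examples, 128] arrays of VGGish embeddings
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # Frechet distance between two Gaussians:
    # ||mu_a - mu_b||^2 + tr(cov_a + cov_b - 2 * sqrtm(cov_a @ cov_b))
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard numerical imaginary noise
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))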

For anyone else who arrives here looking for a workaround, the final ReLU can be removed from the pretrained VGGish model with the following snippet:

import torch as pt

# drop the trailing ReLU so the embedding head returns pre-activation values
vggish = pt.hub.load("harritaylor/torchvggish", "vggish")
vggish.embeddings = pt.nn.Sequential(*list(vggish.embeddings.children())[:-1])
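After this change, running inference as in the README (e.g. embeddings = vggish.forward("example.wav")) should return the 128-D pre-activation embeddings, matching the output Google's code produces with activation_fn=None.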