harritaylor / torchvggish

Pytorch port of Google Research's VGGish model used for extracting audio features.
Apache License 2.0

Original vggish vs this.. #18

Open rohan1561 opened 4 years ago

rohan1561 commented 4 years ago

Hey, doesn't the original TF implementation have only four convolution layers and two FC layers? This one has 6 and 3. Why the difference? How could the embeddings be identical then?

KAISER1997 commented 3 years ago

@rohan1561 Did you find anything? I had the same question.

eatsleepraverepeat commented 2 years ago

@rohan1561 @KAISER1997

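This implementation matches the one in tensorflow/models, which does indeed have 6 convolution layers and 3 fully connected ones.

Here's the list of variables from the TF 1.x VGGish checkpoint. Check the scopes: conv3 and conv4 each hold two convolutions and fc1 holds two fully connected layers, so counting top-level scopes gives 4 and 2, while counting actual layers gives 6 and 3.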

[
    ('vggish/conv1/biases', [64]),
    ('vggish/conv1/weights', [3, 3, 1, 64]),
    ('vggish/conv2/biases', [128]),
    ('vggish/conv2/weights', [3, 3, 64, 128]),
    ('vggish/conv3/conv3_1/biases', [256]),
    ('vggish/conv3/conv3_1/weights', [3, 3, 128, 256]),
    ('vggish/conv3/conv3_2/biases', [256]),
    ('vggish/conv3/conv3_2/weights', [3, 3, 256, 256]),
    ('vggish/conv4/conv4_1/biases', [512]),
    ('vggish/conv4/conv4_1/weights', [3, 3, 256, 512]),
    ('vggish/conv4/conv4_2/biases', [512]),
    ('vggish/conv4/conv4_2/weights', [3, 3, 512, 512]),
    ('vggish/fc1/fc1_1/biases', [4096]),
    ('vggish/fc1/fc1_1/weights', [12288, 4096]),
    ('vggish/fc1/fc1_2/biases', [4096]),
    ('vggish/fc1/fc1_2/weights', [4096, 4096]),
    ('vggish/fc2/biases', [128]),
    ('vggish/fc2/weights', [4096, 128]),
    ('vggish/logits/biases', [4923]),
    ('vggish/logits/weights', [128, 4923])
]
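
To make the 4/2 vs 6/3 count concrete, here is a minimal sketch that tallies the listing above in plain Python. The helper names (`count_layers`, `top_level_scopes`) are made up for illustration and are not part of torchvggish or tensorflow/models. It assumes a 4-D weight shape means a convolution kernel and a 2-D weight shape means a fully connected matrix, and it counts the `logits` layer separately.

```python
from collections import Counter

# Variable names and shapes copied from the checkpoint listing above.
CHECKPOINT_VARS = [
    ('vggish/conv1/biases', [64]),
    ('vggish/conv1/weights', [3, 3, 1, 64]),
    ('vggish/conv2/biases', [128]),
    ('vggish/conv2/weights', [3, 3, 64, 128]),
    ('vggish/conv3/conv3_1/biases', [256]),
    ('vggish/conv3/conv3_1/weights', [3, 3, 128, 256]),
    ('vggish/conv3/conv3_2/biases', [256]),
    ('vggish/conv3/conv3_2/weights', [3, 3, 256, 256]),
    ('vggish/conv4/conv4_1/biases', [512]),
    ('vggish/conv4/conv4_1/weights', [3, 3, 256, 512]),
    ('vggish/conv4/conv4_2/biases', [512]),
    ('vggish/conv4/conv4_2/weights', [3, 3, 512, 512]),
    ('vggish/fc1/fc1_1/biases', [4096]),
    ('vggish/fc1/fc1_1/weights', [12288, 4096]),
    ('vggish/fc1/fc1_2/biases', [4096]),
    ('vggish/fc1/fc1_2/weights', [4096, 4096]),
    ('vggish/fc2/biases', [128]),
    ('vggish/fc2/weights', [4096, 128]),
    ('vggish/logits/biases', [4923]),
    ('vggish/logits/weights', [128, 4923]),
]


def count_layers(variables):
    """Count weight tensors by kind (hypothetical helper).

    Assumption: 4-D weights are convolution kernels, 2-D weights are
    fully connected matrices; the logits layer is tallied on its own.
    """
    counts = Counter()
    for name, shape in variables:
        if not name.endswith('/weights'):
            continue  # skip biases so each layer is counted once
        if name.startswith('vggish/logits/'):
            counts['logits'] += 1
        elif len(shape) == 4:
            counts['conv'] += 1
        elif len(shape) == 2:
            counts['fc'] += 1
    return counts


def top_level_scopes(variables, prefix):
    """Return the distinct scopes directly under 'vggish/' that start with prefix."""
    return sorted({name.split('/')[1] for name, _ in variables
                   if name.split('/')[1].startswith(prefix)})


layer_counts = count_layers(CHECKPOINT_VARS)
conv_scopes = top_level_scopes(CHECKPOINT_VARS, 'conv')
fc_scopes = top_level_scopes(CHECKPOINT_VARS, 'fc')

print('conv layers:', layer_counts['conv'], 'in scopes', conv_scopes)
print('fc layers:', layer_counts['fc'], 'in scopes', fc_scopes)
```

Counting top-level scopes is what yields "four conv, two fc"; counting weight tensors is what yields 6 and 3, so the two descriptions refer to the same network.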