sooftware / conformer

[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
Apache License 2.0

Conformer parameter counts do not match those in the paper #35

Open maxwellzh opened 3 years ago

maxwellzh commented 3 years ago

In the original Conformer paper, the parameter counts are:

[Screenshot of the paper's model table: Conformer (S) 10.3 M, Conformer (M) 30.7 M, Conformer (L) 118.8 M parameters]

However, the parameter counts of the implementation in this repo are slightly different:

Conformer  small: 10.16 M
Conformer medium: 31.86 M
Conformer  large: 120.11 M

I computed these counts with the following script:

```python
from conformer import Conformer

def count_parameters(model) -> int:
    # Count only trainable parameters (those with requires_grad=True)
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# S/M/L hyperparameters from the paper; conv_kernel_size is 31 instead of 32 (see below)
models = {
    'small': Conformer(
        num_classes=1000,
        input_dim=80,
        encoder_dim=144,
        decoder_dim=320,
        num_encoder_layers=16,
        num_decoder_layers=1,
        num_attention_heads=4,
        conv_kernel_size=31
    ),
    'medium': Conformer(
        num_classes=1000,
        input_dim=80,
        encoder_dim=256,
        decoder_dim=640,
        num_encoder_layers=16,
        num_decoder_layers=1,
        num_attention_heads=4,
        conv_kernel_size=31
    ),
    'large': Conformer(
        num_classes=1000,
        input_dim=80,
        encoder_dim=512,
        decoder_dim=640,
        num_encoder_layers=17,
        num_decoder_layers=1,
        num_attention_heads=8,
        conv_kernel_size=31
    )
}

for size, m in models.items():
    print("Conformer {:>6}: {:.2f} M".format(size, count_parameters(m)/1e6))
```
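A single total hides where a discrepancy comes from; a per-submodule breakdown makes it easier to compare two implementations, or an implementation against a hand count. A minimal helper, assuming you feed it `(name, count)` pairs such as those produced from PyTorch's `named_parameters()`; the parameter names in the example are made up for illustration and are not taken from this repo:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def group_param_counts(named_counts: Iterable[Tuple[str, int]], depth: int = 2) -> Dict[str, int]:
    """Sum parameter counts by the first `depth` dot-separated parts of each name."""
    totals: Dict[str, int] = defaultdict(int)
    for name, count in named_counts:
        prefix = ".".join(name.split(".")[:depth])
        totals[prefix] += count
    return dict(totals)


# Hypothetical names for illustration. With a real PyTorch model, pass:
# ((n, p.numel()) for n, p in model.named_parameters() if p.requires_grad)
example = [
    ("encoder.subsampling.weight", 100),
    ("encoder.layers.0.weight", 50),
    ("encoder.layers.1.weight", 50),
    ("decoder.rnn.weight", 30),
]
print(group_param_counts(example))
# {'encoder.subsampling': 100, 'encoder.layers': 100, 'decoder.rnn': 30}
```

Raising `depth` drills further down (for example into individual encoder layers), while `depth=1` gives an encoder vs. decoder split.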

Since the convolution kernel size cannot be set to 32 in this implementation, I set it to 31 instead. That should not make a noticeable difference in the number of parameters.
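The claim that 31 vs. 32 barely matters can be checked with simple arithmetic. Assuming, as in the paper, one depthwise convolution per Conformer block with one k-tap filter per channel, each extra tap adds `encoder_dim` weights per block (any bias does not depend on the kernel size, so it cancels out):

```python
# (encoder_dim, num_encoder_layers) for each configuration used in the script above
configs = {
    "small": (144, 16),
    "medium": (256, 16),
    "large": (512, 17),
}


def kernel_size_delta(encoder_dim: int, num_layers: int, k_from: int = 31, k_to: int = 32) -> int:
    """Extra depthwise-conv weights when the kernel grows from k_from to k_to taps."""
    return encoder_dim * (k_to - k_from) * num_layers


for size, (dim, layers) in configs.items():
    delta = kernel_size_delta(dim, layers)
    print(f"{size:>6}: +{delta} params ({delta / 1e6:.4f} M)")
```

Even for the large model this is under 0.01 M, far smaller than the gaps reported above.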

sooftware commented 3 years ago

This is not an official implementation, so the number of parameters differs slightly.
Of course, I tried to follow the paper as closely as possible. :)

sooftware commented 3 years ago

Also, num_classes affects the count.
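To see how much num_classes can move the total: a fully connected output layer mapping an `in_dim`-sized hidden vector onto `num_classes` outputs holds `in_dim * num_classes` weights plus `num_classes` biases. Which hidden size feeds the output layer in this repo is not stated here, so the `in_dim` values below (the decoder dimensions from the script) and the 29-class vocabulary are hypothetical:

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameters of a fully connected layer: weight matrix plus optional bias vector."""
    return in_features * out_features + (out_features if bias else 0)


# Hypothetical hidden sizes and vocabulary sizes, for illustration only
for in_dim in (320, 640):
    for num_classes in (29, 1000):
        print(f"in_dim={in_dim}, num_classes={num_classes}: {linear_params(in_dim, num_classes):,} params")
```

At 1,000 classes such a layer alone holds roughly 0.3 to 0.6 M parameters, and any other layer whose size depends on num_classes (for example a decoder embedding table, if one is present) scales the same way.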

maxwellzh commented 3 years ago

This is kind of weird. I tested several open-source Conformer implementations (and also implemented one myself), but none of them exactly matches the reported parameter counts. Do you have any idea where the difference might come from? By the way, num_classes is set to 1k, following the paper.
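To put "slightly different" in numbers, the gap between the repo's counts and the paper's can be computed directly. The paper values (10.3 M, 30.7 M, 118.8 M) are from the Conformer paper's model table; the repo values are the ones printed by the script above:

```python
# Parameter counts in millions
paper = {"small": 10.3, "medium": 30.7, "large": 118.8}
repo = {"small": 10.16, "medium": 31.86, "large": 120.11}


def gaps(paper: dict, repo: dict) -> dict:
    """Absolute (M) and relative (%) difference of repo vs. paper for each model size."""
    return {k: (repo[k] - paper[k], 100 * (repo[k] - paper[k]) / paper[k]) for k in paper}


for size, (abs_gap, rel_gap) in gaps(paper, repo).items():
    print(f"{size:>6}: {abs_gap:+.2f} M ({rel_gap:+.2f} %)")
```

The gaps have mixed signs (the small model comes out slightly smaller, medium and large slightly larger), which suggests that more than one component differs rather than a single extra or missing layer.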

sooftware commented 3 years ago

I'm curious too. My only guess is that there are implementation details not mentioned in the paper.
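One way to probe which unstated details matter is to count a single encoder block by hand and compare it with an implementation's per-module totals. The sketch below is a back-of-the-envelope count under explicit assumptions: the block layout described in the paper (two feed-forward modules with expansion 4, multi-head self-attention, a convolution module with pointwise expansion 2, GLU, depthwise conv and BatchNorm, then a final LayerNorm), biases on every linear and pointwise layer, and a bias-free depthwise conv. `rel_pos_extras` is a hypothetical switch that adds a bias-free positional projection plus two bias vectors, as in Transformer-XL-style relative attention. Subsampling, decoder, and output layers are not included:

```python
def layer_norm(d: int) -> int:
    return 2 * d  # scale (gamma) and shift (beta)


def linear(i: int, o: int, bias: bool = True) -> int:
    return i * o + (o if bias else 0)  # a pointwise (kernel size 1) conv counts the same


def conformer_block_params(d: int, kernel_size: int, ff_expansion: int = 4,
                           conv_expansion: int = 2, rel_pos_extras: bool = False) -> dict:
    """Hand count of one Conformer block under the assumptions stated above."""
    ffn = layer_norm(d) + linear(d, ff_expansion * d) + linear(ff_expansion * d, d)
    mhsa = layer_norm(d) + 4 * linear(d, d)  # Q, K, V and output projections
    if rel_pos_extras:
        # Assumed extras: bias-free positional projection + u and v bias vectors
        mhsa += linear(d, d, bias=False) + 2 * d
    glu_out = conv_expansion * d // 2  # GLU halves the channel count
    conv = (layer_norm(d)
            + linear(d, conv_expansion * d)  # first pointwise conv
            + glu_out * kernel_size          # depthwise conv, assumed bias-free
            + 2 * glu_out                    # BatchNorm affine params (running stats are buffers)
            + linear(glu_out, d))            # second pointwise conv
    parts = {"ffn_x2": 2 * ffn, "mhsa": mhsa, "conv": conv, "final_ln": layer_norm(d)}
    parts["total"] = sum(parts.values())
    return parts


for name, d, layers in [("small", 144, 16), ("medium", 256, 16), ("large", 512, 17)]:
    for rel in (False, True):
        total = layers * conformer_block_params(d, 32, rel_pos_extras=rel)["total"]
        print(f"{name:>6} rel_pos_extras={rel}: {total / 1e6:.2f} M (encoder blocks only)")
```

Toggling a single assumption (a bias here, a positional projection there) shifts the total by O(d) or O(d²) per block, so small unstated choices like these can add up to differences of the size seen in this thread.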