rendchevi / nix-tts

🐤 Nix-TTS: Lightweight and End-to-end Text-to-Speech via Module-wise Distillation
MIT License

Question about the parameter size #5

Closed aask1357 closed 2 years ago

aask1357 commented 2 years ago

Hello. According to your paper and GitHub page, the parameter size is either 5.23M (deterministic predictor) or 6.03M (stochastic predictor). However, the parameter size of the pretrained model far exceeds those values. The text and latent encoders contain 18 convs in total, each with a weight of shape 192x192x5, which comes to (192x192x5) x 18 x 4 bytes (a float is 4 bytes) = 12.65625 MB. If we include the conv biases, the embedding layer, the projection layer, the duration predictor module, the norm layers, and the decoder module, the total will be even larger. Can you tell me how you calculated the parameter size?

EDIT: I noticed that the reported value is the number of parameters, not the size of the parameters in memory. Thank you.
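
The difference between parameter count and parameter size can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the question (18 convs with 192x192x5 weights, 4-byte floats) and ignores biases and every other layer. The helper names `conv_weight_count` and `size_in_mib` are illustrative and are not part of the nix-tts codebase.

```python
def conv_weight_count(out_channels, in_channels, kernel_size):
    """Number of weight values in one 1-D convolution (bias excluded)."""
    return out_channels * in_channels * kernel_size


def size_in_mib(num_params, bytes_per_param=4):
    """Memory footprint in MiB, assuming float32 (4 bytes) by default."""
    return num_params * bytes_per_param / 2**20


# Figures taken from the question: 18 convs, each with a 192x192x5 weight.
num_convs = 18
params = num_convs * conv_weight_count(192, 192, 5)

print(params)               # 3317760 parameters (about 3.3M)
print(size_in_mib(params))  # 12.65625 MiB when stored as float32
```

Under these assumptions the 18 conv weights amount to about 3.3M parameters, which fits within the reported 5.23M, while their storage at 4 bytes each is about 12.66 MB. By the same conversion, a 5.23M-parameter model stored as float32 would occupy roughly 20 MB.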