hifi_gan-vctk_small vs hifi_gan-vctk_medium (release 2021-03-28)

svenha commented 3 years ago

The naming confuses me a little bit. hifi_gan-vctk_small is larger (and slower) than hifi_gan-vctk_medium.

synesthesiam commented 3 years ago

I wondered about this as well, but the labels come from the pre-trained models in the original repo, where "medium" is vctk_v2 and "small" is vctk_v3. Based on my understanding of the config files, v2 should be larger/slower than v3.

To make it extra confusing, the small/v3 model uses a different "resblock" type but more upsample channels than medium/v2, whose configuration is similar to that of the universal_large/v1 model.
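To make the config comparison concrete, here is a minimal sketch that counts generator weights and rough multiply-accumulates (MACs) per mel frame from HiFi-GAN hyperparameters. It assumes the layer layout of the reference generator in the original HiFi-GAN repo (conv_pre, transposed-conv upsampling, multi-receptive-field resblocks, conv_post) and uses config values recalled from that repo's v2/v3 config files. `GenConfig`, `generator_params` and `generator_macs_per_frame` are hypothetical helpers, and the values should be checked against the config JSON the Larynx vocoders were actually trained with.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenConfig:
    # Field names mirror the HiFi-GAN config keys; values below are assumptions.
    resblock: str                    # "1" = ResBlock1 (6 convs), "2" = ResBlock2 (2 convs)
    upsample_rates: List[int]
    upsample_kernel_sizes: List[int]
    upsample_initial_channel: int
    resblock_kernel_sizes: List[int]
    num_mels: int = 80
    # Dilation sizes are omitted: they change the receptive field, not the counts.


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights plus biases of a 1-D (transposed) convolution."""
    return c_in * c_out * k + c_out


def generator_params(h: GenConfig) -> int:
    """Approximate generator parameter count (weight-norm gains ignored)."""
    c = h.upsample_initial_channel
    convs_per_block = 6 if h.resblock == "1" else 2
    total = conv_params(h.num_mels, c, 7)               # conv_pre
    ch = c
    for i, k_up in enumerate(h.upsample_kernel_sizes):
        ch = c // 2 ** (i + 1)
        total += conv_params(c // 2 ** i, ch, k_up)     # upsampling layer
        for k in h.resblock_kernel_sizes:
            total += convs_per_block * conv_params(ch, ch, k)
    return total + conv_params(ch, 1, 7)                # conv_post


def generator_macs_per_frame(h: GenConfig) -> int:
    """Rough multiply-accumulates per input mel frame (biases ignored)."""
    c = h.upsample_initial_channel
    convs_per_block = 6 if h.resblock == "1" else 2
    rate = 1                                            # time steps per mel frame
    total = h.num_mels * c * 7                          # conv_pre
    ch = c
    for i, (u, k_up) in enumerate(zip(h.upsample_rates, h.upsample_kernel_sizes)):
        ch = c // 2 ** (i + 1)
        total += rate * (c // 2 ** i) * ch * k_up       # transposed conv, per input step
        rate *= u
        for k in h.resblock_kernel_sizes:
            total += convs_per_block * rate * ch * ch * k
    return total + rate * ch * 7                        # conv_post


# Assumed values for the original repo's v2 ("medium") and v3 ("small") configs.
V2 = GenConfig("1", [8, 8, 2, 2], [16, 16, 4, 4], 128, [3, 7, 11])
V3 = GenConfig("2", [8, 8, 4], [16, 16, 8], 256, [3, 5, 7])

for name, h in [("medium / vctk_v2", V2), ("small / vctk_v3", V3)]:
    print(f"{name}: {generator_params(h):,} params, "
          f"{generator_macs_per_frame(h):,} MACs per mel frame")
```

Under those assumed values, v3 comes out with more weights (about 1.46M vs. 0.93M) and somewhat more MACs per frame than v2. Neither number is a runtime measurement: actual speed also depends on the number of sequential layers, memory traffic and the inference backend.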

I may just flip the medium/small labels, though, if there is an obvious performance difference between the two. So far, I've focused all my testing on large vs. small.

fquirin commented 3 years ago

So I've tested medium and small across a larger number of voices, with both short and long sentences, and small was either equal or even slower (within the error bars, I guess).
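One way to check whether a speed difference is "within the error bars" is to time repeated runs and compare mean ± standard deviation. The following is a minimal standard-library sketch. `time_runs` and `within_error_bars` are hypothetical names, and the call being timed (for example, synthesizing one sentence with a given vocoder) is left as a zero-argument function because no particular Larynx API is assumed here.

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_runs(fn: Callable[[], object], n_runs: int = 10,
              warmup: int = 2) -> Tuple[float, float]:
    """Return (mean, stdev) of wall-clock seconds over n_runs calls (n_runs >= 2)."""
    for _ in range(warmup):
        fn()  # excluded from timing: lazy loading, caches, JIT warm-up
    samples: List[float] = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


def within_error_bars(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if the intervals mean ± stdev of a and b overlap."""
    (mean_a, sd_a), (mean_b, sd_b) = a, b
    return abs(mean_a - mean_b) <= sd_a + sd_b
```

Overlapping ±1 standard deviation intervals is only a rough heuristic, not a significance test, but it matches the informal "within the error bars" reading when comparing two vocoders on the same sentences.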

synesthesiam commented 2 years ago

I ended up swapping the medium/low vocoder labels in v0.5.

rhasspy / larynx #10