lucidrains / voicebox-pytorch

Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
MIT License
589 stars 49 forks

Fix unconditional sample generation #40

Closed by lucasnewman 9 months ago

lucasnewman commented 9 months ago

The shape of the autogenerated conditioning mask needs to match the EnCodec latent dimension, not the projected (model) dimension.
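For anyone skimming, here is a rough sketch of the shape issue. It is not the actual diff. The sizes, the `make_uncond_inputs` helper, and the assumption that the mask carries a trailing feature dimension are all made up for illustration. The point is only that the placeholder conditioning and its mask are built before the input projection, so they should use the codec latent width rather than the model width.

```python
import torch

# Hypothetical sizes, chosen only for illustration (not taken from the repo).
LATENT_DIM = 128  # assumed width of the audio codec latents
MODEL_DIM = 512   # assumed width after the model's input projection


def make_uncond_inputs(batch, seq_len, latent_dim=LATENT_DIM):
    """Build placeholder conditioning and a mask for unconditional sampling.

    Both tensors use the latent width, because they are consumed
    before the projection to MODEL_DIM.
    """
    cond = torch.zeros(batch, seq_len, latent_dim)
    # True marks positions whose conditioning is hidden (here: all of them).
    cond_mask = torch.ones(batch, seq_len, latent_dim, dtype=torch.bool)
    return cond, cond_mask


# Stand-in for the model's input projection (latent -> model width).
proj = torch.nn.Linear(LATENT_DIM, MODEL_DIM)

cond, cond_mask = make_uncond_inputs(batch=2, seq_len=10)

# The mask is applied in latent space, so the shapes line up here.
masked_cond = torch.where(cond_mask, torch.zeros_like(cond), cond)
projected = proj(masked_cond)
```

Had the placeholders been created with `MODEL_DIM` as the last dimension, the `proj` call above would fail with a shape mismatch, which is the kind of error this change is meant to avoid.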

I also added a couple of minor tweaks that make it easier to use a pretrained model down the line 😉