lucidrains / voicebox-pytorch

Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
MIT License
589 stars, 49 forks

Don't explicitly set the step size, derive it from the number of steps instead #38

Closed lucasnewman closed 9 months ago

lucasnewman commented 9 months ago

I noticed inference time stayed constant even when changing the number of ODE steps. It looks like the step size is being provided explicitly at init instead of being derived, during sampling, from the array of sample times built from the number of steps. Leaving this parameter unset lets sampling use the requested number of steps, which then acts as a time/quality tradeoff.

Here are some examples using power-of-2 step increments as described in the paper. There's a noticeable quality improvement when evaluating the flow at more timesteps:

[Audio samples: 4 Steps, 8 Steps, 16 Steps, 32 Steps, 64 Steps, 128 Steps]
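To illustrate the behavior described above, here is a minimal pure-Python sketch. It is not the repository's code: `sample_times`, `integration_grid`, `midpoint_solve`, and `toy_flow` are hypothetical names, the ODE is a toy problem (dx/dt = x), and the value `0.0625` is only an example step size. It assumes a fixed-grid solver in which an explicitly supplied step size takes precedence over the spacing of the requested sample times, which is what would make the cost independent of the number of steps.

```python
import math

def sample_times(steps):
    # Hypothetical helper: steps + 1 evenly spaced times on [0, 1].
    return [i / steps for i in range(steps + 1)]

def integration_grid(times, step_size=None):
    # Assumption: an explicit step_size overrides the spacing of `times`.
    if step_size is None:
        return list(times)  # one solver step per requested interval
    t0, t1 = times[0], times[-1]
    n = max(1, math.ceil((t1 - t0) / step_size - 1e-12))
    return [min(t0 + i * step_size, t1) for i in range(n + 1)]

def midpoint_solve(f, x0, times, step_size=None):
    # Fixed-step midpoint method; returns the final state and the number
    # of function evaluations as a stand-in for inference cost.
    grid = integration_grid(times, step_size)
    x, evals = x0, 0
    for ta, tb in zip(grid[:-1], grid[1:]):
        h = tb - ta
        x_mid = x + 0.5 * h * f(ta, x)
        x = x + h * f(ta + 0.5 * h, x_mid)
        evals += 2
    return x, evals

def toy_flow(t, x):
    # Toy vector field: dx/dt = x, so the exact solution at t = 1 is e.
    return x

def run(steps, step_size=None):
    x, evals = midpoint_solve(toy_flow, 1.0, sample_times(steps), step_size)
    return evals, abs(x - math.e)

for steps in (4, 8, 16, 32, 64, 128):
    fixed_evals, _ = run(steps, step_size=0.0625)
    derived_evals, derived_err = run(steps)
    print(steps, fixed_evals, derived_evals, derived_err)
```

In this sketch, the evaluation count with an explicit step size is the same for every step count, while the derived version does more work and gets closer to the exact answer as the step count grows.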

lucidrains commented 9 months ago

🎸 🚀

lucidrains commented 9 months ago

@lucasnewman was gateloop a part of this run?