lucidrains / e2-tts-pytorch

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
MIT License

About Immiscible Diffusion #22

Closed AshwinSankar17 closed 1 month ago

AshwinSankar17 commented 1 month ago

In immiscible diffusion, the noise is selected by minimizing a cost w.r.t. the data batch, thereby reducing the variance of the mapping from the source distribution to the target distribution. In CFM, however, we can start from any prior distribution (say, text encoder outputs for TTS). Selecting the prior as described in the paper is then not possible, because the mapping between the prior and the target is fixed (I would want a particular text embedding to be mapped to its corresponding mel-spectrogram).

So the algorithm should sample both prior and targets like:

let assign_mat

x0 = x0[assign_mat]
x1 = x1[assign_mat]
condition = condition[assign_mat]


Please correct me if I'm wrong.
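
To make the pairing concern concrete, here is a minimal sketch, assuming `x0` is i.i.d. Gaussian noise and that `x1[i]` and `condition[i]` belong together. All names and the toy data are hypothetical and not taken from this repo. The assignment is found by brute force over permutations, which only works for tiny batches; a Hungarian solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice at real batch sizes.

```python
import itertools
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def total_cost(x0, x1):
    """Sum of pairwise costs when x0[i] is paired with x1[i]."""
    return sum(sq_dist(a, b) for a, b in zip(x0, x1))


def min_cost_assignment(x0, x1):
    """Return perm so that pairing x0[perm[i]] with x1[i] has minimal total cost.

    Brute force over all permutations: an illustration only, O(n!) in batch size.
    """
    n = len(x1)
    cost = [[sq_dist(x0[j], x1[i]) for j in range(n)] for i in range(n)]
    best = min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return list(best)


rng = random.Random(0)
batch, dim = 5, 3

# Hypothetical toy batch: x1[i] and condition[i] are a fixed pair.
x1 = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]
condition = [f"text_{i}" for i in range(batch)]
# i.i.d. noise, so permuting it does not change its distribution.
x0 = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]

cost_before = total_cost(x0, x1)
perm = min_cost_assignment(x0, x1)

# Reorder only the noise; x1 and condition are untouched, so they stay paired.
x0_assigned = [x0[j] for j in perm]
cost_after = total_cost(x0_assigned, x1)

print(f"pairing cost before: {cost_before:.3f}, after: {cost_after:.3f}")
```

If `x1` is reordered instead of `x0`, then `condition` has to be indexed with the same permutation, otherwise each target ends up with another sample's conditioning. When `x0` is itself tied to the condition (for example text encoder outputs), it cannot be freely reassigned this way, which seems to be the point raised above.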

lucidrains commented 1 month ago

@iamunr4v31 oh yes, i forgot about the condition

however, i've removed it since i wasn't able to see a signal at this repo

AshwinSankar17 commented 1 month ago

Interesting. I'm working on something similar at the moment. Will let you know if I have any luck with improving the convergence behavior. But this is very weird, considering the algorithm optimizes for straighter flows.

lucidrains commented 1 month ago

@iamunr4v31 try running it at the rectified flow repository (or if you see an issue, please let me know)

if you see something, willing to spend more time on it

lucidrains commented 1 month ago

@iamunr4v31 and yes, agreed with the straightening flow connection

bfs18 commented 3 weeks ago

@lucidrains Does it improve results? The idea is similar to "Improving and generalizing flow-based generative models with minibatch optimal transport", but applied to a larger dataset. I did not observe any improvement on the vocoder task; I guess the mel condition is informative enough that the conditional mapping is already close to a one-to-one map.

lucidrains commented 3 weeks ago

@bfs18 yes, i saw some improvements on small scale image generation

bfs18 commented 3 weeks ago

@lucidrains I'm glad to hear about the positive results on image generation! I'll look into it further.