Closed — AshwinSankar17 closed this 1 month ago
@iamunr4v31 oh yes, i forgot about the condition
however, i've removed it since i wasn't able to see a signal in this repo
Interesting. I'm working on something similar at the moment, so I'll let you know if I have any luck improving the convergence behavior. But this is odd, considering the algorithm optimizes for straighter flows.
@iamunr4v31 try running it in the rectified flow repository (or if you see an issue there, please let me know)
if you see something, willing to spend more time on it
@iamunr4v31 and yes, agreed with the straightening flow connection
@lucidrains Does it improve results? The idea is similar to Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport, but applied to a larger dataset. I did not observe an improvement on the vocoder task; I guess the mel conditioning is sufficient enough that the conditional mapping is already close to a one-to-one map.
@bfs18 yes, i saw some improvements on small-scale image generation
@lucidrains I'm glad to hear about the positive results on image generation! I'll look into it further.
In immiscible diffusion, the noise is selected by minimizing the cost w.r.t. the data batch, thereby minimizing the variance of the mapping from the source distribution to the target distribution. In CFM, however, we can start from any prior distribution (say, text encoder outputs for TTS). Selecting the prior as described in the paper is not possible because of the fixed mapping between the prior and the target (I would want a particular text embedding to be mapped to its corresponding mel-spectrogram).
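For reference, the assignment step in immiscible diffusion can be sketched roughly as follows (just a sketch; the function name is mine, and I'm assuming flattened batches and scipy's Hungarian solver):

```python
import torch
from scipy.optimize import linear_sum_assignment

def assign_noise(data, noise):
    """Reorder a batch of noise samples so that each data sample is
    paired with the noise sample minimizing the total L2 cost."""
    # pairwise L2 distances between flattened samples: (B, B)
    cost = torch.cdist(data.flatten(1), noise.flatten(1))
    # Hungarian algorithm picks the cost-minimizing permutation
    _, col_idx = linear_sum_assignment(cost.cpu().numpy())
    # noise[col_idx][i] is now the match for data[i]
    return noise[col_idx]
```

The point being: the permutation over noise is a free choice here, which is exactly what breaks down when the prior is conditional.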
So the algorithm should sample both the priors and the targets jointly.
Please correct me if I'm wrong.