Thank you for all your efforts! I have a question about the code :->.
I wonder why "cond" defaults to "target" here, in the forward() function of the VoiceBox class:
cond = default(cond, target)
It seems that x, cond, and cond tokens in this implementation correspond to w, xctx, and z in the Voicebox paper, respectively.
But shouldn't xctx be obtained by masking x1, rather than by masking target = x1 - (1 - σ)x0? Is that right?
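To make the distinction concrete, here is a minimal NumPy sketch, assuming the paper's flow-matching formulation with x_t = (1 - (1 - σ)t)x0 + t·x1 and target u_t = x1 - (1 - σ)x0. The helper `flow_matching_inputs` and its arguments are hypothetical illustrations, not the repository's code. It shows that masking x1 gives a context that depends only on the clean features, while masking the target gives a context that also carries the noise x0.

```python
import numpy as np

def flow_matching_inputs(x1, mask, t, sigma_min=1e-5, rng=None):
    """Hypothetical helper: build flow-matching training tensors.

    x1:   clean features, shape (frames, dim)
    mask: boolean array, shape (frames,), True where frames are masked (to be infilled)
    t:    flow time in [0, 1]
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = rng.standard_normal(x1.shape)                  # Gaussian noise sample
    xt = (1 - (1 - sigma_min) * t) * x0 + t * x1        # noisy sample x_t
    target = x1 - (1 - sigma_min) * x0                  # regression target u_t
    keep = (~mask)[:, None].astype(x1.dtype)            # 1.0 on unmasked frames, 0.0 on masked
    x_ctx_from_x1 = keep * x1                           # context built by masking x1
    x_ctx_from_target = keep * target                   # context if cond defaults to target
    return xt, target, x_ctx_from_x1, x_ctx_from_target

# Same x1 and mask, two different noise draws.
x1 = np.ones((4, 2))
mask = np.array([False, True, True, False])
_, _, ctx_a, tgt_a = flow_matching_inputs(x1, mask, t=0.5, rng=np.random.default_rng(0))
_, _, ctx_b, tgt_b = flow_matching_inputs(x1, mask, t=0.5, rng=np.random.default_rng(1))
print(np.allclose(ctx_a, ctx_b))  # True: context depends only on x1
print(np.allclose(tgt_a, tgt_b))  # False: context also carries the noise x0
```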
Some insights would be greatly appreciated.