Hi @jixinya,
Thanks for your excellent work!
I am curious about the implementation of Cross-Reconstructed Emotion Disentanglement. In the paper, you say, "Given four audio samples Xi,m, Xj,n, Xj,m, Xi,n" for disentangling. However, the implementation in this project is a little different: you sample 2 emotions and 3 contents, set X11, X21, X12, X23 as the inputs, and use X11, X11, X12, X12 as the four targets when computing the cross-reconstruction and self-reconstruction losses (see below).
https://github.com/jixinya/EVP/blob/990ea8b085a450b6fcc2c28b817989191e173218/train/disentanglement/code/dataload.py#L103-L107
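To make sure I am reading the paper's scheme correctly, here is a toy sketch of my understanding. The samples are reduced to plain (emotion, content) pairs, and `Ee`, `Ec`, `D` are hypothetical stand-ins for the emotion encoder, content encoder, and decoder (these names are mine, not from the repo):

```python
# Toy model: a sample X_{e,c} is just the pair (e, c). The "emotion
# encoder" Ee and "content encoder" Ec each pick out one component,
# and the "decoder" D recombines them. All names are hypothetical.

def Ee(x):
    """Emotion encoder: extract the emotion component."""
    return x[0]

def Ec(x):
    """Content encoder: extract the content component."""
    return x[1]

def D(e, c):
    """Decoder: recombine an emotion code and a content code."""
    return (e, c)

# Paper's scheme: four samples X_{i,m}, X_{j,n}, X_{j,m}, X_{i,n}
# (i, j index emotions; m, n index contents). Cross reconstruction
# swaps the emotion codes of X_{i,m} and X_{j,n} and should recover
# X_{j,m} and X_{i,n} as targets.
X_im = ('i', 'm')
X_jn = ('j', 'n')

X_jm_hat = D(Ee(X_jn), Ec(X_im))  # expected target: X_{j,m}
X_in_hat = D(Ee(X_im), Ec(X_jn))  # expected target: X_{i,n}

assert X_jm_hat == ('j', 'm')
assert X_in_hat == ('i', 'n')

# Repo's sampling as I read dataload.py: the inputs are
# X11, X21, X12, X23, but the targets are X11, X11, X12, X12 --
# content 3 never appears as a reconstruction target, which is the
# mismatch with the paper that I am asking about.
inputs  = [('1', '1'), ('2', '1'), ('1', '2'), ('2', '3')]
targets = [('1', '1'), ('1', '1'), ('1', '2'), ('1', '2')]
```

Under the paper's formulation I would have expected the targets to be the remaining two of the four sampled combinations, so I may be misreading either the paper's indices or the sampling code.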
Could you please explain this discrepancy? Looking forward to your response.