Hi @jixinya,
Thanks for your excellent work!
I am curious about the implementation of Cross-Reconstructed Emotion Disentanglement. In the paper, you say, "Given four audio samples Xi,m, Xj,n, Xj,m, Xi,n" for disentangling. However, the implementation in this project is a little different: you sample 2 emotions and 3 contents, set X11, X21, X12, X23 as the inputs, and use X11, X11, X12, X12 as the four targets when computing the cross-reconstruction and self-reconstruction losses (see below).
https://github.com/jixinya/EVP/blob/990ea8b085a450b6fcc2c28b817989191e173218/train/disentanglement/code/dataload.py#L103-L107
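To make sure I am reading the paper's scheme correctly, here is a toy sketch of my understanding. The samples are reduced to plain (emotion, content) pairs, and `Ee`, `Ec`, `D` are hypothetical stand-ins for the emotion encoder, content encoder, and decoder (these names are mine, not from the repo):

```python
# Toy model: a sample X_{e,c} is just the pair (e, c). The "emotion
# encoder" Ee and "content encoder" Ec each pick out one component,
# and the "decoder" D recombines them. All names are hypothetical.

def Ee(x):
    """Emotion encoder: extract the emotion component."""
    return x[0]

def Ec(x):
    """Content encoder: extract the content component."""
    return x[1]

def D(e, c):
    """Decoder: recombine an emotion code and a content code."""
    return (e, c)

# Paper's scheme: four samples X_{i,m}, X_{j,n}, X_{j,m}, X_{i,n}
# (i, j index emotions; m, n index contents). Cross reconstruction
# swaps the emotion codes of X_{i,m} and X_{j,n} and should recover
# X_{j,m} and X_{i,n} as targets.
X_im = ('i', 'm')
X_jn = ('j', 'n')

X_jm_hat = D(Ee(X_jn), Ec(X_im))  # expected target: X_{j,m}
X_in_hat = D(Ee(X_im), Ec(X_jn))  # expected target: X_{i,n}

assert X_jm_hat == ('j', 'm')
assert X_in_hat == ('i', 'n')

# Repo's sampling as I read dataload.py: the inputs are
# X11, X21, X12, X23, but the targets are X11, X11, X12, X12 --
# content 3 never appears as a reconstruction target, which is the
# mismatch with the paper that I am asking about.
inputs  = [('1', '1'), ('2', '1'), ('1', '2'), ('2', '3')]
targets = [('1', '1'), ('1', '1'), ('1', '2'), ('1', '2')]
```

Under the paper's formulation I would have expected the targets to be the remaining two of the four sampled combinations, so I may be misreading either the paper's indices or the sampling code.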
Could you please explain this discrepancy? Looking forward to your response.