Hello, I have been reading your paper and there is one detail that I do not understand. As I understand it, your dataset is made up of HDTF and RAVDESS. The paper mentions that the identity one-hot encoding is 24-dimensional. Do these 24 identities correspond to the actors in RAVDESS? If so, how are the HDTF identities encoded? Also, how does the cross-reconstruction loss work with the HDTF dataset, since its sequences contain no emotions or similar content with which to apply this loss term?