Speech processing in the UDIVA dataset

The paper describes two audio processing methods: (1) extracting features directly from the entire audio clip, and (2) segmenting the audio and extracting features from each segment separately. However, in the UDIVA dataset, each video clip is a dialogue between two people. When extracting features from the entire audio clip, should the presence of different speakers be taken into account?
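To make the question concrete, here is a minimal sketch of the two strategies. It is not the paper's or the repository's pipeline. The features (RMS energy and zero-crossing rate), the helper names `toy_features`, `whole_clip_features`, and `segment_features`, and the synthetic two-tone "dialogue" are all assumptions chosen only for illustration.

```python
import math


def toy_features(samples):
    """Return (RMS energy, zero-crossing rate) for a list of float samples.

    Hypothetical stand-in for a real audio feature extractor.
    """
    n = len(samples)
    if n == 0:
        return (0.0, 0.0)
    rms = math.sqrt(sum(x * x for x in samples) / n)
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / max(n - 1, 1)
    return (rms, zcr)


def whole_clip_features(samples):
    """Method 1: one feature vector for the entire clip."""
    return toy_features(samples)


def segment_features(samples, segment_len):
    """Method 2: one feature vector per fixed-length segment."""
    return [
        toy_features(samples[i:i + segment_len])
        for i in range(0, len(samples), segment_len)
    ]


# Synthetic stand-in for a two-speaker clip (assumption, not UDIVA data):
# one second of a quiet low tone followed by one second of a louder high tone.
sr = 8000
speaker_a = [0.2 * math.sin(2 * math.pi * 120 * t / sr) for t in range(sr)]
speaker_b = [0.8 * math.sin(2 * math.pi * 240 * t / sr) for t in range(sr)]
clip = speaker_a + speaker_b

clip_feat = whole_clip_features(clip)      # blends both "speakers"
seg_feats = segment_features(clip, sr)     # one vector per "speaker" turn
```

In this toy case the whole-clip energy falls between the two per-segment energies, so the clip-level vector describes neither speaker on its own. That is the concern behind the question: with real dialogue, separating speakers (for example with diarization or per-speaker audio channels, if available) before clip-level extraction may matter.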

liaorongfan / DeepPersonality

Speech processing in the UDIVA dataset #9