talhanai / redbud-tree-depression

scripts to model depression in speech and text
70 stars, 30 forks

Feature processing #1

Closed: Jackwu2018 closed this issue 4 years ago

Jackwu2018 commented 5 years ago

How do you deal with the audio files containing the interviewer's voice? How do you get rid of the interviewer's voice? And how do you extract the higher-order statistics of the 79 covarep features?

talhanai commented 4 years ago
  1. To get rid of the interviewer's voice, keep only the audio segments that belong to the subject. The transcripts have timestamps that indicate which speaker was speaking at which time. You can use a tool like ffmpeg on the Linux command line to cut the audio at the timestamps you want, or you can load the audio and the transcript timestamps in Python and cut it programmatically.

  2. For higher-order statistics of the covarep features, you can calculate statistics like mean, max, min, median, kurtosis, and skew over the array of frame-level values of each feature. Each statistic reduces that array to a single scalar value.

    
    import numpy as np

    # frame-level values of one feature
    covarep_feature_1 = [0.1, 0.1, 0.3, 1.3, 0.5]

    # higher-order statistic (here, the mean)
    mean = np.mean(covarep_feature_1)

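The other statistics listed in item 2 can be computed the same way. The following is a minimal sketch, not the code used in this repository: `summarize_feature` is a hypothetical helper name, and skew and kurtosis are computed here as the population (biased) standardized moments, with kurtosis reported as excess kurtosis. Other conventions exist, so treat those two definitions as assumptions.

```python
import numpy as np


def summarize_feature(frames):
    """Collapse a 1-D array of frame-level values into scalar statistics.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(frames, dtype=float)
    mean = x.mean()
    std = x.std()  # population standard deviation (ddof=0)
    centered = x - mean
    # Standardized third and fourth central moments.
    # A constant feature has zero variance, so fall back to 0.0 there.
    skew = np.mean(centered ** 3) / std ** 3 if std > 0 else 0.0
    kurt = np.mean(centered ** 4) / std ** 4 - 3.0 if std > 0 else 0.0
    return {
        "mean": float(mean),
        "max": float(x.max()),
        "min": float(x.min()),
        "median": float(np.median(x)),
        "skew": float(skew),
        "kurtosis": float(kurt),  # excess kurtosis (normal distribution = 0)
    }


covarep_feature_1 = [0.1, 0.1, 0.3, 1.3, 0.5]
stats = summarize_feature(covarep_feature_1)
```

If all features are stored as one array of shape `(n_frames, n_features)`, the NumPy reductions accept `axis=0` (for example `np.mean(features, axis=0)`), which gives one scalar per feature column.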

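The programmatic route from item 1 can be sketched with the Python standard library alone. This is an illustration under assumptions, not the repository's code: it assumes the transcript is a tab-separated file with a header row containing `start_time`, `stop_time`, and `speaker` columns (times in seconds), that the subject's rows are labelled `Participant`, and that the audio is an uncompressed WAV file. The function names and paths are hypothetical, so adjust them to your data.

```python
import csv
import wave


def subject_segments(transcript_path, subject_label="Participant"):
    """Return (start_sec, stop_sec) pairs for rows spoken by the subject.

    Assumes a tab-separated transcript with start_time, stop_time and
    speaker columns; these column names are an assumption.
    """
    segments = []
    with open(transcript_path, newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row["speaker"] == subject_label:
                segments.append((float(row["start_time"]), float(row["stop_time"])))
    return segments


def keep_segments(wav_in, wav_out, segments):
    """Write a new WAV holding only the given time ranges, concatenated."""
    with wave.open(wav_in, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        chunks = []
        for start, stop in segments:
            first = int(start * rate)
            count = max(0, int(stop * rate) - first)
            # Clamp the seek position so out-of-range timestamps do not raise.
            src.setpos(min(first, src.getnframes()))
            chunks.append(src.readframes(count))
    with wave.open(wav_out, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is fixed on close
        dst.writeframes(b"".join(chunks))


# Example usage (hypothetical file names):
# segs = subject_segments("session_TRANSCRIPT.csv")
# keep_segments("session_AUDIO.wav", "session_subject_only.wav", segs)
```

On the command line, a single segment can be cut in a similar way with ffmpeg's `-ss` (start) and `-to` (end) options, for example `ffmpeg -i in.wav -ss <start> -to <stop> out.wav`, repeated once per subject segment.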

I hope that clarifies it.