Hi @bluemonk482
Audio-level normalization to (-1, 1) is already performed before the feature extraction, so you don't need to do it yourself.
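In practice that means the files can be passed in as they are, e.g. (a minimal sketch; the `Phonological` class and the keyword arguments follow the DisVoice README and are assumptions on my part, only `extract_features_file` is confirmed in this thread):

```python
# Minimal sketch: extract phonological features from the raw file, relying on
# DisVoice's internal level normalization. Class name and keyword arguments
# follow the DisVoice README and are assumptions, not confirmed in this thread.
from disvoice.phonological import Phonological  # assumed import path

phonological = Phonological()
# static=True returns one aggregated feature vector per file
features = phonological.extract_features_file("utterance.wav", static=True,
                                              plots=False, fmt="npy")
```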
If the energy of the second speaker saying "yes" or "um" is high enough to be audible or to overlap with the main speaker's speech, I recommend cutting it out beforehand; otherwise, you can just extract the features from the speech files as they are.
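If you need to check whether such an interjection is loud enough to matter, one rough way is to inspect the short-time energy of the signal (a minimal sketch; frame length, threshold, and file name are illustrative assumptions, and a mono WAV is assumed):

```python
# Rough sketch: frame-wise RMS energy to locate loud interjections from a
# second speaker before feature extraction. Frame length, threshold, and
# file name are illustrative assumptions; assumes a mono WAV.
import numpy as np
import soundfile as sf  # assumed dependency for audio I/O

signal, sr = sf.read("utterance.wav")
frame_len = int(0.025 * sr)                     # 25 ms frames
n_frames = len(signal) // frame_len
frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
rms = np.sqrt(np.mean(frames ** 2, axis=1))     # short-time energy per frame

threshold = 0.5 * rms.max()                     # arbitrary threshold for this sketch
loud = np.where(rms > threshold)[0] * frame_len / sr
print("High-energy regions to inspect (seconds):", loud)
```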
Thanks very much @jcvasquezc !
Can I please also confirm with you that the phonological feature extractor was trained on Spanish, not English?
Yes, the phonological feature extractor was trained on Spanish data
Thanks @jcvasquezc ! Would it be possible to obtain a model trained on English? ...
Yes, I hope that the next update will include models for English and German
Thanks a lot @jcvasquezc !
Look forward to the update! Hopefully soon 👍
Hi @jcvasquezc thanks again for the great lib!
I am just wondering if I should perform any data preprocessing before feeding the audio to `extract_features_file`. My audio files are utterances (> 2 secs), mostly one per speaker (sometimes one contains a second speaker saying "yes" or "um"), but there is a loudness difference between the two speakers' utterances. Do you suggest I scale the audio waveforms to (-1, +1), save the audio files, and then feed them to the feature extractors? The downstream task is classification, so I didn't want to complicate it by performing more advanced preprocessing. Do you think min-max scaling is sufficient?
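For concreteness, the scaling I have in mind is just this (a quick sketch; file names are placeholders, and a mono WAV is assumed):

```python
# Quick sketch of the min-max style scaling I have in mind: divide by the
# peak absolute amplitude so the waveform lies in [-1, 1]. File names are
# placeholders; assumes a mono WAV.
import numpy as np
import soundfile as sf

signal, sr = sf.read("utterance.wav")
peak = np.max(np.abs(signal))
if peak > 0:                                    # guard against all-silence files
    signal = signal / peak
sf.write("utterance_scaled.wav", signal, sr)
```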