jcvasquezc / DisVoice

feature extraction from speech signals
https://disvoice.readthedocs.io/en/latest/
MIT License

Error in Articulation features #69

Open loukasilias opened 10 months ago

loukasilias commented 10 months ago

Hello,

I am trying to extract articulation features and I am getting the following error. How can I fix it? Thank you!

image

mok0102 commented 10 months ago

Please try pip install numpy==1.23.0. I had the very same issue and that fixed it!

loukasilias commented 10 months ago

@mok0102 thank you! However, I am getting the following error now. Any idea?

image

mok0102 commented 10 months ago

I am also stuck on this error, but I found that it is caused by Praat. Please refer to https://github.com/jcvasquezc/DisVoice/issues/16

loukasilias commented 10 months ago

@mok0102 thanks! I noticed that prosody returns some NaN values. Do you have any idea?

marekjg commented 9 months ago

You can use an absolute path for the audio file. It worked for me.
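A minimal sketch of the absolute-path suggestion, using only the standard library. The file name and the to_absolute helper are hypothetical, and the resulting string is what would then be passed to the feature extractor in place of the relative path.

```python
import os


def to_absolute(audio_path):
    """Expand '~' and return an absolute version of audio_path (hypothetical helper)."""
    return os.path.abspath(os.path.expanduser(audio_path))


# "recordings/sample.wav" is a made-up relative path used only for illustration.
audio = to_absolute("recordings/sample.wav")
print(audio)
```

An absolute path stays valid even if another program is launched from a different working directory, which may be why this helps here.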

MengXJ773 commented 7 months ago

> I am also stuck to this error, but I found this is due to praat. Please refer #16

Hello, I ran into the same problem. I'm sure that Praat is on my PATH, but the problem persists. Do you have any idea?
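One way to check whether Praat is visible to the Python process itself (whose PATH can differ from the one in an interactive shell) is sketched below. The candidate executable names are assumptions, not something confirmed in this thread, and find_praat is a hypothetical helper.

```python
import shutil


def find_praat(candidates=("praat", "Praat", "praat.exe")):
    """Return the first candidate executable found on PATH, or None (hypothetical helper)."""
    for name in candidates:
        path = shutil.which(name)  # searches the PATH seen by this Python process
        if path is not None:
            return path
    return None


praat_path = find_praat()
print(praat_path if praat_path else "No Praat executable found on PATH")
```

If this prints a path but the error remains, the PATH setting is probably not the cause.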

mafaves commented 2 months ago

I ran into similar issues and I think I found a solution. Here are the modifications I made:

praat_functions.py file

  1. I added: import parselmouth (install it first if it is not already installed; the PyPI package is praat-parselmouth)
  2. I changed the praat_formants function to:

    import numpy as np
    import parselmouth


    def praat_formants(audio_filename):
        """
        Extract the F1 and F2 formants from the given audio file using Praat (via Parselmouth).

        :param audio_filename: Path to the audio file (WAV format).
        :returns: Tuple of F1 and F2 arrays, restricted to frames where both are defined
        """
        # Load the sound
        snd = parselmouth.Sound(audio_filename)

        # Extract the formant object using Burg's method
        formant = snd.to_formant_burg(time_step=0.02, max_number_of_formants=5, maximum_formant=5500)

        # Lists to collect the formant frequencies
        formant_list_f1 = []
        formant_list_f2 = []

        # Sample the formant tracks every 0.02 s (the time step) over the sound's duration
        for t in np.arange(0, snd.duration, 0.02):
            try:
                f1 = formant.get_value_at_time(1, t)  # F1 (first formant)
                f2 = formant.get_value_at_time(2, t)  # F2 (second formant)
            except Exception:
                # Treat a failed lookup as an undefined frame
                f1, f2 = np.nan, np.nan
            formant_list_f1.append(f1)
            formant_list_f2.append(f2)

        # Convert to float arrays; undefined values are NaN
        f1_array = np.array(formant_list_f1, dtype=float)
        f2_array = np.array(formant_list_f2, dtype=float)

        # Keep only frames where both F1 and F2 are finite, so the two
        # arrays stay the same length and aligned frame by frame
        valid = np.isfinite(f1_array) & np.isfinite(f2_array)

        return f1_array[valid], f2_array[valid]
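One point worth checking when dropping undefined formant values is whether F1 and F2 stay aligned frame by frame. The sketch below uses made-up values (not real Praat output) to compare filtering each array on its own with filtering both through a shared mask.

```python
import numpy as np

# Made-up formant tracks: frame 1 lacks F1, frame 2 lacks F2.
f1 = np.array([500.0, np.nan, 520.0, 510.0])
f2 = np.array([1500.0, 1480.0, np.nan, 1490.0])

# Filtering each array separately: index i no longer refers to the same frame.
f1_ind = f1[np.isfinite(f1)]
f2_ind = f2[np.isfinite(f2)]

# Filtering with a shared mask: only frames where both values exist are kept.
valid = np.isfinite(f1) & np.isfinite(f2)
f1_joint, f2_joint = f1[valid], f2[valid]

print(f1_ind, f2_ind)
print(f1_joint, f2_joint)
```

With separate filtering, the second entries (520.0 and 1480.0) come from different frames; with the shared mask, every pair comes from the same frame. Whether this matters depends on how the downstream code indexes the two arrays.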

Then I modified articulation.py: line 68: d = np.ones((nwind, ncol), dtype=np. int)

line 300 (approx., in the extract_features_file function). I removed:

    temp_filename=PATH+'/../../tempfiles/tempFormants'+temp_uuid+'.txt'
    praat_functions.praat_formants(audio, temp_filename,self.sizeframe,self.step)
    [F1, F2]=praat_functions.decodeFormants(temp_filename)
    # Replace with Parselmouth (using extract_formants function)
    F1, F2 = praat_functions.praat_formants(audio)
    os.remove(temp_filename)

and added:

    F1, F2 = praat_functions.praat_formants(audio)
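As background on the dtype change on line 68: the exact replacement is not clear from the comment above. What is certain is that the np.int alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, which would also explain why pinning numpy==1.23.0 avoids the error. The builtin int is the documented replacement. A minimal sketch, with hypothetical array sizes:

```python
import numpy as np

# Hypothetical sizes, standing in for the values computed in articulation.py.
nwind, ncol = 4, 3

# The builtin int works on both old and new NumPy versions,
# whereas the np.int alias raises AttributeError on NumPy >= 1.24.
d = np.ones((nwind, ncol), dtype=int)

print(d.dtype, d.shape)
```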