neuro-team-femto / cleese

Combinatorial Expressive Speech Engine
MIT License

[audio] Merge BPF files when audio transformations are chained #7

Open jjau opened 2 years ago

jjau commented 2 years ago

When several audio transformations are chained (e.g. [pitch, gain]), cleese generates one folder of output files per successive nested transformation (e.g. /pitch and /pitch_gain). However, the BPF file associated with each output file of the outermost transformation (here, /pitch_gain) only gives the parameters of the last transformation in the chain (here, gain).

This behaviour is not optimal when the output files are used as stimuli in reverse correlation experiments, because we need to keep track of all the transformation parameters involved in generating the file (i.e. pitch and gain).

One fix would be to automatically merge the metadata files of each output file so that the result includes the parameters of all transformations involved, as in the code below:

import glob
import os

import pandas as pd

# merge metadata files
output_folder = 'output/2021-10-26_00-58-55'
chain_folder = os.path.join(output_folder, 'pitch_gain')
# list the gain BPF files up front, since merged files are written to the same folder
rms_metadata_files = glob.glob(os.path.join(chain_folder, '*.txt'))

for rms_metadata_file in rms_metadata_files:
    # gain (rms) breakpoints: one space-separated "time value" pair per line
    rms_data = pd.read_csv(rms_metadata_file, sep=' ', header=None, names=['t', 'rms'])
    # corresponding pitch file: drop the last 9 characters of the stem and append '_BPF.txt'
    stem = os.path.splitext(os.path.basename(rms_metadata_file))[0]
    pitch_metadata_file = os.path.join(output_folder, 'pitch', stem[:-9] + '_BPF.txt')
    pitch_data = pd.read_csv(pitch_metadata_file, sep=' ', header=None, names=['t', 'pitch'])
    # merge on the time column (inner join: keeps only breakpoints present in both files)
    data = rms_data.merge(pitch_data, on='t')
    # save under the stem minus its last 4 characters (comma-separated, with a header row)
    data_file = os.path.join(chain_folder, stem[:-4] + '.txt')
    data.to_csv(data_file, index=False)
    # remove original file
    os.remove(rms_metadata_file)
jjau commented 4 months ago

One difficulty here is that the transformations in a chain do not necessarily share the same time base, e.g.

[pitch]
windows.n=5
...
[stretch]
windows.n=6

One default behaviour could be that values at finer-grained breakpoints are interpolated from their original breakpoints, depending on the ramp/square setting of the corresponding transform.
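
A minimal sketch of that default, for discussion only. It assumes that "ramp" means linear interpolation between breakpoints and that "square" means holding the previous breakpoint's value until the next one; cleese's actual semantics may differ. The helper names `resample_bpf` and `merge_bpfs` are hypothetical and not part of cleese.

```python
import numpy as np


def resample_bpf(times, values, new_times, mode="ramp"):
    """Evaluate a breakpoint function at new_times (hypothetical helper).

    Assumption: 'ramp' = linear interpolation between breakpoints,
    'square' = hold the value of the most recent breakpoint.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    new_times = np.asarray(new_times, dtype=float)
    if mode == "ramp":
        return np.interp(new_times, times, values)
    if mode == "square":
        # index of the last breakpoint at or before each new time
        idx = np.searchsorted(times, new_times, side="right") - 1
        idx = np.clip(idx, 0, len(times) - 1)
        return values[idx]
    raise ValueError("mode must be 'ramp' or 'square'")


def merge_bpfs(bpfs):
    """Resample several BPFs onto the union of their time bases.

    bpfs maps a transformation name to a (times, values, mode) tuple.
    Returns a dict with a shared 't' array and one array per transformation.
    """
    all_times = np.unique(
        np.concatenate([np.asarray(t, dtype=float) for t, _, _ in bpfs.values()])
    )
    merged = {"t": all_times}
    for name, (t, v, mode) in bpfs.items():
        merged[name] = resample_bpf(t, v, all_times, mode)
    return merged


# Two transformations with different breakpoints (made-up values)
merged = merge_bpfs({
    "pitch": ([0.0, 1.0, 2.0], [0.0, 100.0, 0.0], "ramp"),
    "stretch": ([0.0, 0.5, 2.0], [1.0, 2.0, 1.0], "square"),
})
```

Using the union of all breakpoints keeps every original value intact and only adds interpolated rows where one transformation has no breakpoint of its own.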

jjau commented 4 months ago

File naming conventions:

chain = [pitch, stretch]