leo-so / VocalMelodyExtPatchCNN

Vocal melody extraction using patch-based CNN
GNU General Public License v3.0

post processing needed? #3

Open shlomotannor opened 3 months ago

shlomotannor commented 3 months ago

I was able to run the model and got a NumPy array whose values seemed to range from 0 to 255. When I stored it as a MIDI file, the frequency range was totally off. I tried normalizing by subtracting 100 or dividing by 4, which brought the result closer to what I expected: it seemed to have the shape of the input melody.

FYI, I used MP3 input with a sampling frequency of 44100 Hz.

Here is the melody_extraction code I used:

import numpy as np
import pretty_midi

# feature_extraction, patch_extraction, patch_prediction, contour_prediction,
# contour_pred_from_raw and show_prediction come from this repository's code.

def melody_extraction(infile, outfile):
    patch_size = 25
    th = 0.5
    modelname = 'model3_patch25'
    max_method = 'posterior'
    print('Feature extraction of ' + infile)
    Z, t, CenFreq, tfrL0, tfrLF, tfrLQ = feature_extraction(infile)
    if max_method == 'raw':
        result = contour_pred_from_raw(Z, t, CenFreq)
        postgram = Z
    else:
        print('Patch extraction from %d frames' % (Z.shape[1]))
        data, mapping, half_ps, N, Z = patch_extraction(Z, patch_size, th)
        print('Predictions from %d patches' % (data.shape[0]))
        pred = patch_prediction(modelname, data, patch_size)
        result = contour_prediction(mapping, pred, N, half_ps, Z, t,
                                    CenFreq, max_method)
        postgram = show_prediction(mapping, pred, N, half_ps, Z, t)

    # Convert the result to a MIDI file
    midi = pretty_midi.PrettyMIDI()
    instrument = pretty_midi.Instrument(program=0)

    for i in range(result.shape[0]):
        start_time = result[i, 0]
        pitch = int(np.clip(result[i, 1], 0, 127))  # Clip pitch values to the valid range
        if pitch > 0:  # Ignore zero pitch values
            note = pretty_midi.Note(
                velocity=100, pitch=pitch, start=start_time, end=start_time + 0.1)
            instrument.notes.append(note)

    midi.instruments.append(instrument)
    midi.write(outfile)

melody_extraction('path1/input.wav', 'path2/output.txt')

leo-so commented 3 months ago

The pitch values (i.e., your result[i, 1]) are in Hz. To convert them to MIDI note numbers n, you need a conversion such as n = 12*np.log2(result[i, 1]/440) + 69.
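
As a rough illustration of the conversion above, here is a minimal sketch that applies the formula to a whole column of pitch values at once. The helper name hz_to_midi is hypothetical and not part of this repository. It also assumes that non-positive values mark unvoiced frames, which should be checked against the actual output; those frames are mapped to 0 so that a check like "if pitch > 0" still skips them.

```python
import numpy as np

def hz_to_midi(freq_hz):
    """Convert frequencies in Hz to the nearest MIDI note numbers.

    Assumption: non-positive values mark unvoiced frames and are mapped to 0.
    """
    freq_hz = np.atleast_1d(np.asarray(freq_hz, dtype=float))
    midi = np.zeros(freq_hz.shape, dtype=int)
    voiced = freq_hz > 0  # log2 is undefined for zero or negative input
    # n = 12 * log2(f / 440) + 69, rounded to the nearest integer
    midi[voiced] = np.rint(12 * np.log2(freq_hz[voiced] / 440.0) + 69).astype(int)
    # Keep the result inside the valid MIDI note range
    return np.clip(midi, 0, 127)

# 220 Hz is A3 (57), 440 Hz is A4 (69), 261.63 Hz is middle C (60)
print(hz_to_midi([0.0, 220.0, 440.0, 261.63]))  # [ 0 57 69 60]
```

In the loop from the question, the pitch line would then read something like pitch = int(hz_to_midi(result[i, 1])[0]), replacing the direct np.clip on the Hz value.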