Open xinzuan opened 1 year ago
When I check the values of ref_intervals and est_intervals, they are very different:
```
ref_intervals: [[  0.98046875   1.08723958]
 [  0.99739583   1.25260417]
 [  1.09375      1.16536458]
 ...
 [384.79557292 388.55338542]
 [384.79817708 388.61067708]
 [384.80989583 388.52864583]]
est_intervals: [[387.07030113 387.36055057]
 [387.07030113 387.5475941 ]
 [387.07030113 387.5475941 ]
 ...
 [146.40265896 146.64646848]
 [362.68555193 362.89453152]
 [307.87372971 308.01433333]]
```
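A quick way to catch this kind of mismatch before scoring (a hypothetical sanity check, not part of run_evaluation.py) is to compare the spans the two arrays cover:

```python
import numpy as np

def report_span(name, intervals):
    """Print the note count and onset/offset range of an (n, 2) interval array."""
    intervals = np.asarray(intervals)
    print(f"{name}: {len(intervals)} notes, "
          f"{intervals.min():.2f} to {intervals.max():.2f} s")

# ref_intervals / est_intervals as computed in run_evaluation.py:
report_span("ref_intervals", ref_intervals)
report_span("est_intervals", est_intervals)
```

If one span looks like seconds and the other like frames, or the ranges barely overlap, the units need reconciling before calling mir_eval.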
I think this is one of the reasons why the previous result is so far from what is reported in the paper. I modified the functions in basic-pitch/basic_pitch/experiments/run_evaluation.py as follows.

First, change the ``model_inference`` function:
```python
def model_inference(audio_path, model, save_path, minimum_note_length=127.70):
    # note_creation, run_inference, AUDIO_SAMPLE_RATE and FFT_HOP are the
    # names already imported at the top of run_evaluation.py; np is numpy.
    output = run_inference(audio_path, model)
    frames = output["note"]   # e.g. shape (13678, 88)
    onsets = output["onset"]  # e.g. shape (13678, 88)
    # Convert the minimum note length from milliseconds to model frames,
    # since output_to_notes_polyphonic requires it.
    min_note_len = int(np.round(minimum_note_length / 1000 * (AUDIO_SAMPLE_RATE / FFT_HOP)))
    estimated_notes = note_creation.output_to_notes_polyphonic(
        frames,
        onsets,
        onset_thresh=0.5,
        frame_thresh=0.3,
        infer_onsets=True,
        min_note_len=min_note_len,  # required; the function throws an error without it
        max_freq=None,              # required; the function throws an error without it
        min_freq=None,              # required; the function throws an error without it
    )
    # estimated_notes: [(start_frame, end_frame, pitch_midi, amplitude)]
    pitch = np.array([n[2] for n in estimated_notes])
    pitch_hz = librosa.midi_to_hz(pitch)
    estimated_notes_with_pitch_bend = note_creation.get_pitch_bends(output["contour"], estimated_notes)
    times_s = note_creation.model_frames_to_time(output["contour"].shape[0])
    estimated_notes_time_seconds = [
        (times_s[note[0]], times_s[note[1]], note[2], note[3], note[4])
        for note in estimated_notes_with_pitch_bend
    ]
    midi = note_creation.note_events_to_midi(estimated_notes_time_seconds, save_path)
    intervals = np.array([[times_s[note[0]], times_s[note[1]]] for note in estimated_notes_with_pitch_bend])
    return intervals, pitch_hz, midi  # midi added to the return value so the evaluation can use it
```
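For intuition on the millisecond-to-frame conversion above, here is a minimal worked example, assuming the package's usual constants AUDIO_SAMPLE_RATE = 22050 and FFT_HOP = 256 (check basic_pitch/constants.py for the actual values):

```python
# Sketch of the min_note_len conversion under the assumed constants.
frames_per_second = 22050 / 256  # ~86.13 model frames per second
min_note_len = round(127.70 / 1000 * frames_per_second)
print(min_note_len)  # 11 frames
```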
Then, in the ``main`` function, instead of using the intervals and pitch_hz returned from ``model_inference``, I used:
```python
_, _, midi = model_inference(audio_path, model, save_path)

est_notes = io.load_notes_from_midi(midi=midi)
if est_notes is None:
    est_intervals = []
    est_pitches = []
else:
    est_intervals, est_pitches = est_notes.to_mir_eval()
```
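These intervals and pitches then feed into mir_eval in the usual way. A minimal sketch, assuming ref_intervals (in seconds) and ref_pitches (in Hz) are loaded the same way from the reference annotation; mir_eval.transcription.evaluate returns the dictionary of Precision/Recall/F-measure keys shown below:

```python
import mir_eval

# Standard note-transcription metrics; both interval arrays must be in
# seconds and both pitch arrays in Hz for the scores to be meaningful.
scores = mir_eval.transcription.evaluate(
    ref_intervals, ref_pitches, est_intervals, est_pitches
)
```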
With these changes, I finally got results that are close to those reported in the paper:
{'Precision': 0.11997030494604051, 'Recall': 0.11606390831628464, **'F-measure': 0.11663329326696836**, 'Average_Overlap_Ratio': 0.8401297548289717, 'Precision_no_offset': 0.7436669014704781, 'Recall_no_offset': 0.6548245337432261, **'F-measure_no_offset': 0.6874150165838026**, 'Average_Overlap_Ratio_no_offset': 0.4262920646319229, 'Onset_Precision': 0.8259000078273144, 'Onset_Recall': 0.721544837754125, 'Onset_F-measure': 0.7601824436965499, 'Offset_Precision': 0.5818535280932536, 'Offset_Recall': 0.504137416529927, 'Offset_F-measure': 0.5329684074137423}
Hi @xinzuan. The training branch is still a work in progress, so don't rely on it too heavily. Regarding your issue, it's possible that there is a difference in units between the estimated and reference timestamps and frequency values, and that your solution took care of that difference.
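For anyone debugging the same symptom, these are the two unit mismatches worth checking; an illustrative sketch (not code from the repo), again assuming AUDIO_SAMPLE_RATE = 22050 and FFT_HOP = 256:

```python
import numpy as np
import librosa

# Illustrative input: one note spanning frames 100-150 at MIDI pitch 60.
intervals_frames = np.array([[100, 150]])
pitches_midi = np.array([60])

# Mismatch 1: intervals in model frames instead of seconds.
# Convert with the frame period FFT_HOP / AUDIO_SAMPLE_RATE.
intervals_seconds = intervals_frames * (256 / 22050)  # [[1.161 1.741]]

# Mismatch 2: pitches as MIDI note numbers instead of Hz.
pitches_hz = librosa.midi_to_hz(pitches_midi)  # [261.63]
```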
Hi, I ran basic-pitch/basic_pitch/experiments/run_evaluation.py from the wip-training branch with the MAESTRO dataset and the model checkpoint from basic-pitch/saved_models/icassp_2022. I expected the result to be similar to what is reported in the paper. However, I got the following result:

{"Precision": 0.0, "Recall": 0.0, "F-measure": 0.0, "Average_Overlap_Ratio": 0.0, "Precision_no_offset": 0.04398411727609082, "Recall_no_offset": 0.029748905165349712, "F-measure_no_offset": 0.03468172982454684, "Average_Overlap_Ratio_no_offset": 0.5793096961557063, "Onset_Precision": 0.631602431674569, "Onset_Recall": 0.4181107759888922, "Onset_F-measure": 0.4925505866527016, "Offset_Precision": 0.7521021756258168, "Offset_Recall": 0.5273589516900296, "Offset_F-measure": 0.6072445448462509}
Based on my understanding of mir_eval's definition of each metric, the one corresponding to F should be F-measure, and the one corresponding to Fno should be F-measure_no_offset. (I cannot find the mir_eval equivalent of Acc.) However, as you can see from the result above, it is very far from what is reported in the paper.

Could anyone please tell me which mir_eval metric corresponds to each metric in the paper?