sony / hFT-Transformer

PyTorch implementation of an automatic music transcription method that uses a two-level hierarchical frequency-time Transformer architecture (hFT-Transformer).
MIT License

Is there any information loss during the conversion process? #1

Open Chunyuan-Li opened 1 year ago

Chunyuan-Li commented 1 year ago

Thanks for sharing your work. I have noticed that the original MIDI file and the MIDI file obtained after conversion using midi2note are not consistent. Here is my note2midi conversion:

```python
import pretty_midi

midi = pretty_midi.PrettyMIDI(resolution=960)
instrument = pretty_midi.Instrument(program=0, is_drum=False, name='piano')

for note_data in note_list:
    pitch = note_data['pitch']
    onset = note_data['onset']
    offset = note_data['offset']
    velocity = note_data['velocity']

    # Create a Note object and add it to the instrument
    note = pretty_midi.Note(
        velocity=velocity,
        pitch=pitch,
        start=onset,
        end=offset
    )
    instrument.notes.append(note)

# Add the instrument to the PrettyMIDI object
midi.instruments.append(instrument)

midi.write('tmp_note.mid')
```

The original MIDI looks like this: [image]

After converting the MIDI to notes and the notes back to MIDI: [image]

This case is '2018/MIDI-Unprocessed_Chamber2_MID--AUDIO_09_R3_2018_wav--1.midi' in the MAESTRO-V3 dataset.
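To quantify the inconsistency rather than compare piano rolls by eye, one can match notes between the two files by pitch and onset and report duration changes. This is a hypothetical helper I wrote for illustration (`diff_notes` is not part of the repository); it operates on note dicts like the ones in the conversion code above:

```python
def diff_notes(orig, converted, tol=1e-3):
    """Pair up notes by (pitch, rounded onset) and report duration changes.

    orig / converted: lists of dicts with 'pitch', 'onset', 'offset' keys.
    Returns a list of (pitch, onset, orig_dur, conv_dur) tuples for notes
    whose duration differs by more than tol seconds.
    """
    # Index converted notes by (pitch, onset) for fast lookup
    index = {(n["pitch"], round(n["onset"], 3)): n for n in converted}
    diffs = []
    for n in orig:
        m = index.get((n["pitch"], round(n["onset"], 3)))
        if m is None:
            continue  # note missing in the converted file; could also report this
        d0 = n["offset"] - n["onset"]
        d1 = m["offset"] - m["onset"]
        if abs(d1 - d0) > tol:
            diffs.append((n["pitch"], n["onset"], d0, d1))
    return diffs
```

Running this on the note lists extracted from both MIDI files should show that onsets match while many offsets are pushed later, which matches the pedal explanation below.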

KeisukeToyama commented 1 year ago

Hello @Chunyuan-Li, thank you very much for the comment. I apologize for the delay in responding; I have only just noticed this issue.

Our hFT-Transformer does not transcribe pedal information (control change #64), so we convert the MIDI information as if it were played without a pedal. This conversion lengthens note durations when the pedal is used. We implement this process in https://github.com/sony/hFT-Transformer/blob/master/corpus/conv_midi2note.py. If you listen to both MIDI files, you will notice that they are identical in the audio domain.
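The idea can be sketched as follows. This is a minimal illustration of pedal-aware offset extension, not the repository's actual implementation in conv_midi2note.py: build sustain intervals from CC#64 events (values ≥ 64 mean pedal down), then push each note's offset to the pedal-release time if the note ends while the pedal is held:

```python
def pedal_intervals(control_changes, threshold=64):
    """Derive (down, up) sustain intervals from CC#64 events.

    control_changes: list of (time, value) pairs for controller 64,
    sorted by time. Values >= threshold mean the pedal is down.
    """
    intervals = []
    down = None
    for time, value in control_changes:
        if value >= threshold and down is None:
            down = time                      # pedal pressed
        elif value < threshold and down is not None:
            intervals.append((down, time))   # pedal released
            down = None
    if down is not None:
        intervals.append((down, float("inf")))  # pedal never released
    return intervals


def extend_offsets(notes, intervals):
    """Extend each note's offset to the pedal-release time if the note
    ends while the pedal is held.

    notes: list of dicts with 'pitch', 'onset', 'offset', 'velocity'.
    Returns new dicts; the input is not modified.
    """
    out = []
    for n in notes:
        offset = n["offset"]
        for down, up in intervals:
            if down <= offset < up:
                offset = up  # note rings until the pedal comes up
                break
        out.append({**n, "offset": offset})
    return out
```

Notes converted this way sound the same as the original performance, because the sustained sound is now encoded in the note durations themselves rather than in pedal events.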

Chunyuan-Li commented 1 year ago

If the result is rendered as sheet music, two problems appear. On one hand, the pedal causes the note durations we derive from audio to be longer than the notated durations. On the other hand, the hands are poorly separated in the score (mainly because the MAESTRO dataset consists of single-track MIDI, while MuseScore expects two staves), which produces a result that sounds correct but looks quite distorted. I'm wondering whether you have looked into these two issues.