Chunyuan-Li opened 1 year ago

I noticed that the direction element of MusicXML contains a per-minute element (within direction-type) used to describe beats per minute (BPM). However, this element is handled by neither the tokenizer nor the detokenizer. As a result, after detokenization, all BPM values in the resulting MXL file are set to 120, which is clearly problematic.
Sorry for my super late response. Currently, the tools do not consider elements related to BPM or tempo, because these elements are not directly involved in the MIDI-to-Score conversion.
If these elements are necessary for your use case, I suggest considering custom extensions to the tokenizer and detokenizer (a sketch follows the list):

- `score_to_tokens.py`: you can add the necessary elements to `attribute_to_token()`, while modifying `attributes_to_tokens()` as well.
- `tokens_to_score.py`: you can add the conversion to the music21 elements in `single_token_to_obj()`.
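For illustration, a minimal sketch of what such an extension could look like. The token format `bpm_*` and the function names below are hypothetical, not this repo's actual signatures:

```python
# Hypothetical sketch: encode/decode tempo via a dedicated token family.
# The real attribute_to_token()/single_token_to_obj() signatures may differ.
from music21 import tempo

def tempo_attribute_to_token(metronome_mark):
    # Encode a music21 MetronomeMark as a token such as "bpm_120".
    return f"bpm_{int(metronome_mark.number)}"

def tempo_token_to_obj(token):
    # Decode a "bpm_*" token back into a music21 MetronomeMark.
    if token.startswith("bpm_"):
        return tempo.MetronomeMark(number=int(token[len("bpm_"):]))
    return None
```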
Yes, I did try that as well. However, I found that the model struggles to predict the BPM accurately, regardless of whether I explicitly specify the BPM in the MIDI tokens (derived from MIDI tempo changes). Eventually, I removed the BPM indicators.
Additionally, I encountered another issue on the model side: when training on data pairs of non-standard MIDIs (typically converted from audio) and standard MusicXML, the model's prediction performance deteriorated significantly, often resulting in missing notes. Do you have any suggestions?
If your goal is to transcribe BPM from MIDI to Score, I don't think it's necessary to include it in the token conversion process. Instead, you could simply append the BPM read from the MIDI to the transcribed score as a tempo object.
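For example, a minimal sketch of that post-processing step, assuming `pretty_midi` for reading the MIDI and a `music21` score as the transcription result (the function name is hypothetical):

```python
import pretty_midi
from music21 import tempo

def attach_midi_tempo(midi_path, score):
    # Read tempo changes from the source MIDI and attach the initial one
    # to the transcribed score as a MetronomeMark object.
    pm = pretty_midi.PrettyMIDI(midi_path)
    times, tempi = pm.get_tempo_changes()
    if len(tempi) > 0:
        score.insert(0, tempo.MetronomeMark(number=float(tempi[0])))
        # Repeat with the appropriate offsets if later tempo changes matter.
    return score
```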
Your issue with "non-standard MIDIs" seems similar to the case of unquantized (noisy) input described in Section 6.5 of my paper. The key might be augmentation of note timing and duration.
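As a concrete (hypothetical) example of that kind of augmentation, one could jitter onsets and durations before tokenization. The parameter values here are placeholders to tune, not values from the paper:

```python
import random

def jitter_notes(notes, max_shift=0.05, max_scale=0.2, prob=0.5):
    """notes: list of (onset_sec, duration_sec, pitch) tuples."""
    # Randomly perturb note timing (absolute shift) and duration (relative
    # scaling) to simulate unquantized, audio-derived MIDI input.
    augmented = []
    for onset, duration, pitch in notes:
        if random.random() < prob:
            onset = max(0.0, onset + random.uniform(-max_shift, max_shift))
            duration = max(0.01, duration * (1.0 + random.uniform(-max_scale, max_scale)))
        augmented.append((onset, duration, pitch))
    return augmented
```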
Indeed, it bears a resemblance to the noise addition discussed in Section 6.5 of the paper. I ran a comparison using standard MIDI files, varying parameters such as the noise ratio and range (duration), and observed that as the noise increased, the model gradually started dropping notes from the MusicXML output. Additionally, the non-standard MIDI files contain note errors, which seems to make them even harder for the model to learn from. Given this situation, do you have any suggestions for effective solutions?