Sorry, I missed this. (In para. 5.1) "The scores were split system-by-system (using four measures each for the classical scores) and tokenized into token sequences."
I am a bit confused. During training, the MusicXML files are split into MIDI files, with each MIDI file corresponding to 4 measures of the MusicXML, so there is a one-to-one correspondence between the measures in the MIDI and the MusicXML. At inference time, however, a whole MIDI file has to be split, transcribed, and the results reassembled. Is the splitting done by computing measure boundaries from the information in the MIDI, or by splitting arbitrarily and concatenating the outputs? If it is the former, errors in the MIDI's time signatures could lead to incorrect measure splits; if it is the latter, wouldn't that mean the training and inference data are not aligned?
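To make the first option concrete, here is a minimal sketch (not the repo's actual code) of what measure-based splitting at inference time could look like, assuming `pretty_midi` is used to derive downbeats from the MIDI's own time-signature events; the function name and the 4-measure window are illustrative, and any error in those time signatures would shift every window boundary accordingly:

```python
# Illustrative only: split a MIDI file into 4-measure windows based on the
# time-signature events stored in the MIDI itself. If those time signatures
# are wrong, every window boundary moves with them.
import pretty_midi

def split_by_measures(midi_path: str, measures_per_chunk: int = 4):
    """Group note events into consecutive 4-measure windows (hypothetical helper)."""
    pm = pretty_midi.PrettyMIDI(midi_path)
    downbeats = pm.get_downbeats()  # measure start times, derived from time signatures
    # Chunk boundary every `measures_per_chunk` downbeats; last boundary is end of file.
    boundaries = list(downbeats[::measures_per_chunk]) + [pm.get_end_time()]
    chunks = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        notes = [
            note
            for inst in pm.instruments
            for note in inst.notes
            if start <= note.start < end
        ]
        chunks.append((start, end, notes))
    return chunks
```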
In MIDI and MusicXML files, it is common for longer pieces to have token counts exceeding 5,000. How should such cases be handled? Should the sequences be split, or should the Transformer's maximum input length be increased?
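If splitting is the intended answer, one common workaround is to window the token sequence with a small overlap and stitch the outputs back together; the sketch below is only an assumption about how that might be done (the `max_len` and `overlap` values are illustrative, not taken from the paper):

```python
# Sketch: window an over-long token sequence to fit a fixed maximum input length.
from typing import List

def window_tokens(tokens: List[int], max_len: int = 1024, overlap: int = 64) -> List[List[int]]:
    """Split a token sequence into overlapping windows no longer than max_len."""
    if len(tokens) <= max_len:
        return [tokens]
    stride = max_len - overlap
    windows = []
    for start in range(0, len(tokens), stride):
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # last window already reaches the end of the sequence
    return windows
```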