Open localaization opened 1 year ago
Let's take a look at this one: it has around 10,356 monophonic songs. http://kern.ccarh.org/help/data/ (Note: the site is not served over HTTPS.)
Also thinking about CSV files as a data source. Check https://www.audiolabs-erlangen.de/resources/MIR/FMP/C1/C1S2_MusicXML.html and https://www.audiolabs-erlangen.de/resources/MIR/FMP/C1/C1S1_SheetMusic.html
Regarding the function def xml_to_list(xml) on https://www.audiolabs-erlangen.de/resources/MIR/FMP/C1/C1S2_MusicXML.html: at first sight, I think we can improve it by processing all octaves at the same time for a given "beat".
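One possible reading of this idea, sketched with the standard library only: instead of handling notes one at a time, group them by onset so that every pitch starting on a given beat, whatever its octave, is processed in one step. The (start, duration, pitch) tuple layout and the group_by_onset helper are assumptions made for illustration, not the notebook's actual code.

```python
from collections import defaultdict


def group_by_onset(notes):
    """Group note events by their start time.

    `notes` is assumed to be an iterable of (start, duration, pitch)
    tuples, with start and duration in quarter notes and pitch as a
    MIDI note number. This layout is hypothetical.
    """
    by_onset = defaultdict(list)
    for start, duration, pitch in notes:
        by_onset[start].append((pitch, duration))
    # Sort the onsets, and within each onset sort pitches low to high,
    # so all octaves sounding on the same beat sit next to each other.
    return [(start, sorted(by_onset[start])) for start in sorted(by_onset)]


# Hypothetical input: three notes start on beat 0 in different octaves.
notes = [(0.0, 1.0, 60), (0.0, 1.0, 72), (1.0, 0.5, 64), (0.0, 2.0, 48)]
grouped = group_by_onset(notes)
```

Each entry of grouped pairs one onset with all the pitches that start there, so per-beat processing becomes a single pass over the list.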
While working on the dataset, I came across the music21 library from MIT. It might be useful for testing, and I think we could also improve it with our system. https://web.mit.edu/music21/doc/index.html
We could do some testing with the MAESTRO dataset: https://magenta.tensorflow.org/maestro-wave2midi2wave
It has audio and MIDI files, plus the transcriptions.
Right now it could be the best choice for a quick start.
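For a quick start, the MIDI paths could be read from the metadata CSV that ships with the dataset. A minimal sketch, assuming the CSV has midi_filename and split columns; the midi_paths_from_metadata helper and the sample rows are made up for illustration, so check the column names against the real file before relying on this.

```python
import csv
import io


def midi_paths_from_metadata(csv_file, split=None):
    """Return the MIDI paths listed in a MAESTRO-style metadata CSV.

    Assumes the columns `midi_filename` and `split` exist; both names
    are assumptions here. If `split` is None, every row is returned.
    """
    reader = csv.DictReader(csv_file)
    return [
        row["midi_filename"]
        for row in reader
        if split is None or row["split"] == split
    ]


# Hypothetical sample rows standing in for the real metadata file.
sample = io.StringIO(
    "canonical_composer,split,midi_filename\n"
    "Composer A,train,2004/a.midi\n"
    "Composer B,test,2006/b.midi\n"
)
train_paths = midi_paths_from_metadata(sample, split="train")
```

With the real file, the same function would take an open file handle instead of the in-memory sample.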
Description: We need to get training samples from non-complex music. These samples will form a dataset of MIDI files.
Documentation
Definition of Done (DoD): We have a dataset of around 5,000 MIDI files.
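To check progress against the DoD, the files in the dataset folder can simply be counted. A minimal sketch using only the standard library; the count_midi_files helper, the accepted extensions, and the temporary folder layout are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Extensions treated as MIDI files (an assumption; adjust as needed).
MIDI_SUFFIXES = {".mid", ".midi"}


def count_midi_files(root):
    """Count MIDI files under `root`, including subfolders."""
    return sum(
        1
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in MIDI_SUFFIXES
    )


# Small self-contained demo on a temporary folder with two MIDI files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    for name in ("a.mid", "sub/b.MIDI", "notes.txt"):
        (Path(tmp) / name).touch()
    found = count_midi_files(tmp)
```

Pointing count_midi_files at the real dataset folder gives the number to compare with the target of around 5,000.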