[] DATASET - As a developer I want 5000 non-complex MIDI music files so that I have a training dataset

localaization / pentagrom

We want to open source a machine learning model using a [(7 rows x constant) x 3 columns] + key signature matrix.

GNU General Public License v3.0


[] DATASET - As a developer I want 5000 non-complex MIDI music files so that I have a training dataset #4

Open localaization opened 1 year ago

localaization commented 1 year ago

Description: We need training samples of non-complex music. These samples will form a dataset of MIDI files.

Documentation

Definition of Done (DoD): We have a dataset of around 5000 MIDI files.
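A minimal sketch of how the DoD could be checked, assuming the dataset ends up as a folder of .mid/.midi files. The function names are hypothetical; the only format fact relied on is that a Standard MIDI File begins with the four bytes MThd.

```python
from pathlib import Path

MIDI_MAGIC = b"MThd"  # chunk ID that opens every Standard MIDI File
TARGET_COUNT = 5000   # approximate target taken from the Definition of Done


def is_midi_file(path: Path) -> bool:
    """Return True if the file starts with the Standard MIDI File header."""
    try:
        with path.open("rb") as handle:
            return handle.read(4) == MIDI_MAGIC
    except OSError:
        # Unreadable files are treated as invalid rather than aborting the scan.
        return False


def count_midi_files(root: Path) -> int:
    """Count files under root that have a MIDI extension and a valid header."""
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix.lower() in {".mid", ".midi"}
        and is_midi_file(path)
    )
```

Calling count_midi_files(Path("dataset")) and comparing the result with TARGET_COUNT would give a quick pass/fail signal; "dataset" is a placeholder folder name.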

localaization commented 8 months ago

Let's take a look at this one; it has around 10,356 monophonic songs: http://kern.ccarh.org/help/data/ (Note: the site is not served over HTTPS.)
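If "non-complex" is taken to mean monophonic, a filter along these lines could help when screening candidates. This is only a sketch: the (start, duration, pitch) tuple layout is an assumption, not a format used by the linked collection.

```python
from typing import Iterable, Tuple

# Hypothetical note layout: (start, duration, pitch), times in any consistent unit.
Note = Tuple[float, float, int]


def is_monophonic(notes: Iterable[Note], tolerance: float = 1e-9) -> bool:
    """Return True if no two notes overlap in time."""
    previous_end = float("-inf")
    for start, duration, _pitch in sorted(notes, key=lambda note: note[0]):
        # A note that starts before the previous one has ended means overlap.
        if start < previous_end - tolerance:
            return False
        previous_end = max(previous_end, start + duration)
    return True
```

The tolerance only guards against floating-point noise when one note ends exactly where the next begins.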

TuWebO commented 8 months ago

I'm also thinking about CSV files as a data source. Check https://www.audiolabs-erlangen.de/resources/MIR/FMP/C1/C1S2_MusicXML.html and https://www.audiolabs-erlangen.de/resources/MIR/FMP/C1/C1S1_SheetMusic.html
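A rough sketch of what reading such a CSV could look like with the standard library. The column names (Start, Duration, Pitch) and the semicolon delimiter are assumptions for illustration and would need to be checked against the actual files.

```python
import csv
import io
from typing import List, Tuple


def read_note_csv(text: str, delimiter: str = ";") -> List[Tuple[float, float, int]]:
    """Parse CSV text into (start, duration, pitch) tuples.

    Assumes a header row with the columns Start, Duration and Pitch.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        (float(row["Start"]), float(row["Duration"]), int(row["Pitch"]))
        for row in reader
    ]


# Made-up sample data, only to show the expected shape of the input.
SAMPLE = "Start;Duration;Pitch\n0.0;1.0;60\n1.0;0.5;62\n"
notes = read_note_csv(SAMPLE)
```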

TuWebO commented 8 months ago

Regarding the function def xml_to_list(xml): on https://www.audiolabs-erlangen.de/resources/MIR/FMP/C1/C1S2_MusicXML.html: at first sight, I think we could improve it by processing all octaves at the same time for a given "beat".
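One way to read the idea above, as a sketch: group the notes by onset so that everything sounding on the same "beat" is handled together, whatever its octave. The (start, duration, pitch) input layout and the function name are assumptions, not the behaviour of xml_to_list.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_onset(
    notes: Iterable[Tuple[float, float, int]]
) -> Dict[float, List[Tuple[int, float]]]:
    """Map each onset time to the (pitch, duration) pairs that start there."""
    groups: Dict[float, List[Tuple[int, float]]] = defaultdict(list)
    for start, duration, pitch in notes:
        groups[start].append((pitch, duration))
    # Return onsets in time order, with pitches sorted low to high within each.
    return {start: sorted(groups[start]) for start in sorted(groups)}
```

Each dictionary entry can then be processed in a single step, instead of walking the note list once per octave.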

TuWebO commented 8 months ago

While working on the dataset, I came across the music21 library from MIT. It might be useful for testing, and I think we could also improve it with our system. https://web.mit.edu/music21/doc/index.html

TuWebO commented 8 months ago

We could do some testing with the MAESTRO dataset: https://magenta.tensorflow.org/maestro-wave2midi2wave

It has audio and MIDI files along with the transcriptions.

Right now it could be the best choice for a quick start.