The current method of preprocessing the music data is to split it into equal-time chunks, convert each chunk into an array of dB values at each frequency and each time step (a ~1024 x 44 array, where the x axis is frequency bins), and then run these 2D arrays through a convolutional neural network.
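For reference, a minimal sketch of that pipeline, assuming librosa for the STFT. The file name, chunk length, sample rate, FFT size, and hop length are illustrative choices (not the confirmed settings) that happen to reproduce the ~1024 x 44 shape:

```python
import numpy as np
import librosa

def chunk_to_db(chunk, n_fft=2046, hop_length=512):
    """One equal-time audio chunk -> (freq, time) array of dB values.

    n_fft=2046 gives 1024 frequency bins; with hop_length=512, a
    1-second chunk at 22050 Hz yields 44 frames, i.e. ~1024 x 44.
    """
    mag = np.abs(librosa.stft(chunk, n_fft=n_fft, hop_length=hop_length))
    return librosa.amplitude_to_db(mag, ref=np.max)

y, sr = librosa.load("track.wav", sr=22050, mono=True)  # "track.wav" is a placeholder
chunk_len = sr  # 1-second chunks (illustrative choice)
chunks = [y[i:i + chunk_len] for i in range(0, len(y) - chunk_len + 1, chunk_len)]
spectrograms = np.stack([chunk_to_db(c) for c in chunks])  # (n_chunks, 1024, 44)
```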
The timbre of a musical note is a function of the relative amplitudes at its fundamental (lowest) frequency and at its harmonic frequencies. The convolutional features are not yet accounting for timbre, presumably because harmonics are spread far apart along a linear frequency axis (a 440 Hz fundamental's harmonics sit at 880 Hz, 1320 Hz, and so on, many bins away), so a small 2D kernel never sees a fundamental together with its harmonics.
Try reshaping the input data into a 3D array such that each frequency is adjacent along the z axis to its nearest harmonic frequencies (is this possible?). Then try using a 3D convolutional layer and see if it alters performance.
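One answer to the "is this possible?" question is harmonic stacking: resample the spectrogram's frequency axis at integer multiples of each bin's frequency, so that bins at f, 2f, 3f, ... line up along a new z axis, then convolve in 3D. A minimal numpy/Keras sketch; the harmonic set, interpolation scheme, framework, and layer sizes are all assumptions, not a fixed recipe:

```python
import numpy as np
import librosa
import tensorflow as tf

def harmonic_stack(spec, freqs, harmonics=(1, 2, 3)):
    """Stack harmonically resampled copies of a (freq, time) dB spectrogram.

    stacked[z, f, t] holds the dB value at frequency harmonics[z] * freqs[f],
    so a fundamental and its harmonics become adjacent along the new z axis.
    """
    floor_db = spec.min()  # fill value for harmonics above the Nyquist frequency
    stacked = np.empty((len(harmonics),) + spec.shape, dtype=spec.dtype)
    for z, h in enumerate(harmonics):
        for t in range(spec.shape[1]):  # interpolate each time frame separately
            stacked[z, :, t] = np.interp(h * freqs, freqs, spec[:, t], right=floor_db)
    return stacked

freqs = librosa.fft_frequencies(sr=22050, n_fft=2046)  # matches the sketch above
x = harmonic_stack(spectrograms[0], freqs)             # (3, 1024, 44)

# A 3D conv layer then sees local (harmonic, freq, time) neighborhoods; the
# channel axis is added explicitly since Conv3D expects channels-last input.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(3, 1024, 44, 1)),
    tf.keras.layers.Conv3D(16, kernel_size=(3, 3, 3), padding="same", activation="relu"),
])
model(x[np.newaxis, ..., np.newaxis])  # batch of one stacked spectrogram
```

Note the stack only aligns exact integer multiples of each bin; whether that adjacency actually helps the network learn timbre is the experiment proposed above.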