napulen / phd_thesis

Automatic Roman numeral analysis in symbolic music representations.

AugmentedNet: Experiments of my own #3

Closed napulen closed 3 years ago

napulen commented 3 years ago

I moved away from Micchi et al. once I realized it was very difficult for me to trust the experiments I ran on Micchi's codebase.

My lack of expertise in deep learning, combined with possible "silent" issues in Micchi's implementation, made it very difficult to know, for example:

I decided to implement my own model from scratch.

napulen commented 3 years ago

Here is the first experiment. Training on all When-in-Rome scores, with music21 chord realizations (i.e., no voice leading).

The same architecture predicting either:

bass

Epoch 1/20
241/241 [==============================] - 8s 25ms/step - loss: 0.3887 - binary_accuracy: 0.8742
Epoch 2/20
241/241 [==============================] - 6s 26ms/step - loss: 0.3196 - binary_accuracy: 0.8976
Epoch 3/20
241/241 [==============================] - 6s 25ms/step - loss: 0.2999 - binary_accuracy: 0.8978
Epoch 4/20
241/241 [==============================] - 6s 24ms/step - loss: 0.2733 - binary_accuracy: 0.9006
Epoch 5/20
241/241 [==============================] - 6s 25ms/step - loss: 0.2546 - binary_accuracy: 0.9049
Epoch 6/20
241/241 [==============================] - 6s 26ms/step - loss: 0.2387 - binary_accuracy: 0.9076
Epoch 7/20
241/241 [==============================] - 6s 26ms/step - loss: 0.2320 - binary_accuracy: 0.9080
Epoch 8/20
241/241 [==============================] - 6s 25ms/step - loss: 0.2287 - binary_accuracy: 0.9079
Epoch 9/20
241/241 [==============================] - 6s 26ms/step - loss: 0.2259 - binary_accuracy: 0.9083
Epoch 10/20
241/241 [==============================] - 6s 26ms/step - loss: 0.2205 - binary_accuracy: 0.9104
Epoch 11/20
241/241 [==============================] - 6s 26ms/step - loss: 0.2118 - binary_accuracy: 0.9138
Epoch 12/20
241/241 [==============================] - 7s 27ms/step - loss: 0.2034 - binary_accuracy: 0.9169
Epoch 13/20
241/241 [==============================] - 6s 26ms/step - loss: 0.1928 - binary_accuracy: 0.9210
Epoch 14/20
241/241 [==============================] - 6s 25ms/step - loss: 0.1849 - binary_accuracy: 0.9238
Epoch 15/20
241/241 [==============================] - 6s 26ms/step - loss: 0.1781 - binary_accuracy: 0.9268
Epoch 16/20
241/241 [==============================] - 6s 27ms/step - loss: 0.1729 - binary_accuracy: 0.9287
Epoch 17/20
241/241 [==============================] - 6s 26ms/step - loss: 0.1674 - binary_accuracy: 0.9308
Epoch 18/20
241/241 [==============================] - 6s 26ms/step - loss: 0.1606 - binary_accuracy: 0.9337
Epoch 19/20
241/241 [==============================] - 6s 26ms/step - loss: 0.1565 - binary_accuracy: 0.9357
Epoch 20/20
241/241 [==============================] - 6s 25ms/step - loss: 0.1502 - binary_accuracy: 0.9379

inversion

Epoch 1/20
241/241 [==============================] - 7s 24ms/step - loss: 0.4625 - binary_accuracy: 0.7953
Epoch 2/20
241/241 [==============================] - 6s 24ms/step - loss: 0.3773 - binary_accuracy: 0.8360
Epoch 3/20
241/241 [==============================] - 6s 24ms/step - loss: 0.3465 - binary_accuracy: 0.8492
Epoch 4/20
241/241 [==============================] - 6s 24ms/step - loss: 0.3154 - binary_accuracy: 0.8599
Epoch 5/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2964 - binary_accuracy: 0.8671
Epoch 6/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2808 - binary_accuracy: 0.8743
Epoch 7/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2696 - binary_accuracy: 0.8790
Epoch 8/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2562 - binary_accuracy: 0.8846
Epoch 9/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2460 - binary_accuracy: 0.8888
Epoch 10/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2423 - binary_accuracy: 0.8898
Epoch 11/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2334 - binary_accuracy: 0.8944
Epoch 12/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2348 - binary_accuracy: 0.8939
Epoch 13/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2244 - binary_accuracy: 0.8991
Epoch 14/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2263 - binary_accuracy: 0.8979
Epoch 15/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2246 - binary_accuracy: 0.8981
Epoch 16/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2279 - binary_accuracy: 0.8982
Epoch 17/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2123 - binary_accuracy: 0.9047
Epoch 18/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2085 - binary_accuracy: 0.9070
Epoch 19/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2205 - binary_accuracy: 0.9029
Epoch 20/20
241/241 [==============================] - 6s 23ms/step - loss: 0.2019 - binary_accuracy: 0.9108

Caveat: the prediction is done with binary classification through a sigmoid activation on each output-layer neuron. In the past, I remember the accuracy of this method being unreliable: the network learns to output "a lot of 0s everywhere," and the few 1s that matter in practice are not penalized accordingly by the accuracy metric.
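To make the caveat concrete, here is a minimal NumPy sketch (hypothetical data) of how binary accuracy stays high even when the network outputs nothing but zeros on a sparse multi-hot target:

```python
import numpy as np

# Hypothetical multi-hot target over 12 pitch classes, with one active note.
target = np.zeros(12)
target[4] = 1.0  # say, the bass is an E

# A degenerate network that predicts 0 everywhere.
prediction = np.zeros(12)

# Keras-style binary accuracy: threshold at 0.5, compare element-wise.
binary_accuracy = np.mean((prediction > 0.5) == (target > 0.5))
print(binary_accuracy)  # ~0.92, despite missing the only note that matters
```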

The easiest way to figure out whether the accuracy is good enough is to predict on some random examples and inspect the predictions. I did that last time and it was very insightful.
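That spot check can be as simple as thresholding the sigmoid outputs and diffing the active indices against the ground truth (a sketch with made-up numbers):

```python
import numpy as np

# Hypothetical sigmoid outputs for one timestep (12 output neurons).
probs = np.array([0.1, 0.05, 0.2, 0.9, 0.4, 0.1, 0.6, 0.1, 0.05, 0.2, 0.1, 0.3])
truth = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0])

predicted_on = set(np.flatnonzero(probs > 0.5))
actual_on = set(np.flatnonzero(truth == 1))

print("predicted:", sorted(predicted_on))   # which neurons fired
print("missed:", sorted(actual_on - predicted_on))
print("spurious:", sorted(predicted_on - actual_on))
```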

napulen commented 3 years ago

Trying out an event-based input/output representation, rather than a fixed-timestep one.
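A toy sketch of the difference (hypothetical note format: onset, duration, pitch class, all in quarter notes): a fixed-timestep encoding samples the score at a constant rate and produces many repeated frames, while an event-based encoding emits one token per onset with the duration carried explicitly.

```python
# Toy score: (onset, duration, pitch_class) triples -- a made-up format.
notes = [(0.0, 2.0, 0), (2.0, 1.0, 4), (3.0, 1.0, 7)]

# Fixed-timestep: sample every 32nd note (0.125 quarters).
step = 0.125
end = max(onset + dur for onset, dur, _ in notes)
frames = []
t = 0.0
while t < end:
    active = [pc for onset, dur, pc in notes if onset <= t < onset + dur]
    frames.append(active)
    t += step

# Event-based: one entry per onset, duration carried explicitly.
events = [(pc, dur) for onset, dur, pc in notes]

print(len(frames), "frames vs.", len(events), "events")
```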