Open palisn opened 5 months ago
This seems related and might be interesting: https://www.tensorflow.org/tutorials/structured_data/imbalanced_data
We determined the samples where an internal state change happens with the following piece of code:
```python
import numpy as np

standard_weight = 1
change_weight = 5  # example value

# The last 17 input columns hold the previous internal state; a sample counts
# as a "change" if any of those bits differ from the target state.
y_before = x_train[:, -17:]
changes = np.any(y_before != y_train, axis=1)
sample_weight = changes * (change_weight - standard_weight) + standard_weight
```
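For context, this is roughly how the weights enter training. The placeholder model below is not our actual architecture, and `x_dev`/`y_dev` stand in for our dev split; the optimizer, loss, and learning rate match the configuration described in the next paragraph.

```python
import tensorflow as tf

# Placeholder model, not our real architecture: 17 sigmoid outputs to match
# the 17 internal-state columns sliced above.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(17, activation="sigmoid"),
])

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    loss=tf.keras.losses.BinaryCrossentropy(),
)

# sample_weight makes changing samples contribute change_weight times as much
# to the loss as all other samples.
model.fit(
    x_train, y_train,
    sample_weight=sample_weight,
    epochs=100,
    validation_data=(x_dev, y_dev),  # assumed names for the dev split
)
```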
For different `change_weight` values, we got the following results after training the model for 100 epochs (excluding early stopping). The model uses the `RMSprop` optimizer and the `BinaryCrossentropy` loss with a learning rate of `1e-4`:
| `change_weight` | Loss | Metric | Convergence |
|---|---|---|---|
| 1 | 0.025 | 0.56 | |
| 2 | 0.03 | 0.55 | |
| 5 | 0.04 | 0.66 | |
| 10 | 0.052 | 0.62 | |
| 20 | 0.07 | 0.67 | |
| 50 | 0.115 | 0.72 | [^1] |
The training for the last configuration ended early because early stopping intervened. We will probably run another test series without early stopping, as improving our metric might not necessarily align with improving our cost function.
[^1]: We have no idea what caused the validation loss to be so significantly smaller than the training loss. This should normally never be the case.
> We have no idea what caused the validation loss to be so significantly smaller than the training loss. This should normally never be the case.
This is really weird. I investigated a bit more, and I currently don't see any reason for this.
My first hypothesis was that the ratios of changing samples differ between the statically split train and dev datasets, but the ratios are nearly identical.
But looking at the graphs, I would still guess that the difference in loss is due to the changing samples: the difference gets worse the more weight we put on them.
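For reference, the ratio check was roughly this (sketch; `x_dev`/`y_dev` are assumed names for the statically split dev data, laid out like the training arrays):

```python
import numpy as np

def change_ratio(x, y):
    """Fraction of samples whose internal state differs from the previous state."""
    y_before = x[:, -17:]
    return np.any(y_before != y, axis=1).mean()

print("train:", change_ratio(x_train, y_train))
print("dev:  ", change_ratio(x_dev, y_dev))
```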
The next thing we tried is undersampling. Here we use as many negative samples as we have positive ones. After training for 228 epochs, we get a loss of `0.05` and a custom metric evaluation of `0.76` on the training data. For the test data, unfortunately, the results look a lot worse: the loss is similar, but the metric capped out at around `0.5`.
Hence, we will try oversampling next.
Before we try oversampling: I forgot to try some different ratios for undersampling.

- With twice as many samples from the negative class as from the positive one, we get a loss of `0.047` and a metric evaluation of `0.617`.
- With three times as many, we get a loss of `0.044` and a metric evaluation of `0.522`.
- With five times as many, we get a loss of `0.04` and a metric evaluation of `0.483`.
- With ten times as many, we get a loss of `0.032` and a metric evaluation of `0.486`.
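A sketch of the undersampling described above (the helper name, the fixed seed, and the `neg_per_pos` parameter are illustrative; `neg_per_pos=1` is the balanced case, and 2, 3, 5, 10 are the other ratios we tried):

```python
import numpy as np

def undersample(x, y, neg_per_pos=1.0, seed=0):
    """Keep all positive (state-change) samples and a random subset of negatives."""
    y_before = x[:, -17:]
    changes = np.any(y_before != y, axis=1)  # positive class: state changes
    pos_idx = np.flatnonzero(changes)
    neg_idx = np.flatnonzero(~changes)

    rng = np.random.default_rng(seed)
    n_neg = min(len(neg_idx), int(neg_per_pos * len(pos_idx)))
    neg_idx = rng.choice(neg_idx, size=n_neg, replace=False)

    idx = rng.permutation(np.concatenate([pos_idx, neg_idx]))
    return x[idx], y[idx]

x_balanced, y_balanced = undersample(x_train, y_train, neg_per_pos=1.0)
```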
Oversampling the dataset, such that there are as many positive samples as negative ones, gives us a loss of `0.038` (val: `0.039`) and a metric evaluation of `0.8766` (val: `0.669`) after 145 epochs.
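For completeness, a sketch of the oversampling, done here by repeating positive samples with replacement until both classes have the same size (the helper is illustrative, not a verbatim copy of our code):

```python
import numpy as np

def oversample(x, y, seed=0):
    """Repeat positive (state-change) samples until both classes are equally large."""
    y_before = x[:, -17:]
    changes = np.any(y_before != y, axis=1)
    pos_idx = np.flatnonzero(changes)
    neg_idx = np.flatnonzero(~changes)

    rng = np.random.default_rng(seed)
    extra = rng.choice(pos_idx, size=max(0, len(neg_idx) - len(pos_idx)), replace=True)

    idx = rng.permutation(np.concatenate([pos_idx, extra, neg_idx]))
    return x[idx], y[idx]

x_over, y_over = oversample(x_train, y_train)
```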
This is related to #7.

Because we retrieve data from the IMUs at regular rates and most of the time the fingers are still rather than typing, our datasets are imbalanced: the positive class (state changes and holding, see #28) is a lot smaller than the negative class (disengaged fingers).

This issue collects and evaluates measures to prevent bad classification due to the imbalanced nature of the dataset. Some approaches include: