Open palisn opened 5 months ago
This seems related and might be interesting: https://www.tensorflow.org/tutorials/structured_data/imbalanced_data
We determined the samples where an internal state change happens with the following piece of code:
```python
import numpy as np

standard_weight = 1
change_weight = 5  # example value

# The last 17 input columns hold the previous internal state; a sample counts
# as a "change" if any of those bits differ from the target state.
y_before = x_train[:, -17:]
changes = np.any(y_before != y_train, axis=1)
sample_weight = changes * (change_weight - standard_weight) + standard_weight
```
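For context, this is roughly how the weights enter training. The placeholder model below is not our actual architecture, and `x_dev`/`y_dev` stand in for our dev split; the optimizer, loss, and learning rate match the configuration described in the next paragraph.

```python
import tensorflow as tf

# Placeholder model, not our real architecture: 17 sigmoid outputs to match
# the 17 internal-state columns sliced above.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(17, activation="sigmoid"),
])

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    loss=tf.keras.losses.BinaryCrossentropy(),
)

# sample_weight makes changing samples contribute change_weight times as much
# to the loss as all other samples.
model.fit(
    x_train, y_train,
    sample_weight=sample_weight,
    epochs=100,
    validation_data=(x_dev, y_dev),  # assumed names for the dev split
)
```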
For different `change_weight` values, we got the following results after training the model for 100 epochs (excluding early stopping). The model uses the `RMSprop` optimizer and the `BinaryCrossentropy` loss with a learning rate of `1e-4`:
| `change_weight` | Loss | Metric | Convergence |
|---|---|---|---|
| 1 | 0.025 | 0.56 | |
| 2 | 0.03 | 0.55 | |
| 5 | 0.04 | 0.66 | |
| 10 | 0.052 | 0.62 | |
| 20 | 0.07 | 0.67 | |
| 50 | 0.115 | 0.72 | [^1] |
The training for the last configuration ended early because early stopping intervened. We will probably run another test series without early stopping, as improving our metric might not necessarily align with improving our cost function.
[^1]: We have no idea what caused the validation loss to be so significantly smaller than the training loss. This should normally never be the case.
> We have no idea what caused the validation loss to be so significantly smaller than the training loss. This should normally never be the case.
This is really weird. I investigated a bit more, and I currently don't see any reason for this.
My first hypothesis was that the ratios of changing samples differ between the statically split train and dev datasets, but the ratios are nearly identical.
But looking at the graphs, I would still guess that the difference in loss is due to the changing samples: the difference gets worse the more weight we put on them.
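For reference, the ratio check was roughly this (sketch; `x_dev`/`y_dev` are assumed names for the statically split dev data, laid out like the training arrays):

```python
import numpy as np

def change_ratio(x, y):
    """Fraction of samples whose internal state differs from the previous state."""
    y_before = x[:, -17:]
    return np.any(y_before != y, axis=1).mean()

print("train:", change_ratio(x_train, y_train))
print("dev:  ", change_ratio(x_dev, y_dev))
```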
The next thing we tried is undersampling. Here we use as many negative samples as we have positive ones. After training for 228 epochs, we get a loss of `0.05` and a custom metric evaluation of `0.76` on the training data. For the test data, unfortunately, the results look a lot worse: the loss is similar, but the metric capped out at around `0.5`.
Hence, we will try oversampling next.
Before we try oversampling: I forgot to try some different ratios for undersampling.

- With twice as many samples from the negative class as from the positive one, we get a loss of `0.047` and a metric evaluation of `0.617`.
- With three times as many, we get a loss of `0.044` and a metric evaluation of `0.522`.
- With five times as many, we get a loss of `0.04` and a metric evaluation of `0.483`.
- With ten times as many, we get a loss of `0.032` and a metric evaluation of `0.486`.
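A sketch of the undersampling described above (the helper name, the fixed seed, and the `neg_per_pos` parameter are illustrative; `neg_per_pos=1` is the balanced case, and 2, 3, 5, 10 are the other ratios we tried):

```python
import numpy as np

def undersample(x, y, neg_per_pos=1.0, seed=0):
    """Keep all positive (state-change) samples and a random subset of negatives."""
    y_before = x[:, -17:]
    changes = np.any(y_before != y, axis=1)  # positive class: state changes
    pos_idx = np.flatnonzero(changes)
    neg_idx = np.flatnonzero(~changes)

    rng = np.random.default_rng(seed)
    n_neg = min(len(neg_idx), int(neg_per_pos * len(pos_idx)))
    neg_idx = rng.choice(neg_idx, size=n_neg, replace=False)

    idx = rng.permutation(np.concatenate([pos_idx, neg_idx]))
    return x[idx], y[idx]

x_balanced, y_balanced = undersample(x_train, y_train, neg_per_pos=1.0)
```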
Oversampling the dataset, such that there are as many positive samples as negative ones, gives us a loss of `0.038` (val: `0.039`) and a metric evaluation of `0.8766` (val: `0.669`) after 145 epochs.
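For completeness, a sketch of the oversampling, done here by repeating positive samples with replacement until both classes have the same size (the helper is illustrative, not a verbatim copy of our code):

```python
import numpy as np

def oversample(x, y, seed=0):
    """Repeat positive (state-change) samples until both classes are equally large."""
    y_before = x[:, -17:]
    changes = np.any(y_before != y, axis=1)
    pos_idx = np.flatnonzero(changes)
    neg_idx = np.flatnonzero(~changes)

    rng = np.random.default_rng(seed)
    extra = rng.choice(pos_idx, size=max(0, len(neg_idx) - len(pos_idx)), replace=True)

    idx = rng.permutation(np.concatenate([pos_idx, extra, neg_idx]))
    return x[idx], y[idx]

x_over, y_over = oversample(x_train, y_train)
```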
This is related to #7.

Because we retrieve data from the IMUs at regular rates and most of the time the fingers are still rather than typing, our datasets are imbalanced: the positive class (state changes and holding, see #28) is a lot smaller than the negative class (disengaged fingers).

This issue collects and evaluates measures to prevent bad classification due to the imbalanced nature of the dataset. Some approaches include: