xjjak / LapCal

Building gloves that enable typing on a 34-key keyboard without an actual physical keyboard using IMUs and machine learning.

[FEATURE] General state detection #7

Open palisn opened 10 months ago

palisn commented 10 months ago

Add a new model that predicts the current state of the imaginary keyboard. Ideally, this would combine press and release detection with additional row detection in one go.

For this to work somewhat reliably, the data collection needs to include multiple fingers and rows. Thus, it depends on issue #3.

palisn commented 9 months ago

Issue #3 has now been resolved. After some data is collected, this can be actively worked on.

palisn commented 7 months ago

The following sub-issues relevant to this issue were recently introduced:

palisn commented 7 months ago

For testing purposes, we set up a moderately large neural network consisting of four hidden layers with 5000 neurons each and the ReLU activation function, plus a 17-neuron output layer with the sigmoid activation function. Additionally, each layer has L2 regularization.

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential(
    [
        # 179 input features
        layers.Input(shape=(179,)),
        # four hidden layers of 5000 ReLU neurons, each with L2 regularization
        layers.Dense(5000, activation="relu",
                     kernel_regularizer=regularizers.l2(0.001)),
        layers.Dense(5000, activation="relu",
                     kernel_regularizer=regularizers.l2(0.001)),
        layers.Dense(5000, activation="relu",
                     kernel_regularizer=regularizers.l2(0.001)),
        layers.Dense(5000, activation="relu",
                     kernel_regularizer=regularizers.l2(0.001)),
        # 17 independent sigmoid outputs (multi-label)
        layers.Dense(17, activation="sigmoid")
    ]
)

We compiled the model with a BinaryCrossentropy loss (the typical choice for multi-label problems) and the RMSprop optimizer. Afterward, we trained the model with a batch size of 32 for 100 epochs (excluding early stopping).
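For reference, a minimal sketch of that training setup is shown below; the exact learning rate, validation split, early-stopping configuration, and the x_train / y_train names are assumptions, not taken from the repository.

from tensorflow import keras

# Sketch of the training setup described above. The learning rate, validation
# split, early-stopping settings, and x_train / y_train are assumptions.
model.compile(
    loss=keras.losses.BinaryCrossentropy(),
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
)

model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=100,
    validation_split=0.2,
    callbacks=[
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                      restore_best_weights=True)
    ],
)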

The results show that the model does converge, but it yields terrible results for our positive class; consequently, it does not perform well with regard to our metric.

| Learning Rate | Loss  | Metric | Convergence        |
| ------------- | ----- | ------ | ------------------ |
| 1e-3          | 0.03  | 0.29   | (convergence plot) |
| 5e-4          | 0.03  | 0.29   | (convergence plot) |
| 1e-4          | 0.03  | 0.43   | (convergence plot) |
| 5e-5          | 0.025 | 0.44   | (convergence plot) |

(Note: to improve readability, the x-axes do not necessarily start at 0.)

This underlines that the dataset is imbalanced. For better results, we should probably change the model architecture or introduce some counter-measure against the imbalance.
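One possible counter-measure would be to weight the positive class more heavily in the loss. The sketch below is an assumption rather than project code: the weighting scheme, the y_train name, and the requirement that the final Dense(17) layer output logits (no sigmoid) are all hypothetical.

import numpy as np
import tensorflow as tf

# y_train is assumed to be a (num_samples, 17) binary label matrix.
positive_rate = y_train.mean(axis=0)

# Up-weight positive examples per label; the exact scheme is an assumption.
pos_weight = tf.constant((1.0 - positive_rate) / np.maximum(positive_rate, 1e-6),
                         dtype=tf.float32)

def weighted_bce(y_true, logits):
    # Requires the final Dense(17) layer to output logits (no sigmoid activation).
    return tf.reduce_mean(
        tf.nn.weighted_cross_entropy_with_logits(
            labels=tf.cast(y_true, tf.float32), logits=logits,
            pos_weight=pos_weight))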


palisn commented 7 months ago

To address the widely differing value ranges of the features, we introduced a Normalization layer (also known as standardization) into the model:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

inputs = layers.Input(shape=(179,))
# Learn per-feature mean and variance from the training data.
normalizer = layers.Normalization()
normalizer.adapt(x_train)
x = normalizer(inputs)
for _ in range(4):
    x = layers.Dense(5000, activation="relu",
                     kernel_regularizer=regularizers.l2(0.001))(x)
outputs = layers.Dense(17, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)

This change caused a visible improvement. With a learning rate of 1e-4 and 100 epochs of training (the same setup as in the last message), we were able to achieve:

palisn commented 7 months ago

> still a bit more potential for further improvement with further training

After 315 epochs there is no further improvement, and the result is:

Quite an improvement, and the validation results show that we are not overfitting. However, it is still not good enough to be usable in practice.

palisn commented 7 months ago

As an alternative to BinaryCrossentropy, we tried BinaryFocalCrossentropy, since it is designed to help with learning from imbalanced datasets. We were able to achieve a loss slightly below 0.01. Surprisingly, this was not reflected in our metric or in the results for the positive class in general.
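For reference, switching to the focal loss is essentially a one-line change when compiling (a sketch; the gamma value shown is the Keras default, the learning rate is an assumption, and BinaryFocalCrossentropy requires a reasonably recent TensorFlow/Keras version):

from tensorflow import keras

# Sketch: compile with focal loss instead of plain binary crossentropy.
# gamma is the Keras default here; the values actually used may differ.
model.compile(
    loss=keras.losses.BinaryFocalCrossentropy(gamma=2.0),
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
)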