kdmalc / fl-gestures

Develop a personalized sequential federated learning algorithm for real-time gesture classification

Use a priori labels to inform clusters #10

Open kdmalc opened 1 month ago

kdmalc commented 1 month ago

Allegedly, right- vs. left-hand gestures have been shown to be differentiable, so the first pass should be to go through the xlsx sheet and label each gesture as right-, left-, or neither-handed (i.e., don't use), then see if you can get gestures that correspond nicely to this right vs. left hand dynamic.
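A minimal sketch of that first labeling pass, assuming pandas; the file name ("gesture_metadata.xlsx") and the "Hand Used" column are hypothetical placeholders for the real xlsx layout:

```python
import pandas as pd

# Load the per-gesture metadata sheet (file/column names are placeholders).
meta = pd.read_excel("gesture_metadata.xlsx")

def to_handedness(value) -> str:
    """Map a free-text hand description to right / left / neither."""
    text = str(value).strip().lower()
    if text in {"right", "r"}:
        return "right"
    if text in {"left", "l"}:
        return "left"
    return "neither"  # two-handed, NaN, or otherwise unusable entries

meta["handedness"] = meta["Hand Used"].apply(to_handedness)
print(meta["handedness"].value_counts())
```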

kdmalc commented 1 month ago

Once you know which gestures are right- vs. left-handed, try passing them through various dimensionality reduction and clustering algorithms. Ideally you will even be able to visualize the different clusters (e.g., in 2D or 3D). Since the layout of sensors is symmetric across the body, one-handed gestures should light up half the sensors and not the other half, which should cluster cleanly. You may need to drop columns/modalities to ensure this is possible, and you may need to reduce the number of dimensions further (via dim reduc) to get around the curse of dimensionality. Ideally we should be able to visualize the clusters and confirm them against our a priori labels; get this working first.
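A sketch of that reduce-cluster-visualize loop, using PCA and KMeans as stand-in choices for the dim reduc and clustering algorithms; the feature matrix and handedness labels here are random placeholders for the real pipeline's outputs:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((100, 64))  # placeholder (num_gestures, num_features) matrix
handedness = rng.choice(["right", "left", "neither"], size=100)  # placeholder

# Standardize, then reduce to 2D so the clusters can be eyeballed.
X_2d = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

# Left panel: discovered clusters. Right panel: a priori handedness labels.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(X_2d[:, 0], X_2d[:, 1], c=cluster_ids, cmap="viridis")
axes[0].set_title("KMeans clusters")
for label in np.unique(handedness):
    mask = handedness == label
    axes[1].scatter(X_2d[mask, 0], X_2d[mask, 1], label=label)
axes[1].set_title("A priori handedness")
axes[1].legend()
plt.show()
```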

kdmalc commented 1 month ago

I was halfway through working on this, but for some reason Handedness is full of NaNs... There are a decent number of NaNs to begin with (allegedly half the gestures are listed as NaN, i.e., the cell is empty in the xlsx file), and then the code stretches and applies the color map (and other reshaping) to transform from the per-gesture shape to the (num_gestures, timesteps, num_features) shape. I think adding timesteps may stretch/add too many NaNs, or it could be something with gesture_num, or it could be something else (I forget what I thought the third cause was). Currently there are ~32k Handedness NaNs and only ~5k entries for right (and ~200 for both, and 0 for left). So Handedness is probably a bad label anyway...
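A quick diagnostic sketch for the timesteps hypothesis: compare the Handedness counts per gesture against the counts after broadcasting across timesteps. The data here is a placeholder and the repeat step is a guess at what the real reshaping does:

```python
import numpy as np
import pandas as pd

# Placeholder per-gesture table; in practice this is read from the xlsx.
meta = pd.DataFrame({"Handedness": ["Right"] * 5 + [np.nan] * 5})
timesteps = 64  # placeholder for the real window length

print("Per-gesture counts:")
print(meta["Handedness"].value_counts(dropna=False))

# If the reshaping repeats each gesture's metadata once per timestep, a
# NaN-heavy column is inflated timesteps-fold, which would be consistent
# with the ~32k NaN vs. ~5k "right" figures quoted above.
expanded = np.repeat(meta["Handedness"].to_numpy(), timesteps)
print("Per-timestep counts:")
print(pd.Series(expanded).value_counts(dropna=False))
```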

kdmalc commented 1 month ago

Also: even without visualization (i.e., doing the clustering in higher dimensions), it would still be good to evaluate the created clusters using the xlsx information as a label set (i.e., treating the xlsx columns as the "true cluster" labels). Since there are many columns in the xlsx, it may be good to iteratively test the obtained clustering against a few main columns (e.g., hand used, lifted arms off armrest, dynamic gesture, muscle activated 1, midair). Presumably, the best/true clustering labels are actually some combination of these.

As a first step, write a script that finds a clustering and then compares it against each of those columns as labels (presumably performance will be poor...). Note, of course, that you will need to encode the labels from strings to integers.
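A sketch of that first-step script. The column names are guesses based on the list above, the data is placeholder, and KMeans plus adjusted Rand index are stand-in choices for the clustering and the comparison metric:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import LabelEncoder

label_columns = ["Hand Used", "Lifted Arms Off Armrest", "Dynamic Gesture",
                 "Muscle Activated 1", "Midair"]

# Placeholder data; in practice X is the gesture feature matrix and meta
# is the xlsx sheet with one row per gesture.
rng = np.random.default_rng(0)
X = rng.random((100, 64))
meta = pd.DataFrame({col: rng.choice(["a", "b", None], size=100)
                     for col in label_columns})

cluster_ids = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

for col in label_columns:
    # Encode string labels as integers; NaNs become their own class.
    true_labels = LabelEncoder().fit_transform(meta[col].fillna("missing"))
    print(f"{col}: ARI = {adjusted_rand_score(true_labels, cluster_ids):.3f}")
```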

Then see if there's a way to combine multiple columns of the xlsx to act as a single set of labels. Something like positional encoding? Or embedding the column strings into a vector space? It may or may not work well, but it would be interesting.
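One simple baseline before trying positional encodings or embeddings: concatenate the chosen columns per row and factorize the combined string, so each unique combination of column values becomes its own class. Placeholder data and column names:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
meta = pd.DataFrame({
    "Hand Used": rng.choice(["right", "left"], size=100),
    "Dynamic Gesture": rng.choice(["yes", "no"], size=100),
    "Midair": rng.choice(["yes", "no"], size=100),
})

# Each unique (Hand Used, Dynamic Gesture, Midair) combination becomes
# its own integer class, usable as a single composite label set.
combined = meta.astype(str).agg("|".join, axis=1)
composite_labels, classes = pd.factorize(combined)
print(len(classes), "composite classes")
```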

kdmalc commented 1 month ago

All of the following metrics are built into sklearn:

Metrics that compare two sets of labels: adjusted_mutual_info_score, adjusted_rand_score, contingency_matrix, pair_confusion_matrix, completeness_score, fowlkes_mallows_score, homogeneity_completeness_v_measure

Metrics that compare the input data to the labels: calinski_harabasz_score, davies_bouldin_score, silhouette_score
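A usage sketch for both metric families, on placeholder data: the label-comparison metrics need a reference labeling (here a random stand-in for the xlsx-derived labels), while the data-based metrics only need the clustering itself:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    fowlkes_mallows_score,
    homogeneity_completeness_v_measure,
    silhouette_score,
)

rng = np.random.default_rng(0)
X = rng.random((100, 16))                 # placeholder feature matrix
true_labels = rng.integers(0, 3, size=100)  # placeholder reference labels
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Label-vs-label metrics (need a reference labeling):
print("AMI:", adjusted_mutual_info_score(true_labels, cluster_ids))
print("ARI:", adjusted_rand_score(true_labels, cluster_ids))
print("FMI:", fowlkes_mallows_score(true_labels, cluster_ids))
print("H/C/V:", homogeneity_completeness_v_measure(true_labels, cluster_ids))

# Data-vs-label metrics (no reference labeling needed):
print("Silhouette:", silhouette_score(X, cluster_ids))
print("Calinski-Harabasz:", calinski_harabasz_score(X, cluster_ids))
print("Davies-Bouldin:", davies_bouldin_score(X, cluster_ids))
```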